Dear Diary, Today I learned to stop clicking the “load more” button if new content stops appearing…
Today, agents are enthusiastic but dumb. They go out and do the same task in a potentially different way each time out, having no memory of what worked – or didn’t work – the time before.
Not surprisingly, there have been efforts to have agents learn from their past actions, either by capturing every action or by saving only the workflows that succeeded. A diary for agents, so to speak. But these approaches just capture actions; they don’t distill any higher-level learnings from the agents’ trajectories.
Now, Google researchers have created a memory framework for agents, called ReasoningBank, where successful and failed actions can be stored and later retrieved for assistance in new tasks.
“Unlike existing workflow memory strategies that only focus on successful runs, ReasoningBank actively analyzes failed experiences to source counterfactual signals and pitfalls,” wrote Google Cloud Research Scientists Jun Yan and Chen-Yu Lee, in a blog post describing their work.
“By distilling these mistakes into preventative lessons, ReasoningBank builds powerful strategic guardrails.”
A specialized memory workflow, in a continuous closed loop, retrieves, extracts and consolidates strategic operational lessons. Before an agent takes action, it checks ReasoningBank for any relevant lessons for its context.
And unlike most other approaches for creating autodidactic agents, ReasoningBank gleans insights not only from successful runs but also from failed ones.
Learn From Success, but Also From Failure
To tackle a problem, an agent is typically given a computational budget to work through the possible paths to a solution, an approach called test-time scaling, or TTS. This approach records only the final answer; it does not save the steps taken to reach it.
To capture performance data, the researchers created a new approach, memory-aware test-time scaling (MaTTS). With MaTTS, an agent is given the freedom to explore multiple trajectories for a task in parallel, or to extend a single reasoning process sequentially.
As the agent goes about its work, it evaluates each trajectory through the LLM-as-a-Judge framework. The judge provides insights on both successful and failed trajectories, which are entered into the ReasoningBank for future reference.
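To make that loop concrete, here is a minimal sketch of several rollouts being judged and turned into lessons. Everything in it is an assumption for illustration: `run_agent`, `judge` and `distill` are hypothetical stand-ins (no LLM is called), the rollouts run one after another rather than in parallel, and none of the names come from the researchers' code.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    steps: list[str]  # actions the agent took, in order
    answer: str       # empty string means the agent never produced an answer


def run_agent(task: str, attempt: int) -> Trajectory:
    """Hypothetical stand-in for one agent rollout (no LLM is called)."""
    if attempt % 2 == 0:
        # Simulated failure: the agent keeps clicking without making progress.
        return Trajectory(["click load more"] * 3, answer="")
    return Trajectory(
        ["check page identifier", "click load more", "read results"],
        answer="done",
    )


def judge(trajectory: Trajectory) -> bool:
    """Hypothetical stand-in for LLM-as-a-Judge: success = non-empty answer."""
    return bool(trajectory.answer)


def distill(trajectory: Trajectory, success: bool) -> dict:
    """Hypothetical stand-in for the LLM step that writes a lesson."""
    if success:
        return {
            "title": "Strategy that worked",
            "description": "Steps taken in a successful run.",
            "content": " -> ".join(trajectory.steps),
        }
    return {
        "title": "Pitfall to avoid",
        "description": "Where a failed run stopped making progress.",
        "content": "Repeated without progress: " + trajectory.steps[-1],
    }


def scale_with_memory(task: str, bank: list[dict], k: int = 2) -> list[dict]:
    """Run k rollouts, judge each one, and store a lesson either way."""
    for attempt in range(k):
        trajectory = run_agent(task, attempt)
        bank.append(distill(trajectory, judge(trajectory)))
    return bank


bank = scale_with_memory("find the oldest review", [], k=2)
for item in bank:
    print(item["title"], "|", item["content"])
```

The point of the sketch is that the failed rollout is not discarded: it yields a memory item just as the successful one does.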
For ReasoningBank, failure provides the guardrails. For the example at the top of this post, the agent learns not to keep clicking the “load more” button if there are no new results, i.e. “always verify the current page identifier first to avoid infinite scroll traps before attempting to load more results,” the researchers write.
The ReasoningBank writes the output from the LLM-as-a-Judge into JSON-structured metadata that is easily ingestible by the agent. The “memory” is structured as follows (in the authors’ words):
- Title: A concise identifier summarizing the core strategy.
- Description: A brief summary of the memory item.
- Content: The distilled reasoning steps, decision rationales, or operational insights extracted from past experiences.
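As a rough illustration of that structure, one memory item could be modeled and serialized to JSON as shown here. The lowercase key names, the `MemoryItem` class and the example title and description are assumptions made for this sketch; only the three fields and the quoted lesson come from the researchers' description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MemoryItem:
    title: str        # concise identifier summarizing the core strategy
    description: str  # brief summary of the memory item
    content: str      # distilled reasoning steps or operational insights


def to_json(item: MemoryItem) -> str:
    """Serialize one memory item to a JSON object string."""
    return json.dumps(asdict(item), ensure_ascii=False)


def from_json(raw: str) -> MemoryItem:
    """Rebuild a memory item from its JSON form."""
    data = json.loads(raw)
    return MemoryItem(
        title=data["title"],
        description=data["description"],
        content=data["content"],
    )


# Illustrative item; the title and description are invented for this example.
example = MemoryItem(
    title="Avoid infinite scroll traps",
    description="Check that 'load more' actually yields new results.",
    content=(
        "Always verify the current page identifier first to avoid infinite "
        "scroll traps before attempting to load more results."
    ),
)

print(to_json(example))
```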
When an agent undertakes a new task, it scans the ReasoningBank for the most relevant entries, via a vector search, and incorporates them into its own context.
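The retrieval step can be sketched in a few lines. This version is deliberately simplified: `embed` is a toy bag-of-words stand-in for a real embedding model, the second memory item is invented for contrast, and the function names are hypothetical rather than taken from the researchers' code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(task: str, bank: list[dict], k: int = 1) -> list[dict]:
    """Return the k memory items most similar to the task description."""
    query = embed(task)
    ranked = sorted(
        bank,
        key=lambda item: cosine(query, embed(item["title"] + " " + item["content"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task: str, bank: list[dict], k: int = 1) -> str:
    """Prepend the retrieved lessons to the task before the agent acts."""
    lessons = [f"- {m['title']}: {m['content']}" for m in retrieve(task, bank, k)]
    return "Relevant lessons:\n" + "\n".join(lessons) + f"\n\nTask: {task}"


# Illustrative memory bank; the second item is invented for contrast.
bank = [
    {
        "title": "Avoid infinite scroll traps",
        "content": "Verify the current page identifier before attempting to load more results.",
    },
    {
        "title": "Run tests before submitting a patch",
        "content": "Execute the failing unit test again after each code change.",
    },
]

task = "Find the oldest review by clicking load more results on the product page"
print(build_prompt(task, bank))
```

A production system would swap `embed` for a learned embedding model and a vector index, but the shape of the step stays the same: score stored lessons against the new task, keep the top few, and place them in the agent's context.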
Smarter Than the Average Bot
ReasoningBank agents outperformed their memory-free counterparts on two complex learning environment benchmarks: by 8.3% on WebArena and by 4.6% on SWE-Bench-Verified.
ReasoningBank also appeared to reduce the number of steps an agent takes to answer queries. On SWE-Bench-Verified, ReasoningBank took 2.8 fewer execution steps per task than memory-free baselines.
Of course, deploying ReasoningBank would incur additional overhead in memory extraction, context injection and running the LLM-as-a-Judge, though the researchers argue that successful results will lead to faster processing and fewer execution steps.
The researchers presented their work at the International Conference on Learning Representations last month. They also posted demonstration code on GitHub.

