ICML 2026

AdaMEM: Test-Time Adaptive Memory for Language Agents

Yunxiang Zhang¹, Yiheng Li¹, Ali Payani², Lu Wang¹
¹University of Michigan    ²Cisco Research
Correspondence: yunxiang@umich.edu
AdaMEM overview diagram

Overview of AdaMEM. Instead of relying on a static, episode-level strategy, the agent adapts to the current decision step by querying a long-term trajectory memory of raw experiences (1–2) and synthesizing them into a dynamic short-term strategy memory tailored to the current state (3). Conditioned on this test-time strategy, the agent adapts its next action (4) without requiring parameter updates.

Abstract

A central challenge for language agents is utilizing past experience to adapt to dynamic test-time conditions. While recent work demonstrates the promise of agentic memory mechanisms, most systems restrict retrieval to episode initiation. Consequently, agents are forced to rely on static guidance that becomes increasingly misaligned as long-horizon tasks unfold. To address this rigidity, we propose the Adaptive Memory Agent (AdaMEM), a novel framework for agent test-time adaptation. Without updating model parameters online, AdaMEM adapts agent behavior via a hybrid memory architecture: it maintains a long-term trajectory memory of raw experiences collected offline while generating a dynamic short-term strategy memory on the fly to guide decision-making. This mechanism enables a trade-off between token efficiency and adaptability across varying inference-time compute levels. Empirically, AdaMEM significantly outperforms static memory baselines, achieving relative gains of up to 13% on ALFWorld and 11% on WebShop, with consistent leading performance extending to agentic search on HotpotQA. To further enhance this adaptation, we develop STEP-MFT, a Step-wise Memory Fine-Tuning technique that trains the policy to synthesize high-quality strategies from retrieved experiences, yielding additional performance gains. Our work establishes a new scaling dimension for agentic memory, supporting continuous reasoning and self-evolution post-deployment in real-world environments.

Method

AdaMEM: Test-Time Adaptive Memory (Training-Free)

Prior work such as Synapse and ReasoningBank restricts memory retrieval to episode initiation, forcing agents to follow a fixed plan throughout the task. AdaMEM overcomes this by maintaining a scalable long-term trajectory memory of successful past experiences and dynamically synthesizing a concise short-term strategy memory at each decision step. Two inference modes trade off cost and adaptability: AdaMEM-HIGH generates a fresh transient strategy whenever the agent requests memory, while AdaMEM-LOW persists a strategy and lets the agent decide when to refresh it.
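The retrieve-then-synthesize step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Trajectory` schema, the `retrieve` and `build_synthesis_prompt` helpers, the bag-of-words similarity, and the prompt wording are all assumptions made for the example, and a real system would likely use a learned retriever and pass the prompt to an LLM.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class Trajectory:
    """One raw past experience in long-term memory (hypothetical schema)."""
    task: str
    actions: list[str]


def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector; a stand-in for a real embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(memory: list[Trajectory], state: str, k: int = 2) -> list[Trajectory]:
    """Return the k trajectories whose task text best matches the current state."""
    query = bow(state)
    ranked = sorted(memory, key=lambda t: cosine(query, bow(t.task)), reverse=True)
    return ranked[:k]


def build_synthesis_prompt(state: str, retrieved: list[Trajectory]) -> str:
    """Assemble the text an LLM would condense into a short-term strategy."""
    examples = "\n".join(
        f"Past task: {t.task}\nActions: {' -> '.join(t.actions)}" for t in retrieved
    )
    return (
        f"{examples}\n\nCurrent state: {state}\n"
        "Write a short strategy for the next step."
    )


# Toy long-term memory; the entries are illustrative, not taken from the paper.
memory = [
    Trajectory("put a soapbar in garbagecan",
               ["go to countertop 1", "take soapbar 1 from countertop 1"]),
    Trajectory("heat an egg with microwave", ["go to fridge 1", "open fridge 1"]),
    Trajectory("clean a mug and put it in cabinet", ["go to sinkbasin 1"]),
]
state = "put two soapbar in garbagecan"
top = retrieve(memory, state, k=2)
prompt = build_synthesis_prompt(state, top)
print(prompt)
```

Because retrieval is keyed on the current state rather than the initial task description, calling `retrieve` again later in the episode can surface different experiences, which is the property the paragraph above contrasts with episode-start retrieval.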

Comparison of memory mechanisms

Comparison of test-time agent memory mechanisms. ReAct operates without external memory. Synapse and ReasoningBank employ static initialization, retrieving a trajectory or strategy only at the episode start (S_0). In contrast, AdaMEM enables test-time adaptation via dynamic memory retrieval and synthesis. AdaMEM-LOW balances efficiency by maintaining a persistent strategy and triggering a refresh only when necessary. AdaMEM-HIGH maximizes adaptability by regenerating a fresh strategy at every critical decision step.
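The difference between static initialization and the two AdaMEM modes comes down to when the strategy is rebuilt. The sketch below makes that control flow explicit under simplifying assumptions: `strategies_per_step`, `synthesize`, and `needs_refresh` are hypothetical names, the synthesizer and refresh check are stubs standing in for LLM calls, and the "high" branch treats every step as a critical decision step.

```python
from typing import Callable, Optional


def strategies_per_step(
    states: list[str],
    mode: str,
    synthesize: Callable[[str], str],
    needs_refresh: Callable[[str, str], bool],
) -> list[Optional[str]]:
    """Return the strategy in effect at each step of one episode."""
    strategy: Optional[str] = None
    in_effect = []
    for step, state in enumerate(states):
        if mode == "static":
            # Built once at the episode start and never revisited.
            if step == 0:
                strategy = synthesize(state)
        elif mode == "low":
            # Persistent strategy, rebuilt only when a refresh is requested.
            if strategy is None or needs_refresh(state, strategy):
                strategy = synthesize(state)
        elif mode == "high":
            # Transient strategy, rebuilt at every step in this simplified sketch.
            strategy = synthesize(state)
        else:
            raise ValueError(f"unknown mode: {mode}")
        in_effect.append(strategy)
    return in_effect


def demo(mode: str) -> int:
    """Count synthesis calls for a four-step toy episode (stubbed LLM)."""
    calls = []

    def synthesize(state: str) -> str:
        calls.append(state)
        return f"strategy@{state}"

    def needs_refresh(state: str, strategy: str) -> bool:
        # Stub: refresh when the observation signals the plan has failed.
        return "nothing found" in state

    states = ["start", "cabinet 1 open", "nothing found", "countertop 1"]
    strategies_per_step(states, mode, synthesize, needs_refresh)
    return len(calls)


for mode in ("static", "low", "high"):
    print(mode, demo(mode))
# static 1
# low 2
# high 4
```

The call counts show the cost side of the trade-off: each synthesis call spends additional inference tokens, so the "low" policy sits between the static and "high" extremes.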

STEP-MFT: Step-wise Memory Fine-Tuning

Although AdaMEM works well zero-shot, simple prompting can yield over-general strategies. STEP-MFT trains the policy to generate high-utility strategies using a process-level signal: a strategy is kept for training only if it changed the agent's next action on a successful trajectory. This dual filter (outcome success + action change) efficiently assigns credit at the step level without requiring rollouts or a specialized critic.

STEP-MFT framework

STEP-MFT framework. We employ dual-filter rejection sampling to curate high-utility strategies for supervised fine-tuning. The process retains only successful trajectories where the strategy actually changes the proposed action (A_t ≠ A′_t, green), while discarding redundant instances where the memory-free baseline yields the same action (A_t = A′_t, blue).
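The dual filter can be illustrated with a short sketch. It assumes each candidate training example records the action proposed with the strategy, the action proposed without it, and the episode outcome; the `StepSample` fields and the `dual_filter` name are hypothetical, and comparing actions by exact string match is a simplification.

```python
from dataclasses import dataclass


@dataclass
class StepSample:
    """One (state, strategy) pair considered for fine-tuning (hypothetical schema)."""
    state: str
    strategy: str
    action_with_strategy: str      # action proposed when conditioned on the strategy
    action_without_strategy: str   # action proposed by the memory-free baseline
    episode_success: bool          # outcome of the trajectory this step belongs to


def dual_filter(samples: list[StepSample]) -> list[StepSample]:
    """Keep samples that pass both the outcome filter and the action-change filter."""
    return [
        s for s in samples
        if s.episode_success
        and s.action_with_strategy != s.action_without_strategy
    ]


# Illustrative samples, not data from the paper.
samples = [
    StepSample("in kitchen", "check countertop first",
               "go to countertop 1", "go to cabinet 1", True),   # kept
    StepSample("in kitchen", "open each cabinet",
               "go to cabinet 1", "go to cabinet 1", True),      # dropped: same action
    StepSample("in kitchen", "check the fridge",
               "go to fridge 1", "go to cabinet 1", False),      # dropped: failed episode
]
kept = dual_filter(samples)
print([s.strategy for s in kept])  # ['check countertop first']
```

Both checks are simple comparisons on already-collected data, which is why this style of filtering needs no extra rollouts or learned critic.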

Results

Performance comparison under training-free setups (mean ± std over 3 runs). Bold = highest, underline = second highest.

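On-Policy Long-Term Memory (Qwen3-4B-Instruct / Qwen2.5-7B-Instruct-RL)

| Memory Mechanism | Long-Term | Short-Term | ALFWorld (Seen) | ALFWorld (Unseen) | WebShop |
| --- | --- | --- | --- | --- | --- |
| No Memory | - | - | 45.2 ± 1.8 | 46.8 ± 2.5 | 71.4 ± 1.4 |
| ReasoningBank | Strat. | Strat. | 49.3 ± 0.7 | 51.2 ± 0.9 | 68.6 ± 2.0 |
| Synapse | Traj. | Traj. | 52.1 ± 1.9 | 52.2 ± 0.7 | 65.4 ± 0.9 |
| AdaMEM (ours) | Traj. | Strat. | 54.0 ± 2.9 | 58.2 ± 3.9 | 74.2 ± 0.3 |

Off-Policy Long-Term Memory (Gemma-3-27b-it)

| Memory Mechanism | Long-Term | Short-Term | ALFWorld (Seen) | ALFWorld (Unseen) | WebShop |
| --- | --- | --- | --- | --- | --- |
| No Memory | - | - | 36.7 ± 0.8 | 37.6 ± 1.9 | 18.2 ± 0.1 |
| ReasoningBank | Strat. | Strat. | 45.2 ± 2.2 | 44.8 ± 1.3 | 18.6 ± 0.5 |
| Synapse | Traj. | Traj. | 35.0 ± 2.1 | 30.6 ± 0.7 | 22.6 ± 0.5 |
| AdaMEM (ours) | Traj. | Strat. | 47.4 ± 2.2 | 49.5 ± 2.6 | 24.7 ± 1.2 |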
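As a worked example of reading these numbers, the sketch below computes AdaMEM's relative gain over the stronger of the two static-memory baselines (ReasoningBank and Synapse) in each column. It uses only the mean values listed above and ignores the reported standard deviations; the `RESULTS` layout and the `relative_gains` helper are illustrative, and choosing the best static baseline as the reference point is an assumption of this example.

```python
# Mean scores copied from the results above, ordered as COLUMNS.
COLUMNS = ("ALFWorld seen", "ALFWorld unseen", "WebShop")
RESULTS = {
    "on-policy": {
        "ReasoningBank": (49.3, 51.2, 68.6),
        "Synapse": (52.1, 52.2, 65.4),
        "AdaMEM": (54.0, 58.2, 74.2),
    },
    "off-policy": {
        "ReasoningBank": (45.2, 44.8, 18.6),
        "Synapse": (35.0, 30.6, 22.6),
        "AdaMEM": (47.4, 49.5, 24.7),
    },
}


def relative_gains(block: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Percent gain of AdaMEM over the best static-memory baseline, per column."""
    gains = {}
    for i, column in enumerate(COLUMNS):
        best_static = max(block["ReasoningBank"][i], block["Synapse"][i])
        gains[column] = 100.0 * (block["AdaMEM"][i] - best_static) / best_static
    return gains


for setting, block in RESULTS.items():
    for column, gain in relative_gains(block).items():
        print(f"{setting:10s} {column:16s} +{gain:.1f}%")
```

For instance, in the on-policy unseen ALFWorld column the reference is Synapse at 52.2, so the gain is (58.2 - 52.2) / 52.2, or about 11.5%.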
Results bar chart

Impact of Memory Fine-Tuning. STEP-MFT (step-level filter) consistently outperforms outcome-based MFT and training-free baselines on both ALFWorld and WebShop.

Pareto frontier

Effectiveness vs. efficiency trade-off. AdaMEM establishes a superior Pareto frontier between task performance and average inference tokens per step on both ALFWorld and WebShop.
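For readers unfamiliar with the term, a Pareto frontier here is the set of methods for which no other method is both cheaper in tokens and at least as accurate. The sketch below shows one way to compute it; the `pareto_frontier` helper is hypothetical, and the points are made-up placeholders with generic names rather than measurements from the paper.

```python
def pareto_frontier(points: dict[str, tuple[float, float]]) -> list[str]:
    """Return names of non-dominated points, sorted by cost.

    `points` maps a name to (tokens_per_step, score); lower cost and
    higher score are better.
    """
    frontier = []
    for name, (cost, score) in points.items():
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for other, (c, s) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: points[n][0])


# Placeholder values for illustration only.
points = {
    "method_a": (100.0, 40.0),
    "method_b": (200.0, 55.0),
    "method_c": (250.0, 50.0),  # costs more than method_b and scores lower
    "method_d": (400.0, 60.0),
}
print(pareto_frontier(points))  # ['method_a', 'method_b', 'method_d']
```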

Scalability with k

Scalability with retrieval budget k. AdaMEM scales monotonically with more retrieved experiences, while Synapse degrades as injecting additional raw trajectories leads to context overflow.

Case Study

AdaMEM recovers from stale strategies mid-episode. When an initial strategy directs the agent to the wrong location, AdaMEM detects the mismatch, explicitly reasons that the "strategy is outdated," and triggers a refresh — recovering where static agents loop or hallucinate.

Task: put two soapbar in garbagecan

Adaptive Memory:
- Use inventory to confirm if any soapbar is already being carried.
- Systematically visit each cabinet (1 to 4), opening each if closed.
- Once a soapbar is found, take it and move it to garbagecan 1.
Action:
inventory
go to cabinet 1
open cabinet 1
go to cabinet 2
open cabinet 2
go to cabinet 3
open cabinet 3
go to cabinet 4
open cabinet 4
Agent Thought: So far, no soapbar has been found. Therefore, the existing strategy is outdated because it assumes all soapbars are in cabinets, which is not confirmed. Therefore, a refresh is necessary.
<refresh_decision>yes</refresh_decision>

Adaptive Memory:
- Go to countertop 1 to check for soapbar(s), as they are commonly found there.
- If not found, go to sinkbasin 1 or sinkbasin 2.
- Only after locating and taking a soapbar proceed to move it to the garbagecan; repeat for a second soapbar.
Action:
go to countertop 1
take soapbar 1 from countertop 1
go to sinkbasin 1
go to countertop 1
take soapbar 2 from countertop 1
Agent Thought: Refresh necessary? Yes, because: - The current strategy assumes that "take" is available, but it is not. - The environment does not allow taking soapbar 2 because the action is not in the admissible list.
<refresh_decision>yes</refresh_decision>

Adaptive Memory:
- Go to garbagecan 1 and move soapbar 1 to it to begin fulfilling the goal.
- After placing one soapbar, return to countertop to retrieve the second soapbar.
- Confirm that both soapbars are in the garbagecan before concluding the task.
Action:
go to garbagecan 1
move soapbar 1 to garbagecan 1
go to countertop 1
take soapbar 2 from countertop 1
move soapbar 2 to garbagecan 1
✓ Success
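
The trace above shows the agent emitting its refresh choice inside a `<refresh_decision>` tag. A controller would need to read that tag back out of the model output; the sketch below is one possible way to do so. The `parse_refresh_decision` helper is hypothetical, only the `yes` value appears in the trace (a `no` value is assumed here), and falling back to "no refresh" when the tag is missing is a design choice of this example.

```python
import re

# Matches the tag format seen in the trace; "no" is an assumed counterpart to "yes".
_REFRESH_TAG = re.compile(
    r"<refresh_decision>\s*(yes|no)\s*</refresh_decision>", re.IGNORECASE
)


def parse_refresh_decision(model_output: str, default: bool = False) -> bool:
    """Return True if the output asks for a strategy refresh.

    Falls back to `default` when no well-formed tag is present.
    """
    match = _REFRESH_TAG.search(model_output)
    if match is None:
        return default
    return match.group(1).lower() == "yes"


output = (
    "Agent Thought: So far, no soapbar has been found. "
    "Therefore, a refresh is necessary.\n"
    "<refresh_decision>yes</refresh_decision>"
)
print(parse_refresh_decision(output))  # True
```

Defaulting to `False` keeps the persistent strategy in place when the output is malformed, which avoids spending tokens on a refresh the agent did not clearly request.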

BibTeX

@inproceedings{zhang2026adamem,
  title     = {{AdaMEM}: Test-Time Adaptive Memory for Language Agents},
  author    = {Zhang, Yunxiang and Li, Yiheng and Payani, Ali and Wang, Lu},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  year      = {2026},
}