G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

BibTeX
@misc{yan2025gmemorytracinghierarchical,
      title={G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems},
      author={Shuicheng Yan and Kun Wang and Guibin Zhang and Guancheng Wan and Miao Yu and Muxin Fu},
      year={2025},
      eprint={2506.07398},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2506.07398},
}
GitHub: bingreeky/GMemory
HTTPS: https://github.com/bingreeky/GMemory
SSH: git@github.com:bingreeky/GMemory.git
CLI: gh repo clone bingreeky/GMemory
AI Audio Lecture + Q&A
Transcript
John: Alright, welcome to Advanced Topics in Multi-Agent Systems. Today's lecture is on G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems. We've seen a surge of work on agent memory recently, with papers like ReasoningBank focusing on distilling reasoning strategies and MemInsight on autonomous augmentation. This work, coming from researchers at institutions like NUS and NTU, argues that for multi-agent systems, or MAS, simply adapting single-agent memory isn't enough. It tackles the challenge of creating a collective, evolving memory. Yes, Noah?

Noah: Excuse me, Professor. You mentioned this work argues against adapting single-agent memory. Is the core problem that current MAS frameworks just can't retain nuanced information from agent-to-agent interactions across different tasks?

John: Precisely. The authors' central motivation is that existing MAS memory is overly simplistic. It often stores only the final outcome or highly condensed artifacts, losing the rich collaborative process. This prevents the system from truly learning from past experience, essentially forcing it to solve similar problems from scratch each time. G-Memory proposes a solution inspired by organizational memory theory to enable genuine self-evolution in these systems.

Noah: Organizational memory theory? So it's modeling the MAS like a human team that learns over time?

John: That's a good analogy. The core idea is a hierarchical, graph-based memory with three tiers. At the bottom is the Interaction Graph, a fine-grained log of every utterance between agents for a given task; it's the raw transcript of their collaboration. Above that is the Query Graph, which catalogues past tasks, their outcomes, and links them semantically. And at the very top is the Insight Graph, which abstracts generalizable principles and strategies from the lower-level interaction data.

Noah: Wait, how are those abstract insights generated and stored? Is there a risk of over-generalizing and losing the specific context that made a strategy successful?

John: That's a critical question, and it speaks to the system's operational workflow. When a new query arrives, the system doesn't just retrieve one thing. It performs a bi-directional traversal: it goes up from the relevant past queries to the Insight Graph to get high-level strategic advice, and simultaneously goes down to the Interaction Graphs of the most relevant past tasks. To avoid information overload, it uses an LLM-powered graph sparsifier to extract a concise "core subgraph" of the collaboration.

Noah: So it's getting both the forest and the trees: the high-level insight and the specific, condensed dialogue that led to a solution. How is this memory then provided to the agents? Is it just stuffed into the context prompt?

John: Not indiscriminately. This is another key part of the design. The system provides specialized memory support for each agent. An operator, which they call Phi, evaluates the retrieved insights and interaction patterns against an agent's specific role. A Planner agent might get a high-level strategic insight, while an Executor agent might get a snippet of a past interaction showing a successful tool-use sequence. It's role-specific memory augmentation.
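To make the three tiers, the bi-directional retrieval, and the Phi-style routing concrete, here is a minimal Python sketch. It is an illustration of the idea, not the paper's implementation: the class names (HierarchicalMemory, QueryNode, InsightNode) and the injected helpers (similarity, sparsify, route_for_role) are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class InsightNode:
    text: str  # a generalizable strategy distilled from past collaborations

@dataclass
class QueryNode:
    query: str
    outcome: str                             # task result plus feedback
    interaction_log: list[tuple[str, str]]   # fine-grained (agent, utterance) trace
    insights: list[InsightNode] = field(default_factory=list)

class HierarchicalMemory:
    """Illustrative three-tier memory: insight / query / interaction levels."""

    def __init__(self, similarity, sparsify, route_for_role):
        self.queries: list[QueryNode] = []
        self.similarity = similarity          # (q1, q2) -> float, e.g. embedding cosine (assumed)
        self.sparsify = sparsify              # stand-in for the LLM-powered graph sparsifier
        self.route_for_role = route_for_role  # stand-in for the Phi role-routing operator

    def retrieve(self, query: str, role: str, k: int = 3):
        # Query level: find the k semantically closest past tasks.
        hits = sorted(self.queries,
                      key=lambda n: self.similarity(query, n.query),
                      reverse=True)[:k]
        # Traverse up: gather high-level insights linked to those tasks.
        insights = [i.text for n in hits for i in n.insights]
        # Traverse down: condense each interaction trace to a core subgraph.
        trails = [self.sparsify(n.interaction_log) for n in hits]
        # Phi: keep only what this agent's role can act on.
        return self.route_for_role(role, insights, trails)
```

Under this sketch, a Planner and an Executor calling retrieve on the same query but with different role values would receive different slices of the same memory, which is the role-specific augmentation John describes.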
Noah: And this whole graph structure is updated after every task?

John: Correct. After the task concludes, feedback is used to update the entire hierarchy. A new query node is added and linked to its interaction graph. And importantly, a process is run to check whether new generalizable insights can be distilled from the recent interaction and added to the Insight Graph. This continuous update cycle is what enables the system to learn and evolve its collective knowledge base.

Noah: Quick question on the evaluation. The paper claims consistent improvement, but many memory systems do. Did they find anything that sets G-Memory apart in the results?

John: They did. A very interesting finding was that several other memory baselines, including some designed for single agents like Voyager, actually degraded the performance of multi-agent frameworks on certain tasks. This suggests that simply adding memory isn't a guaranteed benefit; the memory must be structured for the specific needs of multi-agent collaboration. G-Memory, in contrast, showed significant gains across frameworks like AutoGen and MacNet, and did so with comparable or even lower token usage than some alternatives.

Noah: So the hierarchical structure that separates high-level insights from low-level interactions is what keeps the context from getting noisy and counterproductive. The ablation study must have confirmed the importance of both levels, then.

John: It did. Removing either the Insight Graph or the detailed Interaction Graph caused a performance drop, validating the multi-granularity approach. The work's main implication is that it provides a blueprint for moving beyond MAS with static, predefined workflows. It enables what the authors call the "institutionalization of group knowledge," allowing the agent team to become an experiential learner, which has significant potential for long-horizon robotics or complex scientific-discovery tasks.

John: So, to wrap up, G-Memory's contribution is a principled, hierarchical memory architecture designed specifically for the collaborative dynamics of multi-agent systems, enabling them to learn from the nuances of their interaction history. The authors are also careful to note the need for responsible deployment, since a memory system could amplify biases from the underlying LLM. The key takeaway is that effective multi-agent learning requires more than just memory; it requires structured, role-aware, and evolving collective memory.

John: Thanks for listening. If you have any further questions, ask our AI assistant or drop a comment.
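To round out the earlier sketch, the post-task update John describes mid-lecture could look like the following, reusing the classes from the sketch above. The distill_insights helper is hypothetical, standing in for the LLM-based insight extraction step.

```python
def update(memory: HierarchicalMemory, query: str, outcome: str,
           log: list[tuple[str, str]], distill_insights) -> None:
    """Illustrative post-task update: add a query node, link its
    interaction trace, and distill any new generalizable insights."""
    node = QueryNode(query=query, outcome=outcome, interaction_log=log)
    memory.queries.append(node)
    # One LLM pass over the fresh trace (assumed helper); returns zero
    # or more strategy strings to promote into the insight level.
    for text in distill_insights(log, outcome):
        node.insights.append(InsightNode(text=text))
```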