Before jumping directly into the agentic semantic layer for the AI-native financial data foundation, I feel I need a bridge.
I want to share my reading notes on several key papers that helped me understand this direction and shape the high-level design of the agentic semantic layer.
In this post, I will focus on four papers. Three of them are survey-style papers, and one is the Microsoft GraphRAG paper. Together, they helped me understand the broader landscape of augmented LLMs, tool use, and GraphRAG. These ideas are pretty much at the core of what I mean by an agentic semantic layer.
Paper 1: Augmented Language Models: a Survey
Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., Grave, E., LeCun, Y., & Scialom, T. (2023). Augmented Language Models: a Survey. arXiv:2302.07842.
This paper starts by highlighting the core limitations of traditional LLMs:
- limited internal knowledge
- outdated information
- hallucination
- weak exact calculation
- limited context window
- weak long-horizon reasoning
- no direct access to external systems
The main argument of this paper is that LLM systems should not depend only on the model’s parameters. Instead, the LLM should be augmented with external capabilities.
Augmented Language Model = LLM + reasoning + retrieval + tools + action
Key idea 1: Reasoning is internal augmentation
Reasoning means the model decomposes a complex task into smaller steps. This includes methods such as Chain-of-Thought, recursive prompting, and task decomposition.
The important point is not just that the model “thinks step by step”. The deeper idea is that many real problems cannot be solved in one jump. They need intermediate reasoning.
However, reasoning alone is not enough. The model may still reason from wrong assumptions or hallucinated facts. So reasoning needs to be grounded by external evidence and tools.
Key idea 2: Tool use is external augmentation
Some tasks should not be solved by language generation alone. For example, exact calculation should be done by a calculator or code interpreter. Current factual lookup should be done through retrieval or search. Database questions should be answered through database queries.
The model’s role becomes:
understand the task → decide what is needed → call the right tool → interpret the result → answer
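As a minimal sketch of this division of labour, the snippet below hands an arithmetic request to a deterministic calculator instead of letting the model guess the number. The names `safe_calculate`, `TOOLS`, and `run_tool` are hypothetical and chosen only for illustration; the paper does not prescribe an implementation.

```python
import ast
import operator

# Hypothetical tool: evaluate simple arithmetic exactly, without eval().
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals only."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

# Hypothetical registry: the model picks a tool name, the system runs it.
TOOLS = {"calculator": safe_calculate}

def run_tool(name: str, argument: str):
    return TOOLS[name](argument)

print(run_tool("calculator", "(1250 - 250) / 4"))
```

The model only has to produce the tool name and the expression; the arithmetic itself never passes through language generation.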
Key idea 3: Retrieval adds external memory
Retrieval allows the model to access external knowledge at inference time. This is the foundation of RAG.
Instead of expecting the model to remember everything, the system can retrieve relevant documents, passages, or data and pass them into the model context.
This helps reduce hallucination and improves factual grounding, but it also introduces new questions:
- What should be retrieved?
- How do we know the retrieved information is relevant?
- How do we know it is authoritative?
- How should retrieved evidence be presented to the model?
Key idea 4: Acting changes the role of the LLM
The paper also discusses action. Once an LLM can act, it is no longer only answering questions. It can interact with an environment.
A simple agent loop looks like this:
observe → reason → act → observe result → continue
This idea later becomes central to agent systems.
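The loop can be sketched in a few lines. Everything here is an assumption made for illustration: `run_agent` is a hypothetical helper, the "environment" is just an integer, and the `policy` function stands in for the LLM's reasoning step.

```python
from typing import Callable

def run_agent(
    policy: Callable[[int], str],
    step: Callable[[int, str], int],
    is_done: Callable[[int], bool],
    state: int,
    max_steps: int = 10,
) -> tuple[int, list[str]]:
    """Generic observe -> reason -> act loop with a step budget."""
    trace = []
    for _ in range(max_steps):
        if is_done(state):            # observe: check the current state
            break
        action = policy(state)        # reason: choose the next action
        state = step(state, action)   # act: apply it to the environment
        trace.append(action)
    return state, trace

# Toy environment (an assumption for illustration): reach 10 starting from 7.
TARGET = 10
final_state, actions = run_agent(
    policy=lambda s: "increment" if s < TARGET else "decrement",
    step=lambda s, a: s + 1 if a == "increment" else s - 1,
    is_done=lambda s: s == TARGET,
    state=7,
)
print(final_state, actions)
```

The `max_steps` budget matters in practice: an agent that can act also needs a guaranteed way to stop.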
Paper 2: Tool Learning with Large Language Models: A Survey
Qu, C., Dai, S., Wei, X., Cai, H., Wang, S., Yin, D., Xu, J., & Wen, J.-R. (2024). Tool Learning with Large Language Models: A Survey. arXiv:2405.17935.
The first paper says LLMs need tools. This second paper asks a more detailed question:
How does an LLM actually learn to use tools?
This is important because tool use is often oversimplified. Many people think tool use means giving an LLM access to APIs. But the paper shows that tool use is much more complex.
A tool-using LLM needs to deal with:
- tool representation
- tool selection
- tool invocation
- tool output interpretation
- tool planning
- tool learning
Key idea 1: Tool representation
The model must know what tools exist and what each tool does. Tools can be represented through:
- natural language descriptions
- structured schemas
- function signatures
- API documentation
- examples
Structured tool descriptions are usually more reliable than vague natural language descriptions because they reduce ambiguity.
The model needs to understand not only the tool name, but also:
- what the tool is for
- what input it expects
- what output it returns
- when it should be used
- what limitations it has
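To make the idea of a structured tool description concrete, here is a sketch of one tool written in a JSON-Schema-like layout. The tool `get_closing_price` and all of its fields are hypothetical; the layout is one common convention, not a format the paper mandates.

```python
import json

# Hypothetical tool description in a JSON-Schema-like layout.
GET_PRICE_TOOL = {
    "name": "get_closing_price",
    "description": "Return the closing price of one instrument on one date.",
    "when_to_use": "The user asks for a historical closing price.",
    "limitations": "Daily data only; no intraday prices.",
    "parameters": {
        "type": "object",
        "properties": {
            "ticker": {"type": "string", "description": "Instrument ticker"},
            "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"},
        },
        "required": ["ticker", "date"],
    },
    "returns": {"type": "number", "description": "Closing price"},
}

def describe_tool(tool: dict) -> str:
    """Render the schema as text that could be placed in a prompt."""
    return json.dumps(tool, indent=2, sort_keys=True)

print(describe_tool(GET_PRICE_TOOL))
```

Each of the five points above maps to a field: purpose, inputs, outputs, when to use, and limitations.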
Key idea 2: Tool selection
Tool selection means choosing the right tool for the task.
For example:
Question: What is 234 × 872?
Bad approach: answer from memory
Better approach: call calculator
This sounds simple, but it becomes difficult when the model has many tools. With a few tools, the model can compare every tool description directly in its prompt. With hundreds or thousands of tools, tool selection becomes a discovery and ranking problem.
The model must decide:
- Do I need a tool?
- Which tool is best?
- Is one tool enough?
- Do I need multiple tools?
- Should I retrieve first or calculate first?
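When selection is treated as a ranking problem, the simplest possible version looks like the sketch below. It scores tools by word overlap between the question and each tool description, which is only a stand-in for the embedding-based or LLM-based retrievers a real system would use. The catalogue and the function names are hypothetical.

```python
import re

# Hypothetical tool catalogue: name -> short description.
CATALOGUE = {
    "calculator": "evaluate exact arithmetic such as multiply add divide numbers",
    "sql_query": "run a query against database tables and return rows",
    "web_search": "search the web for current news and recent facts",
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def rank_tools(question: str, catalogue: dict[str, str]) -> list[tuple[str, int]]:
    """Score each tool by how many question words appear in its description."""
    q = tokens(question)
    scores = [(name, len(q & tokens(desc))) for name, desc in catalogue.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

ranking = rank_tools("Which rows in the database tables mention liquidity?", CATALOGUE)
print(ranking[0][0])
```

A score of zero for every tool is also a useful signal: it suggests that no tool is needed, or that the catalogue is missing one.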
Key idea 3: Tool invocation
Selecting the right tool is not enough. The model must call the tool correctly. This means producing the right:
- function name
- parameters
- data types
- formats
- constraints
Many tool-use failures happen at this stage.
For example, the model may choose the correct database tool but generate invalid SQL. Or it may call the right API but pass the wrong parameter format.
So reliable tool use requires schema validation, argument checking, retry logic, and sometimes human approval.
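A minimal sketch of argument checking with retry is shown below. The schema format, `validate_args`, and `call_with_retry` are assumptions for illustration, and the `propose_args` callback stands in for the model, which receives the validation errors as feedback on each retry.

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is valid."""
    problems = []
    for name in schema["required"]:
        if name not in args:
            problems.append(f"missing parameter: {name}")
    for name, value in args.items():
        expected = schema["types"].get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
        elif not isinstance(value, expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems

def call_with_retry(propose_args, schema: dict, max_attempts: int = 3) -> dict:
    """Ask for arguments again, feeding back the errors, until they validate."""
    feedback = []
    for _ in range(max_attempts):
        args = propose_args(feedback)
        feedback = validate_args(args, schema)
        if not feedback:
            return args
    raise ValueError(f"no valid call after {max_attempts} attempts: {feedback}")

# Hypothetical schema and a stand-in for the model that fixes its call on retry.
SCHEMA = {"required": ["ticker", "limit"], "types": {"ticker": str, "limit": int}}
attempts = iter([{"ticker": "ABC", "limit": "10"}, {"ticker": "ABC", "limit": 10}])
result = call_with_retry(lambda feedback: next(attempts), SCHEMA)
print(result)
```

The first proposed call passes `limit` as a string and is rejected; the second passes validation. Human approval would sit after validation, before the tool actually runs.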
Key idea 4: Tool output interpretation
After the tool returns a result, the model must understand the output. This includes:
- reading structured output
- filtering irrelevant information
- detecting errors
- understanding confidence
- combining results with previous context
- deciding whether another tool call is needed
This is why tool use is not a one-step operation. It is a loop.
think → call tool → observe result → think again → call next tool → answer
Key idea 5: Tool planning
Real tasks often require multiple tools.
For example:
understand the user question → retrieve relevant data → run a database query → validate the result → summarize the evidence → generate final answer
This turns tool use into workflow orchestration.
The model is not just calling tools randomly. It needs to plan a sequence of actions.
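One way to picture a plan is as an ordered list of steps that share a context. The sketch below hard-codes the plan; in an agent system the model would produce it. The step functions, the `run_plan` helper, and the revenue table are all made up for illustration.

```python
from typing import Callable

Step = Callable[[dict], dict]

def retrieve(ctx: dict) -> dict:
    # Stand-in for a retrieval or database call.
    ctx["rows"] = [r for r in ctx["table"] if r["year"] == ctx["year"]]
    return ctx

def validate(ctx: dict) -> dict:
    if not ctx["rows"]:
        raise ValueError("no rows found for the requested year")
    return ctx

def summarize(ctx: dict) -> dict:
    total = sum(r["revenue"] for r in ctx["rows"])
    ctx["answer"] = f"Total revenue in {ctx['year']}: {total}"
    return ctx

def run_plan(plan: list[Step], ctx: dict) -> dict:
    """Execute the steps in order, passing a shared context along."""
    for step in plan:
        ctx = step(ctx)
    return ctx

# Toy data, made up for illustration.
table = [
    {"year": 2023, "revenue": 120},
    {"year": 2024, "revenue": 150},
    {"year": 2024, "revenue": 50},
]
out = run_plan([retrieve, validate, summarize], {"table": table, "year": 2024})
print(out["answer"])
```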
Key idea 6: How tool use is learned
The paper discusses several learning methods:
- supervised learning
- few-shot prompting
- self-supervised learning
- reinforcement learning
- human feedback
Supervised learning is easier to control because the model learns from examples of correct tool use. Few-shot prompting is flexible because examples are included in the prompt. Self-supervised tool learning tries to generate tool-use examples automatically. Reinforcement learning can optimize tool behavior based on rewards, but it is harder to control and evaluate.
Paper 3: Microsoft GraphRAG
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv:2404.16130.
Traditional RAG usually works like this:
documents → chunks → embeddings → vector search → top-k chunks → LLM answer
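The pipeline above can be sketched end to end with a toy embedding. Word counts are used here as a stand-in for a learned embedding model, and the chunks are made-up sentences; only the shape of the pipeline is the point.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a learned vector model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Made-up chunks for illustration.
chunks = [
    "Liquidity risk is the risk of failing to meet short-term obligations.",
    "The company was founded in 1998 in Sydney.",
    "Credit risk arises when a borrower fails to repay a loan.",
]
hits = top_k("What is liquidity risk?", chunks, k=1)
prompt = "Answer using only this context:\n" + "\n".join(hits)
print(prompt)
```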
This works well for local questions.
Examples:
- What does this paragraph say?
- Where is this term defined?
- Who is mentioned in this document?
But the paper argues that many real questions are global questions.
Examples:
- What are the main themes across this corpus?
- How are these entities connected?
- What risks appear repeatedly?
- What is the overall pattern?
These questions cannot be answered well by retrieving a few similar chunks. The answer is distributed across many documents and relationships.
Key idea 1: Local retrieval is not enough
Naive RAG is mostly local. It retrieves chunks that are semantically similar to the query. But semantic similarity is not the same as structural understanding. A chunk may be similar to the query but not help answer the broader question. Also, top-k retrieval may miss important but less obvious evidence.
So the limitation is:
Traditional RAG finds similar text. It does not naturally understand corpus structure.
Key idea 2: GraphRAG builds a knowledge graph
Microsoft GraphRAG changes the pipeline. Instead of only chunking and embedding documents, it extracts entities and relationships.
The pipeline is:
documents → entities → relationships → graph → communities → community summaries → query-focused answer
Example:
Text: Company A acquired Company B.
Graph: Company A → acquired → Company B
This converts unstructured text into structured knowledge.
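Once triples exist, turning them into a graph is straightforward. The sketch below assumes the extraction step (done by an LLM in GraphRAG) has already produced the triples; the second triple and the `build_graph` helper are made up for illustration.

```python
from collections import defaultdict

# Triples as (subject, relation, object); the second one is made up.
triples = [
    ("Company A", "acquired", "Company B"),
    ("Company B", "owns", "Company C"),
]

def build_graph(triples):
    """Adjacency list: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

graph = build_graph(triples)
print(graph["Company A"])
```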
Key idea 3: Entity and relationship extraction
The graph is created by extracting:
- entities
- relationships
- claims
- descriptions
- source references
This is powerful because it makes relationships explicit.
However, it also creates risks:
- missing entities
- duplicate entities
- wrong relationships
- ambiguous names
- over-general links
- unsupported claims
The quality of GraphRAG depends heavily on graph quality.
bad extraction → bad graph → bad community summaries → bad answer
Key idea 4: Community detection
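After the graph is built, the system detects communities. A community is a cluster of closely connected entities. The paper uses the Leiden algorithm for this step, which produces a hierarchy of communities at different levels of detail.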
For example:
Community A: climate, carbon, energy, emissions
Community B: finance, banking, liquidity, credit
Communities help identify themes inside a large corpus. This is important because global questions often ask about themes, patterns, or repeated concerns.
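To show the grouping idea in code, the sketch below uses connected components. This is a much simpler technique than the community detection a GraphRAG system would use, and it is swapped in here only because it fits in a few lines. The edges are made up to reproduce the two example communities.

```python
from collections import defaultdict

def connected_components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group nodes that are reachable from one another (undirected)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Made-up edges for illustration.
edges = [
    ("climate", "carbon"), ("carbon", "emissions"), ("energy", "emissions"),
    ("finance", "banking"), ("banking", "liquidity"), ("liquidity", "credit"),
]
communities = connected_components(edges)
print(communities)
```

Real community detection also splits a connected graph into densely linked sub-clusters, which connected components cannot do.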
Key idea 5: Community summarization
This is the key innovation of the paper. Each graph community is summarized by an LLM. So instead of retrieving thousands of nodes or many raw chunks, the system can retrieve compact community summaries.
large graph → communities → community summaries → compact context for the LLM
This makes corpus-level question answering more scalable. The system is not only retrieving information. It is pre-organizing the corpus into a semantic map.
Key idea 6: Global search
For global questions, GraphRAG generates partial answers from the community summaries, asking the model to score how helpful each partial answer is. It then combines the most helpful partial answers into a final response. This is similar to map-reduce:
Map: generate partial answers from community summaries
Reduce: combine partial answers into final answer
This helps improve comprehensiveness and diversity because the model can draw from different parts of the corpus.
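The map-reduce shape can be sketched as follows. The two callbacks stand in for LLM calls and are assumptions for illustration, as is the `global_search` name; the scoring-and-filtering step is a simplification of how the paper ranks partial answers.

```python
from typing import Callable

def global_search(
    question: str,
    community_summaries: list[str],
    answer_from: Callable[[str, str], tuple[str, int]],
    combine: Callable[[str, list[str]], str],
) -> str:
    """Map: one scored partial answer per summary. Reduce: merge the useful ones."""
    partials = [answer_from(question, s) for s in community_summaries]
    useful = [p for p in partials if p[1] > 0]
    useful.sort(key=lambda p: p[1], reverse=True)
    return combine(question, [text for text, _ in useful])

# Stand-ins for LLM calls (assumptions): score by keyword, combine by joining.
def fake_answer(question: str, summary: str) -> tuple[str, int]:
    score = 50 if "risk" in summary else 0
    return (f"From summary: {summary}", score)

def fake_combine(question: str, partials: list[str]) -> str:
    return " | ".join(partials)

# Made-up community summaries.
summaries = ["credit risk is rising", "new office opened", "liquidity risk is stable"]
answer = global_search("What risks appear repeatedly?", summaries, fake_answer, fake_combine)
print(answer)
```

Because the map step is independent per summary, it can run in parallel, which is part of what makes the approach scale to large corpora.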
Key idea 7: GraphRAG is not always better than ordinary RAG
This is important.
GraphRAG is most useful for:
- global questions
- cross-document synthesis
- theme discovery
- relationship reasoning
- multi-hop connections
- large corpus summarization
Ordinary RAG may still be enough for:
- specific fact lookup
- small document collections
- local evidence retrieval
- simple question answering
So GraphRAG should be seen as a complementary retrieval strategy, not a universal replacement for vector RAG.
Paper 4: GraphRAG for Customized Large Language Models
Zhang, Q., Chen, S., Bei, Y., Yuan, Z., Zhou, H., Hong, Z., Dong, J., Chen, H., Chang, Y., & Huang, X. (2025). A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models. arXiv:2501.13958.
This paper is a survey. It does not propose one single method. Instead, it organizes the whole GraphRAG field.
The paper asks:
- What are the main types of GraphRAG?
- How does GraphRAG customize LLMs?
- How should graph knowledge be organized, retrieved, and integrated?
Its value is that it gives a taxonomy.
Key idea 1: GraphRAG has three stages
The paper organizes GraphRAG into three stages:
- knowledge organization
- knowledge retrieval
- knowledge integration
This is a useful framework.
Knowledge organization asks:
How should knowledge be represented?
Knowledge retrieval asks:
How should relevant graph evidence be found?
Knowledge integration asks:
How should retrieved graph evidence be given to the LLM?
Key idea 2: Knowledge organization
GraphRAG uses graphs to represent structured knowledge.
The graph may contain:
- entities
- relations
- attributes
- events
- dependencies
- concepts
- documents
- chunks
- claims
- metadata
This matters because domain knowledge is often relational.
In many professional domains, the important knowledge is not just isolated facts. It is the relationship between concepts.
Key idea 3: Knowledge-based GraphRAG
In knowledge-based GraphRAG, the graph itself carries the knowledge.
entity → relation → entity
Examples:
drug → treats → disease
company → owns → subsidiary
product → contains → component
This is useful when the domain has clear concepts and relationships. The graph becomes a structured knowledge base.
Key idea 4: Index-based GraphRAG
In index-based GraphRAG, the graph acts as an index over documents or chunks. The raw text remains the main evidence, but the graph helps navigate the corpus.
Examples:
chunk A → related to → chunk B
document X → cites → document Y
concept A → appears in → document Z
This is useful when we want to preserve raw source detail while still improving navigation.
Key idea 5: Hybrid GraphRAG
Hybrid GraphRAG combines both approaches. The graph carries structured knowledge, but it also links back to raw evidence.
structured graph knowledge
+ source documents
+ text chunks
+ tables
+ metadata
+ evidence links
This is usually the most practical architecture because it gives both structure and traceability.
The graph helps reasoning, while the raw source provides evidence.
Key idea 6: Knowledge retrieval
Graph retrieval can retrieve different types of evidence:
- nodes
- edges
- triples
- paths
- subgraphs
- communities
- graph-indexed chunks
This is different from ordinary vector retrieval.
Vector retrieval mainly asks:
Which chunks are semantically similar?
Graph retrieval asks:
Which concepts, relationships, paths, or subgraphs are relevant?
Key idea 6 (continued): Expansion and pruning
Graph retrieval has a major challenge: expansion.
If we retrieve neighbours around one node, the graph can grow quickly.
1-hop: manageable
2-hop: large
3-hop: huge
So GraphRAG needs pruning. Pruning means selecting only the most useful graph evidence and removing irrelevant or noisy context.
A good GraphRAG system must balance:
expand enough to find useful relationships
prune enough to control context size
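The trade-off can be sketched with a breadth-first expansion followed by a budgeted pruning step. The graph, the relevance scores, and the function names are made up for illustration; in a real system the relevance score would come from an embedding model or an LLM.

```python
from collections import deque

def expand(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first expansion: node -> hop distance from the start node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

def prune(distance: dict[str, int], relevance: dict[str, float], budget: int) -> list[str]:
    """Keep the `budget` most relevant nodes; ties go to the closer node."""
    ranked = sorted(distance, key=lambda n: (-relevance.get(n, 0.0), distance[n]))
    return ranked[:budget]

# Toy graph and relevance scores, made up for illustration.
graph = {"bank": ["loan", "deposit"], "loan": ["default", "collateral"], "deposit": ["interest"]}
relevance = {"bank": 1.0, "loan": 0.9, "default": 0.8, "deposit": 0.2, "collateral": 0.4, "interest": 0.1}

reached = expand(graph, "bank", max_hops=2)
kept = prune(reached, relevance, budget=3)
print(len(reached), kept)
```

Here the 2-hop expansion reaches six nodes, and pruning keeps the three that matter most for the query.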
Key idea 7: Knowledge integration
After retrieval, the graph evidence must be integrated into the LLM context.
Possible formats include:
- triples
- paths
- subgraphs
- natural language summaries
- structured evidence blocks
- source-linked context
This matters because simply pasting many triples into a prompt may not help. The context must be organized so the model can reason over it.
Good integration should answer:
- What is the relevant concept?
- What are the relevant relationships?
- What is the supporting evidence?
- What source does it come from?
- How should the model use it?
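A sketch of one possible format is shown below: each piece of graph evidence is rendered as a numbered entry with its relationship, source, and supporting quote, followed by an instruction on how to use it. The `Evidence` class, the layout, and the example record are assumptions for illustration, not a format taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    subject: str
    relation: str
    obj: str
    source: str   # where the claim comes from
    quote: str    # supporting text

def format_evidence(question: str, items: list[Evidence]) -> str:
    """Render graph evidence as a numbered, source-linked block for the prompt."""
    lines = [f"Question: {question}", "Evidence:"]
    for i, e in enumerate(items, start=1):
        lines.append(f"[{i}] {e.subject} -> {e.relation} -> {e.obj}")
        lines.append(f"    source: {e.source}")
        lines.append(f'    quote: "{e.quote}"')
    lines.append("Instruction: answer using only the evidence above and cite [n].")
    return "\n".join(lines)

# Made-up example record.
items = [
    Evidence("Company A", "acquired", "Company B",
             "annual_report.pdf, p. 12", "Company A acquired Company B."),
]
block = format_evidence("Who acquired Company B?", items)
print(block)
```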
Key idea 8: GraphRAG as domain customization
The paper frames GraphRAG as a way to customize LLMs without relying only on fine-tuning.
Instead of changing model weights, we can provide domain knowledge at inference time.
general LLM
+ domain graph
+ domain retrieval
+ domain context
= customized LLM behavior
This is important because domain knowledge changes. External graph-based knowledge can be updated more easily than model parameters.
Summary
Across the four papers, a common architecture starts to emerge:
user question
→ LLM reasoning
→ task decomposition
→ tool / retrieval planning
→ choose retrieval mode or tool
→ retrieve / compute / observe
→ integrate evidence
→ generate grounded answer
→ validate / explain / act
This is very different from:
user question → LLM answer
The LLM becomes a reasoning and orchestration layer.
The four papers strongly suggest that an AI-native financial data foundation should not be designed as a simple chatbot over documents.
A better architecture is:
LLM
+ financial semantic graph
+ authoritative data sources
+ source lineage
+ validation rules
+ SQL / query tools
+ calculation tools
+ controlled workflow actions
+ audit trail
In this architecture:
LLM = reasoning and orchestration layer
tools = deterministic execution layer
graph = semantic organization layer
RAG = evidence retrieval layer
governance = trust and control layer
The key lesson is that the value is not in putting an LLM on top of messy data. The value is in building the semantic, graph, tool, and governance foundation that allows the LLM to reason safely and usefully.