AI-Native Financial Data Foundation (25) – Reading Notes: Augmented LLMs, Tool Learning and GraphRAG – Data Ninjago (Financial Data Architecture)

Before jumping directly into the agentic semantic layer for the AI-native financial data foundation, I feel I need a bridge.

I want to share my reading notes on several key papers that helped me understand this direction and shape the high-level design of the agentic semantic layer.

In this post, I will focus on four papers. Three of them are survey-style papers, and one is the Microsoft GraphRAG paper. Together, they helped me understand the broader landscape of augmented LLMs, tool use, and GraphRAG. These ideas are pretty much at the core of what I mean by an agentic semantic layer.

Paper 1: Augmented Language Models: a Survey

Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., Grave, E., LeCun, Y., & Scialom, T. (2023). Augmented Language Models: a Survey. arXiv:2302.07842.

This paper starts by highlighting the core limitations of traditional LLMs:

limited internal knowledge
outdated information
hallucination
weak exact calculation
limited context window
weak long-horizon reasoning
no direct access to external systems

The main argument of this paper is that LLM systems should not depend only on the model’s parameters. Instead, the LLM should be augmented with external capabilities.

			
Augmented Language Model 
   = LLM + reasoning + retrieval + tools + action

Key idea 1: Reasoning is internal augmentation

Reasoning means the model decomposes a complex task into smaller steps. This includes methods such as Chain-of-Thought, recursive prompting, and task decomposition.

The important point is not just that the model “thinks step by step”. The deeper idea is that many real problems cannot be solved in one jump. They need intermediate reasoning.

However, reasoning alone is not enough. The model may still reason from wrong assumptions or hallucinated facts. So reasoning needs to be grounded by external evidence and tools.

Key idea 2: Tool use is external augmentation

Some tasks should not be solved by language generation alone. For example, exact calculation should be done by a calculator or code interpreter. Looking up current facts should be done through retrieval or search. Database questions should be answered through database queries.

The model’s role becomes:

			
understand the task
→ decide what is needed
→ call the right tool
→ interpret the result
→ answer

		

Key idea 3: Retrieval adds external memory

Retrieval allows the model to access external knowledge at inference time. This is the foundation of RAG.

Instead of expecting the model to remember everything, the system can retrieve relevant documents, passages, or data and pass them into the model context.

This helps reduce hallucination and improves factual grounding, but it also introduces new questions:

What should be retrieved?
How do we know the retrieved information is relevant?
How do we know it is authoritative?
How should retrieved evidence be presented to the model?
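To make these questions concrete, here is a minimal Python sketch of a post-retrieval filter. It assumes each passage already carries a relevance score from some retriever and a source label. The `AUTHORITATIVE_SOURCES` set, the 0.5 threshold and the sample passages are hypothetical. Relevance and authority are checked separately, and the surviving passages are tagged with their source before they reach the model.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source: str
    score: float  # relevance score from a retriever (assumed to be in [0, 1])


# Hypothetical whitelist of sources treated as authoritative.
AUTHORITATIVE_SOURCES = {"general_ledger", "regulatory_filing"}


def select_evidence(passages, min_score=0.5, top_k=3):
    """Keep relevant passages from authoritative sources, best first."""
    kept = [p for p in passages
            if p.score >= min_score and p.source in AUTHORITATIVE_SOURCES]
    kept.sort(key=lambda p: p.score, reverse=True)
    return kept[:top_k]


def format_context(passages):
    """Present evidence with numbered source tags so the model can cite it."""
    return "\n".join(f"[{i}] ({p.source}) {p.text}"
                     for i, p in enumerate(passages, start=1))


retrieved = [
    Passage("Q3 net interest income was reported in the filing.", "regulatory_filing", 0.82),
    Passage("Someone on a forum guessed the Q3 figure.", "web_forum", 0.91),
    Passage("Ledger entries for Q3 interest income.", "general_ledger", 0.64),
    Passage("Unrelated HR policy text.", "general_ledger", 0.12),
]
print(format_context(select_evidence(retrieved)))
```

The forum passage has the highest similarity score but is still excluded: a high relevance score says nothing about whether a source is authoritative.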

Key idea 4: Acting changes the role of the LLM

The paper also discusses action. Once an LLM can act, it is no longer only answering questions. It can interact with an environment.

A simple agent loop looks like this:

			
observe
→ reason
→ act
→ observe result
→ continue

		

This idea later becomes central to agent systems.
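A minimal Python sketch of that loop. The `scripted_policy` function is a hypothetical stand-in for the LLM: it reads the history and returns the next action. The FX rate and the two toy tools are made up for illustration.

```python
def run_agent(policy, tools, task, max_steps=5):
    """Observe -> reason -> act loop; stops when the policy returns 'finish'."""
    history = [("task", task)]
    for _ in range(max_steps):
        action, arg = policy(history)          # reason: decide the next action
        if action == "finish":
            return arg, history
        observation = tools[action](arg)       # act
        history.append((action, observation))  # observe the result, then continue
    raise RuntimeError("step budget exhausted without an answer")


# Hypothetical toy tools; the rate is illustrative, not real data.
FX_RATES = {"EUR/USD": 1.10}
tools = {
    "lookup_rate": lambda pair: FX_RATES[pair],
    "multiply": lambda xy: xy[0] * xy[1],
}


def scripted_policy(history):
    """Stand-in for the LLM: picks the next step from the last observation."""
    last_action, last_obs = history[-1]
    if last_action == "task":
        return "lookup_rate", "EUR/USD"
    if last_action == "lookup_rate":
        return "multiply", (100, last_obs)
    return "finish", round(last_obs, 2)


answer, trace = run_agent(scripted_policy, tools, "Convert 100 EUR to USD")
print(answer)  # 110.0
```

The `max_steps` budget matters in practice: without it, a policy that never decides to finish would loop forever.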

Paper 2: Tool Learning with Large Language Models: A Survey

Qu, C., Dai, S., Wei, X., Cai, H., Wang, S., Yin, D., Xu, J., & Wen, J.-R. (2024). Tool Learning with Large Language Models: A Survey. arXiv:2405.17935.

The first paper says LLMs need tools. This second paper asks a more detailed question:

How does an LLM actually learn to use tools?

This is important because tool use is often oversimplified. Many people assume tool use just means giving an LLM access to APIs, but the paper shows that it is much more involved.

A tool-using LLM needs to deal with:

tool representation
tool selection
tool invocation
tool output interpretation
tool planning
tool learning

Key idea 1: Tool representation

The model must know what tools exist and what each tool does. Tools can be represented through:

natural language descriptions
structured schemas
function signatures
API documentation
examples

Structured tool descriptions are usually more reliable than vague natural language descriptions because they reduce ambiguity.

The model needs to understand not only the tool name, but also:

what the tool is for
what input it expects
what output it returns
when it should be used
what limitations it has
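As an illustration, here is one way to capture all five points in a structured description, loosely following the JSON Schema style that many function-calling APIs use. The tool name `get_fx_rate` and the `returns`, `when_to_use` and `limitations` keys are hypothetical; each API defines its own exact fields.

```python
import json

# Hypothetical tool description covering purpose, input, output, usage and limits.
get_fx_rate_tool = {
    "name": "get_fx_rate",
    "description": "Return the end-of-day FX rate for a currency pair.",  # what it is for
    "parameters": {  # what input it expects
        "type": "object",
        "properties": {
            "pair": {"type": "string", "pattern": "^[A-Z]{3}/[A-Z]{3}$"},
            "date": {"type": "string", "format": "date"},
        },
        "required": ["pair", "date"],
    },
    "returns": {"type": "number",
                "description": "Units of quote currency per unit of base currency"},
    "when_to_use": "Questions that need a historical exchange rate.",
    "limitations": "End-of-day rates only; no intraday data.",
}


def render_for_prompt(tool):
    """Serialize the description as text that can be placed in the model's context."""
    return json.dumps(tool, indent=2, sort_keys=True)


print(render_for_prompt(get_fx_rate_tool))
```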

Key idea 2: Tool selection

Tool selection means choosing the right tool for the task.

For example:

			
Question:
What is 234 × 872?
Bad approach:
answer from memory
Better approach:
call calculator

		

This sounds simple, but it becomes difficult when the model has many tools. With a few tools, all of their descriptions fit in the prompt and the model can choose directly. With hundreds or thousands of tools, tool selection becomes a discovery and ranking problem.

The model must decide:

Do I need a tool?
Which tool is best?
Is one tool enough?
Do I need multiple tools?
Should I retrieve first or calculate first?
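A common pattern for large catalogues is to shortlist candidate tools by matching the question against tool descriptions before the model makes the final choice. Here is a minimal sketch using word overlap (Jaccard similarity) as the ranking score. Real systems more often use embedding similarity or a trained retriever, and the tool names and descriptions here are hypothetical.

```python
import re

# Hypothetical tool catalogue: name -> description.
TOOLS = {
    "calculator": "evaluate arithmetic expressions multiply add divide numbers",
    "sql_query": "run a read only sql query against the finance database tables",
    "doc_search": "search policy documents and filings for relevant passages",
    "fx_rate": "look up historical currency exchange rate for a currency pair",
}
STOPWORDS = {"the", "a", "an", "and", "for", "on", "was", "what", "of", "to", "is"}


def tokens(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def shortlist_tools(question, tools=TOOLS, k=2):
    """Rank tools by Jaccard overlap between question and description words."""
    q = tokens(question)
    scored = []
    for name, desc in tools.items():
        d = tokens(desc)
        union = q | d
        scored.append((len(q & d) / len(union) if union else 0.0, name))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:k] if score > 0]


print(shortlist_tools("What was the EUR/USD exchange rate on 2024-03-29?"))  # ['fx_rate']
```

The shortlist only narrows the field; the model still decides whether a tool is needed at all and whether one is enough.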

Key idea 3: Tool invocation

Selecting the right tool is not enough. The model must call the tool correctly. This means producing the right:

function name
parameters
data types
formats
constraints

Many tool-use failures happen at this stage.

For example, the model may choose the correct database tool but generate invalid SQL. Or it may call the right API but pass the wrong parameter format.

So reliable tool use requires schema validation, argument checking, retry logic, and sometimes human approval.
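Schema validation and retry can wrap any tool call. Here is a minimal sketch, assuming a simplified schema format (field name mapped to a Python type) rather than full JSON Schema. `propose_args` is a hypothetical callback standing in for the model; it receives the previous error so the next proposal can correct it.

```python
def validate_args(args, schema):
    """Return a list of problems; an empty list means the arguments are valid."""
    problems = [f"missing field: {k}" for k in schema if k not in args]
    problems += [f"unexpected field: {k}" for k in args if k not in schema]
    problems += [f"{k} should be {t.__name__}" for k, t in schema.items()
                 if k in args and not isinstance(args[k], t)]
    return problems


def call_with_retry(tool, schema, propose_args, max_attempts=3):
    """Ask for arguments, validate them, and feed errors back until they pass."""
    error = None
    for _ in range(max_attempts):
        args = propose_args(error)        # the model proposes arguments (stubbed here)
        problems = validate_args(args, schema)
        if not problems:
            return tool(**args)
        error = "; ".join(problems)       # returned to the model so it can correct itself
    raise ValueError(f"invalid arguments after {max_attempts} attempts: {error}")


# Hypothetical tool and a model stub that gets the type wrong on the first try.
def get_balance(account_id, year):
    return f"balance for {account_id} in {year}"


schema = {"account_id": str, "year": int}
proposals = iter([{"account_id": "ACC-1", "year": "2024"},
                  {"account_id": "ACC-1", "year": 2024}])
print(call_with_retry(get_balance, schema, lambda error: next(proposals)))
# balance for ACC-1 in 2024
```

A human-approval step would sit at the same point as the `tool(**args)` call: validated arguments are shown to a reviewer before execution.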

Key idea 4: Tool output interpretation

After the tool returns a result, the model must understand the output. This includes:

reading structured output
filtering irrelevant information
detecting errors
understanding confidence
combining results with previous context
deciding whether another tool call is needed

This is why tool use is not a one-step operation. It is a loop.

			
think
→ call tool
→ observe result
→ think again
→ call next tool
→ answer

		

Key idea 5: Tool planning

Real tasks often require multiple tools.

For example:

			
understand the user question
→ retrieve relevant data
→ run a database query
→ validate the result
→ summarize the evidence
→ generate final answer

		

This turns tool use into workflow orchestration.

The model is not just calling tools one at a time in isolation. It needs to plan a sequence of actions.
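A plan can be represented as an ordered list of steps where later steps consume earlier outputs. Here is a minimal sketch of a plan executor. The step format, the `$name` reference convention and the stub tools are hypothetical; in a real system the model would produce the plan and it would be checked before execution.

```python
def execute_plan(plan, tools):
    """Run steps in order; string args starting with '$' refer to earlier step outputs."""
    results = {}
    for step in plan:
        args = [results[a[1:]] if isinstance(a, str) and a.startswith("$") else a
                for a in step["args"]]
        results[step["id"]] = tools[step["tool"]](*args)
    return results


def check_non_negative(x):
    """Validation step: stop the plan if the total is implausible."""
    if x < 0:
        raise ValueError("negative total")
    return x


# Hypothetical tools; 'retrieve' returns fixed stub data.
tools = {
    "retrieve": lambda query: [120.0, 80.0, 100.0],
    "total": sum,
    "validate": check_non_negative,
    "summarize": lambda x: f"Total exposure: {x:.1f}",
}

plan = [
    {"id": "rows", "tool": "retrieve", "args": ["exposure by desk"]},
    {"id": "sum", "tool": "total", "args": ["$rows"]},
    {"id": "checked", "tool": "validate", "args": ["$sum"]},
    {"id": "answer", "tool": "summarize", "args": ["$checked"]},
]
print(execute_plan(plan, tools)["answer"])  # Total exposure: 300.0
```

Making dependencies explicit means a step that references a missing output fails immediately (here with a `KeyError`) instead of silently running on nothing.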

Key idea 6: How tool use is learned

The paper discusses several learning methods:

supervised learning
few-shot prompting
self-supervised learning
reinforcement learning
human feedback

Supervised learning is easier to control because the model learns from examples of correct tool use. Few-shot prompting is flexible because examples are included in the prompt. Self-supervised tool learning tries to generate tool-use examples automatically. Reinforcement learning can optimize tool behavior based on rewards, but it is harder to control and evaluate.

Paper 3: Microsoft GraphRAG

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv:2404.16130.

Traditional RAG usually works like this:

			
documents
→ chunks
→ embeddings
→ vector search
→ top-k chunks
→ LLM answer
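The pipeline above can be sketched end to end in a few lines of Python. This is a toy: a word-count vector stands in for a learned embedding model, fixed-size overlapping word windows stand in for a real chunker, and the documents are made up.

```python
import math
import re
from collections import Counter


def chunk(text, size=12, overlap=4):
    """Split a document into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Vector search: rank chunks by cosine similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, documents, k=2):
    chunks = [c for doc in documents for c in chunk(doc)]
    context = "\n".join(f"- {c}" for c in top_k(question, chunks, k))
    return f"Answer using only this context:\n{context}\nQuestion: {question}"


docs = ["The credit policy sets limits for counterparty exposure by rating grade.",
        "The cafeteria menu changes every Monday."]
print(build_prompt("What limits apply to counterparty exposure?", docs, k=1))
```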

		

This works well for local questions.

Examples:

What does this paragraph say?
Where is this term defined?
Who is mentioned in this document?

But the paper argues that many real questions are global questions.

Examples:

What are the main themes across this corpus?
How are these entities connected?
What risks appear repeatedly?
What is the overall pattern?

These questions cannot be answered well by retrieving a few similar chunks. The answer is distributed across many documents and relationships.

Key idea 1: Local retrieval is not enough

Naive RAG is mostly local. It retrieves chunks that are semantically similar to the query. But semantic similarity is not the same as structural understanding. A chunk may be similar to the query but not help answer the broader question. Also, top-k retrieval may miss important but less obvious evidence.

So the limitation is:

			
Traditional RAG finds similar text.
It does not naturally understand corpus structure.

Key idea 2: GraphRAG builds a knowledge graph

Microsoft GraphRAG changes the pipeline. Instead of only chunking and embedding documents, it extracts entities and relationships.

The pipeline is:

			
documents
→ entities
→ relationships
→ graph
→ communities
→ community summaries
→ query-focused answer

		

Example:

			
Text:
Company A acquired Company B.
Graph:
Company A → acquired → Company B

This converts unstructured text into structured knowledge.
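GraphRAG performs this step with LLM prompts. To show only the shape of the output, here is a minimal sketch that uses a regular expression as a stand-in for the LLM extractor. The pattern and the short relation list are hypothetical and would miss most real phrasing; each triple keeps a source reference so it can be traced back later.

```python
import re

# Stand-in for LLM extraction: "<Capitalized Name> <relation> <Capitalized Name>".
NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"
PATTERN = re.compile(rf"(?P<head>{NAME}) (?P<rel>acquired|owns|lends to) (?P<tail>{NAME})")


def extract_triples(text, source_id):
    """Return (head, relation, tail, source_id) tuples found in the text."""
    return [(m["head"], m["rel"], m["tail"], source_id) for m in PATTERN.finditer(text)]


text = "Company A acquired Company B. Later, Company B owns Company C."
for triple in extract_triples(text, "doc-1"):
    print(triple)
# ('Company A', 'acquired', 'Company B', 'doc-1')
# ('Company B', 'owns', 'Company C', 'doc-1')
```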

Key idea 3: Entity and relationship extraction

The graph is created by extracting:

entities
relationships
claims
descriptions
source references

This is powerful because it makes relationships explicit.

However, it also creates risks:

missing entities
duplicate entities
wrong relationships
ambiguous names
over-general links
unsupported claims

The quality of GraphRAG depends heavily on graph quality.

			
bad extraction
→ bad graph
→ bad community summaries
→ bad answer
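Duplicate entities and ambiguous names are often handled with an entity-resolution pass before the graph is built. Here is a minimal sketch, assuming a hand-maintained alias table and simple name normalization. The company names and aliases are hypothetical; production systems would typically add reference-data lookups and human review.

```python
import re

# Hypothetical alias table: normalized surface form -> canonical entity name.
ALIASES = {
    "acme bank": "Acme Bank plc",
    "acme": "Acme Bank plc",   # risky: a short alias like this can merge distinct entities
}
LEGAL_SUFFIX = re.compile(r"\b(inc|corp|co|ltd|plc|llc)\b\.?", re.IGNORECASE)


def normalize(name):
    """Lower-case, drop legal suffixes and punctuation, collapse whitespace."""
    name = LEGAL_SUFFIX.sub("", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def resolve(name):
    """Map a mention to its canonical entity; unknown names pass through unchanged."""
    return ALIASES.get(normalize(name), name)


def dedupe_triples(triples):
    """Resolve both ends of every triple, then drop exact duplicates."""
    return sorted({(resolve(h), r, resolve(t)) for h, r, t in triples})


triples = [
    ("Acme Bank plc", "owns", "Beta Leasing Ltd"),
    ("ACME BANK", "owns", "Beta Leasing Ltd"),
    ("Acme", "lends to", "Gamma Corp"),
]
for t in dedupe_triples(triples):
    print(t)
```

The two `owns` mentions collapse into one edge. The same mechanism also shows the risk: if another company were also called "Acme", the short alias would wrongly merge it.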

Key idea 4: Community detection

After the graph is built, the system detects communities. A community is a cluster of closely connected entities.

For example:

			
Community A:
climate, carbon, energy, emissions
Community B:
finance, banking, liquidity, credit

Communities help identify themes inside a large corpus. This is important because global questions often ask about themes, patterns, or repeated concerns.
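The GraphRAG paper uses the Leiden algorithm to build a hierarchy of communities. Leiden is not in the Python standard library, so this minimal sketch swaps in a simpler technique, deterministic label propagation, just to show how densely connected entities end up grouped. The graph is the toy example above plus one hypothetical bridge edge between `energy` and `finance`.

```python
from collections import Counter, defaultdict
from itertools import combinations


def label_propagation(edges, max_iter=20):
    """Each node repeatedly adopts the most common label among its neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    labels = {node: node for node in graph}   # start: every node is its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):            # fixed order keeps the result deterministic
            counts = Counter(labels[n] for n in graph[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # tie-break: keep the current label if it is among the best, else the smallest
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return sorted(communities.values(), key=sorted)


topic_a = ["climate", "carbon", "energy", "emissions"]
topic_b = ["finance", "banking", "liquidity", "credit"]
edges = list(combinations(topic_a, 2)) + list(combinations(topic_b, 2)) + [("energy", "finance")]
for community in label_propagation(edges):
    print(sorted(community))
```

Despite the bridge edge, the two dense clusters come out as separate communities. Leiden adds guarantees this simple method lacks, such as well-connected communities and a multi-level hierarchy.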

Key idea 5: Community summarization

This is the key innovation of the paper. Each graph community is summarized by an LLM. So instead of retrieving thousands of nodes or many raw chunks, the system can retrieve compact community summaries.

			
large graph
→ communities
→ community summaries
→ compact context for the LLM

This makes corpus-level question answering more scalable. The system is not only retrieving information. It is pre-organizing the corpus into a semantic map.

Key idea 6: Global search

For global questions, GraphRAG takes the community summaries at a chosen level of the community hierarchy, generates a partial answer from each, and then combines those partial answers into a final response. This is similar to map-reduce:

			
Map:
generate partial answers from community summaries
Reduce:
combine partial answers into final answer

This helps improve comprehensiveness and diversity because the model can draw from different parts of the corpus.
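Here is a minimal sketch of the map and reduce steps. In the paper, each partial answer also receives a helpfulness score from 0 to 100. Answers scoring 0 are dropped, and the rest are added to the final context in descending score order until the token limit is reached. In this sketch, `answer_from_summary` is a hypothetical stub for the map-step LLM call that scores by word overlap, word counts approximate tokens, each summary is processed on its own rather than in shuffled batches, and the summaries are invented.

```python
def answer_from_summary(question, summary):
    """Stub for the map-step LLM call: returns (partial_answer, helpfulness 0-100)."""
    overlap = len(set(question.lower().split()) & set(summary.lower().split()))
    return f"From community: {summary}", min(100, 25 * overlap)


def global_search(question, community_summaries, budget_words=40):
    # Map: one partial answer per community summary
    partials = [answer_from_summary(question, s) for s in community_summaries]
    # Drop unhelpful answers (score 0), then rank the rest by helpfulness
    ranked = sorted((p for p in partials if p[1] > 0), key=lambda p: p[1], reverse=True)
    # Reduce: pack the best partial answers into a limited context window
    context, used = [], 0
    for text, _score in ranked:
        n = len(text.split())
        if used + n > budget_words:
            break
        context.append(text)
        used += n
    return context  # an LLM would turn this context into the final answer


summaries = [
    "liquidity risk rose as deposit outflows increased",
    "credit risk concentrated in commercial real estate loans",
    "office relocation and facilities updates",
]
for line in global_search("what are the main risk themes", summaries):
    print(line)
```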

Key idea 7: GraphRAG is not always better than ordinary RAG

This is important.

GraphRAG is most useful for:

global questions
cross-document synthesis
theme discovery
relationship reasoning
multi-hop connections
large corpus summarization

Ordinary RAG may still be enough for:

specific fact lookup
small document collections
local evidence retrieval
simple question answering

So GraphRAG should be seen as a complementary retrieval strategy, not a universal replacement for vector RAG.
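In practice, this suggests a router that sends each question to the cheaper strategy when that is enough. Here is a minimal keyword-based sketch. The cue list and route names are hypothetical, and a real router might use an LLM classifier instead.

```python
# Hypothetical cues that suggest a corpus-level (global) question.
GLOBAL_CUES = ("main themes", "across", "overall", "pattern", "repeatedly", "connected", "trends")


def route(question):
    """Return 'graph_global' for corpus-level questions, else 'vector_local'."""
    q = question.lower()
    return "graph_global" if any(cue in q for cue in GLOBAL_CUES) else "vector_local"


for q in ["Where is this term defined?",
          "What risks appear repeatedly across the audit reports?"]:
    print(route(q), "<-", q)
```

A misrouted local question only costs extra compute, but a global question sent to plain vector search can silently return a partial answer. That asymmetry is worth keeping in mind when tuning the cues.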

Paper 4: GraphRAG for Customized Large Language Models

Zhang, Q., Chen, S., Bei, Y., Yuan, Z., Zhou, H., Hong, Z., Dong, J., Chen, H., Chang, Y., & Huang, X. (2025). A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models. arXiv:2501.13958.

This paper is a survey. It does not propose one single method. Instead, it organizes the whole GraphRAG field.

The paper asks:

What are the main types of GraphRAG?
How does GraphRAG customize LLMs?
How should graph knowledge be organized, retrieved, and integrated?

Its value is that it gives a taxonomy.

Key idea 1: GraphRAG has three stages

The paper organizes GraphRAG into three stages:

knowledge organization
knowledge retrieval
knowledge integration

This is a useful framework.

Knowledge organization asks:

How should knowledge be represented?

Knowledge retrieval asks:

How should relevant graph evidence be found?

Knowledge integration asks:

How should retrieved graph evidence be given to the LLM?

Key idea 2: Knowledge organization

GraphRAG uses graphs to represent structured knowledge.

The graph may contain:

entities
relations
attributes
events
dependencies
concepts
documents
chunks
claims
metadata

This matters because domain knowledge is often relational.

In many professional domains, the important knowledge is not just isolated facts. It is the relationship between concepts.

Key idea 3: Knowledge-based GraphRAG

In knowledge-based GraphRAG, the graph itself carries the knowledge.

entity → relation → entity

Examples:

			
drug → treats → disease
company → owns → subsidiary
product → contains → component

This is useful when the domain has clear concepts and relationships. The graph becomes a structured knowledge base.

Key idea 4: Index-based GraphRAG

In index-based GraphRAG, the graph acts as an index over documents or chunks. The raw text remains the main evidence, but the graph helps navigate the corpus.

Examples:

			
chunk A → related to → chunk B
document X → cites → document Y
concept A → appears in → document Z

This is useful when we want to preserve raw source detail while still improving navigation.

Key idea 5: Hybrid GraphRAG

Hybrid GraphRAG combines both approaches. The graph carries structured knowledge, but it also links back to raw evidence.

			
structured graph knowledge
+ source documents
+ text chunks
+ tables
+ metadata
+ evidence links

		

This is usually the most practical architecture because it gives both structure and traceability.

The graph helps reasoning, while the raw source provides evidence.
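A hybrid graph can be as simple as edges that carry pointers back to their source. Here is a minimal sketch, assuming a hypothetical in-memory store where each triple keeps a document ID and the text it was extracted from. The entities and document IDs are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    head: str
    relation: str
    tail: str
    source_doc: str   # link back to raw evidence
    quote: str        # the chunk text the triple was extracted from


EDGES = [
    Edge("HoldCo", "owns", "BankCo", "annual_report_2023", "HoldCo owns 100% of BankCo."),
    Edge("BankCo", "owns", "LeaseCo", "annual_report_2023", "BankCo holds all shares in LeaseCo."),
    Edge("BankCo", "lends_to", "RetailCo", "credit_file_17", "Facility agreement between BankCo and RetailCo."),
]


def query(head=None, relation=None):
    """Structured lookup over the graph; every hit keeps its evidence."""
    return [e for e in EDGES
            if (head is None or e.head == head) and (relation is None or e.relation == relation)]


for e in query(head="BankCo", relation="owns"):
    print(f'{e.head} -{e.relation}-> {e.tail}  [evidence: {e.source_doc}: "{e.quote}"]')
```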

Key idea 6: Knowledge retrieval

Graph retrieval can retrieve different types of evidence:

nodes
edges
triples
paths
subgraphs
communities
graph-indexed chunks

This is different from ordinary vector retrieval.

Vector retrieval mainly asks:

Which chunks are semantically similar?

Graph retrieval asks:

Which concepts, relationships, paths, or subgraphs are relevant?

Key idea 7: Expansion and pruning

Graph retrieval has a major challenge: expansion.

If we expand outward from one node, the retrieved neighbourhood can grow very quickly with each hop.

			
1-hop: manageable
2-hop: large
3-hop: huge

So GraphRAG needs pruning. Pruning means selecting only the most useful graph evidence and removing irrelevant or noisy context.

A good GraphRAG system must balance:

			
expand enough to find useful relationships
prune enough to control context size
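As a rough guide, with an average degree of d, a k-hop neighbourhood can contain on the order of d^k nodes. One common way to balance expansion and pruning is a breadth-first expansion that keeps only the best-scoring new neighbours at each hop and stops at a node budget. Here is a minimal sketch. The graph and relevance scores are hypothetical placeholders for whatever scorer a system uses, such as embedding similarity to the question, edge weights, or an LLM judge.

```python
def expand_and_prune(graph, seeds, score, max_hops=2, beam=2, budget=5):
    """Breadth-first expansion that keeps the `beam` best new neighbours per hop."""
    selected = list(seeds)
    frontier = list(seeds)
    for _ in range(max_hops):
        candidates = {n for node in frontier for n in graph.get(node, ()) if n not in selected}
        # prune: rank new neighbours by relevance, keep only the top `beam`
        frontier = sorted(candidates, key=lambda n: (-score.get(n, 0.0), n))[:beam]
        for n in frontier:
            if len(selected) >= budget:   # stop once the context budget is used
                return selected
            selected.append(n)
        if not frontier:
            break
    return selected


# Hypothetical entity graph and relevance scores for one question.
graph = {
    "BankCo": ["HoldCo", "LeaseCo", "RetailCo", "Auditor", "HQ Building"],
    "RetailCo": ["SupplierX", "BankCo"],
    "LeaseCo": ["BankCo", "FleetCo"],
    "HoldCo": ["BankCo"],
}
score = {"RetailCo": 0.9, "LeaseCo": 0.7, "HoldCo": 0.4, "Auditor": 0.1,
         "HQ Building": 0.05, "SupplierX": 0.8, "FleetCo": 0.3}
print(expand_and_prune(graph, ["BankCo"], score))
# ['BankCo', 'RetailCo', 'LeaseCo', 'SupplierX', 'FleetCo']
```

The trade-off is visible in the result: `HoldCo` is only one hop away but is pruned because its score is low, while the two-hop `SupplierX` survives.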

Key idea 8: Knowledge integration

After retrieval, the graph evidence must be integrated into the LLM context.

Possible formats include:

triples
paths
subgraphs
natural language summaries
structured evidence blocks
source-linked context

This matters because simply pasting many triples into a prompt may not help. The context must be organized so the model can reason over it.

Good integration should answer:

What is the relevant concept?
What are the relevant relationships?
What is the supporting evidence?
What source does it come from?
How should the model use it?

Key idea 9: GraphRAG as domain customization

The paper frames GraphRAG as a way to customize LLMs without relying solely on fine-tuning.

Instead of changing model weights, we can provide domain knowledge at inference time.

			
general LLM
+ domain graph
+ domain retrieval
+ domain context
= customized LLM behavior

		

This is important because domain knowledge changes. External graph-based knowledge can be updated more easily than model parameters.
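The point that external knowledge is easier to update than model weights can be seen in a small sketch. The same, unchanged model would receive different context once the domain graph changes. Here, `build_context` formats the relevant triples as a source-linked evidence block, one of the integration formats listed earlier. The triples and source IDs are hypothetical, and no model call is made.

```python
def build_context(graph, entity):
    """Format the triples that mention `entity` as a source-linked evidence block."""
    lines = [f"- {h} {r} {t} (source: {src})"
             for h, r, t, src in graph if entity in (h, t)]
    return "Evidence:\n" + ("\n".join(lines) if lines else "- none found")


# Hypothetical domain graph: (head, relation, tail, source_id).
domain_graph = [
    ("BankCo", "reports under", "IFRS 9", "policy_manual_v3"),
]
print(build_context(domain_graph, "BankCo"))

# Knowledge changes: update the graph, not the model weights.
domain_graph.append(("BankCo", "classified as", "significant institution", "supervisory_letter_2024"))
print(build_context(domain_graph, "BankCo"))
```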

Summary

Across the four papers, a common architecture starts to emerge:

			
user question
→ LLM reasoning
→ task decomposition
→ tool / retrieval planning
→ choose retrieval mode or tool
→ retrieve / compute / observe
→ integrate evidence
→ generate grounded answer
→ validate / explain / act

		

This is very different from:

			
user question
→ LLM answer

The LLM becomes a reasoning and orchestration layer.

The four papers strongly suggest that an AI-native financial data foundation should not be designed as a simple chatbot over documents.

A better architecture is:

			
LLM
+ financial semantic graph
+ authoritative data sources
+ source lineage
+ validation rules
+ SQL / query tools
+ calculation tools
+ controlled workflow actions
+ audit trail
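Several of these components (SQL tools, validation rules, controlled actions, audit trail) meet at the point where a request from the LLM touches real data. Here is a minimal sketch with Python's built-in `sqlite3` module. The tool only runs single `SELECT` statements and appends an audit record for every call, whether allowed or not. The table, the check and the audit fields are hypothetical, and a string check like this is not a substitute for database-level permissions.

```python
import sqlite3
from datetime import datetime, timezone

AUDIT_LOG = []  # hypothetical in-memory audit trail


def run_governed_query(conn, sql, requested_by):
    """Execute a read-only query and record who asked for what."""
    statement = sql.strip().rstrip(";")
    allowed = statement.lower().startswith("select") and ";" not in statement
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "requested_by": requested_by,
        "sql": sql,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError("only single SELECT statements are permitted")
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exposures (desk TEXT, amount REAL)")
conn.executemany("INSERT INTO exposures VALUES (?, ?)", [("rates", 120.0), ("fx", 80.0)])
print(run_governed_query(conn, "SELECT SUM(amount) FROM exposures;", "llm-agent"))  # [(200.0,)]
```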

		

In this architecture:

			
LLM = reasoning and orchestration layer
tools = deterministic execution layer
graph = semantic organization layer
RAG = evidence retrieval layer
governance = trust and control layer

		

The key lesson is that the value is not in putting an LLM on top of messy data. The value is in building the semantic, graph, tool, and governance foundation that allows the LLM to reason safely and usefully.
