AI-Native Financial Data Foundation (1) – Why I Started this Blog Series, and What Happens to QuantFlow – Data Ninjago (Financial Data Architecture)

Since last year, my belief system around technology, skills, AI, and even my own usefulness has been changing. Not slowly evolving, but changing.

I have always seen myself as a highly rational and logical person. I do not easily get influenced by hype slogans such as ‘AI will replace everyone’. Even a few months ago, I was comfortable with the idea that AI could perform well at the syntax level. It could generate code, write configuration files, explain APIs, and automate repetitive engineering tasks. But I did not believe AI could work reliably at the semantic level, where understanding business context, domain knowledge, and financial meaning becomes essential.

However, I have started to think that AI can reason surprisingly well, sometimes even more broadly and consistently than humans, when it is given the right semantic context. One example comes from my own experience building QuantFlow. As I explained in my earlier blog post, “I Was Wrong: the Users of QuantFlow Won’t be Human”, I noticed that AI became more capable as QuantFlow accumulated more structure, metadata, and domain context.

My Thoughts on Semantic Context

But what is this semantic context?

Firstly, it is not simply a database schema. It is not just a collection of tables, columns, and data types. A physical schema can tell AI that a column is called fixed_rate, but it does not necessarily explain what role that rate plays in a financial product, how it relates to a payout, which party pays it, how the cashflow is calculated, or how it changes during the trade lifecycle.

For me, semantic context means a stable representation of financial meaning. It includes the canonical data model, business concepts, relationships between concepts, validation rules, modelling assumptions, lifecycle events, examples, lineage, and the mapping between logical financial objects and physical data structures.

In other words, semantic context sits between the physical data infrastructure and AI reasoning. The physical data infrastructure stores and processes the data. The semantic model explains what the data means. AI uses that semantic context to generate mappings, transformations, validations, features, workflows, and explanations.

This is why I no longer think the core problem is simply building more ETL pipelines, more feature libraries, or more research workflows. AI can increasingly generate those. The harder and more valuable problem is to build the stable financial representation that AI can use as its anchor.

Financial Data Foundation

I have been thinking about financial data foundations for quite a while, but mostly as something designed to support humans. I observe two main issues that block or slow down financial data projects or operations.

The first is the domain knowledge gap. Engineers often have good technical expertise but limited understanding of the underlying finance, especially in the FICC (fixed income, currencies, and commodities) domain. As a result, they end up building solutions that work at the technical level but are not always useful to users.

I feel the biggest problem is the lost opportunity. Business users often cannot articulate what technology could make possible, while engineers do not have enough financial context to identify high-value opportunities. The gap is not only a communication problem; it is a semantic problem. As a result, expensive systems are built and maintained, but many opportunities to generate business value are never discovered in the first place.

The second is data consistency. Data from different vendors and systems uses different terms, structures, and conventions. For FICC, this is particularly painful. In my view, this is one of the root causes of the messiness found in many data management environments. Data becomes siloed across systems, teams, and business functions, with each source using its own representation of the same financial concepts. The result is duplication, inconsistency, reconciliation effort, and ongoing data quality issues.
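To make the consistency problem concrete, here is a minimal Python sketch of two sources describing the same fixed rate with different field names, rate units, and day-count spellings. The vendor names, field names, and alias table are all hypothetical and invented for illustration; real feeds will differ.

```python
from typing import Any, Dict

# Hypothetical per-source conventions. The vendor names, field names, and
# aliases below are invented for illustration and do not describe real feeds.
SOURCES: Dict[str, Dict[str, Any]] = {
    "vendor_a": {
        "fields": {"FixedRate": "fixed_rate", "DCC": "day_count", "Ccy": "currency"},
        "rate_divisor": 100.0,  # assumed to quote rates in percent, e.g. 2.5
    },
    "vendor_b": {
        "fields": {"fixed_cpn": "fixed_rate", "daycount": "day_count", "curr": "currency"},
        "rate_divisor": 1.0,  # assumed to quote rates as decimals, e.g. 0.025
    },
}

# Different spellings of the same day-count convention, mapped to one canonical code.
DAY_COUNT_ALIASES = {
    "ACT/360": "ACT/360",
    "Actual/360": "ACT/360",
    "A360": "ACT/360",
    "ACT/365F": "ACT/365F",
    "Actual/365 Fixed": "ACT/365F",
}


def normalise(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one vendor record into canonical field names and conventions."""
    spec = SOURCES[source]
    canonical: Dict[str, Any] = {}
    for vendor_field, value in record.items():
        if vendor_field not in spec["fields"]:
            # Fail loudly rather than silently dropping an unmapped field.
            raise KeyError(f"{source}: no canonical mapping for {vendor_field!r}")
        canonical[spec["fields"][vendor_field]] = value
    # Bring the rate to a decimal fraction and the labels to canonical codes.
    canonical["fixed_rate"] = float(canonical["fixed_rate"]) / spec["rate_divisor"]
    canonical["day_count"] = DAY_COUNT_ALIASES[canonical["day_count"]]
    canonical["currency"] = str(canonical["currency"]).upper()
    return canonical


record_a = normalise("vendor_a", {"FixedRate": 2.5, "DCC": "Actual/360", "Ccy": "usd"})
record_b = normalise("vendor_b", {"fixed_cpn": 0.025, "daycount": "A360", "curr": "USD"})
print(record_a == record_b)
```

Once both records are expressed in the same canonical terms they can be compared directly, which is the kind of reconciliation effort a shared representation is meant to reduce.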

I think both of these issues challenge humans and AI alike. However, given how broadly and consistently AI can reason, I would say AI has a better chance of solving them when it is provided with more foundation and context.

In my current view (which might change along my blog-writing journey), the financial data foundation needs three core layers:

1. Canonical Financial Data Model

This is not just a set of relational tables. It is a semantic model that represents financial products, trades, positions, events, market data, counterparties, payouts, schedules, cashflows, and risk concepts with explicit financial meaning.

For example, an interest rate swap should not merely be represented as rows in several tables. The model should express that the product contains two interest rate payouts, that one side pays fixed, the other pays floating, and that each payout has its own schedule, rate specification, calculation logic, notional, currency, and payer/receiver roles.

This semantic structure is what allows AI to understand the product as a financial object, not just as data fields.
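As a rough illustration of the swap example above, the following Python sketch models a vanilla fixed-for-floating swap as two payouts with explicit payer and receiver roles. The class and field names are my own assumptions for this sketch, not an existing standard or the QuantFlow model, and calculation logic is left out to keep it short.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional, Tuple


class PayoutType(Enum):
    FIXED = "fixed"
    FLOATING = "floating"


@dataclass(frozen=True)
class Schedule:
    effective_date: date
    termination_date: date
    payment_frequency_months: int


@dataclass(frozen=True)
class InterestRatePayout:
    payout_type: PayoutType
    payer: str
    receiver: str
    notional: float
    currency: str
    day_count: str
    schedule: Schedule
    fixed_rate: Optional[float] = None      # rate specification of a fixed payout
    floating_index: Optional[str] = None    # rate specification of a floating payout

    def __post_init__(self) -> None:
        if self.payer == self.receiver:
            raise ValueError("payer and receiver must be different parties")
        if self.payout_type is PayoutType.FIXED and self.fixed_rate is None:
            raise ValueError("a fixed payout needs a fixed_rate")
        if self.payout_type is PayoutType.FLOATING and self.floating_index is None:
            raise ValueError("a floating payout needs a floating_index")


@dataclass(frozen=True)
class InterestRateSwap:
    trade_id: str
    payouts: Tuple[InterestRatePayout, InterestRatePayout]

    def __post_init__(self) -> None:
        if len(self.payouts) != 2:
            raise ValueError("a swap has exactly two payouts")
        first, second = self.payouts
        if {first.payout_type, second.payout_type} != {PayoutType.FIXED, PayoutType.FLOATING}:
            raise ValueError("expected one fixed and one floating payout")
        if (first.payer, first.receiver) != (second.receiver, second.payer):
            raise ValueError("the payer of one payout must be the receiver of the other")

    def payout(self, payout_type: PayoutType) -> InterestRatePayout:
        """Return the payout of the requested type."""
        return next(p for p in self.payouts if p.payout_type is payout_type)


# Illustrative trade; party names, dates, and the index label are made up.
schedule = Schedule(date(2025, 1, 15), date(2030, 1, 15), payment_frequency_months=6)
swap = InterestRateSwap(
    trade_id="T-0001",
    payouts=(
        InterestRatePayout(PayoutType.FIXED, "PARTY_A", "PARTY_B", 10_000_000, "USD",
                           "30/360", schedule, fixed_rate=0.03),
        InterestRatePayout(PayoutType.FLOATING, "PARTY_B", "PARTY_A", 10_000_000, "USD",
                           "ACT/360", schedule, floating_index="SOFR"),
    ),
)
print(swap.payout(PayoutType.FIXED).payer)
```

The point of the validation in `__post_init__` is that the financial rules live in the model itself, so a structurally impossible swap cannot be constructed in the first place.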

2. Mapping from Semantic Model to Physical Implementation

The canonical model cannot remain theoretical. It must be mappable to the physical world.

That means it should be possible to implement it in real data platforms: Parquet, Iceberg, BigQuery, Snowflake, Databricks, relational databases, document formats, or streaming systems.

The semantic model defines the meaning. The physical model defines how the data is stored, queried, processed, versioned, and governed.

Both are necessary. A model that is semantically beautiful but impossible to implement is not useful. A physical schema that is efficient but semantically poor will not support serious AI reasoning.
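One way to picture the relationship between the two models is an explicit mapping from semantic attributes to physical columns. The sketch below is only an assumption about how such a mapping could look: the column names, the nested-dictionary form of the swap, and the one-row-per-payout layout are all hypothetical. The resulting rows are plain dictionaries that could then be written to a columnar file or a relational table.

```python
from typing import Any, Dict, List

# Hypothetical mapping: semantic attribute path -> physical column name.
PAYOUT_COLUMN_MAP: Dict[str, str] = {
    "payout_type": "leg_type",
    "payer": "payer_party_id",
    "receiver": "receiver_party_id",
    "notional": "notional_amt",
    "currency": "ccy",
    "rate.fixed_rate": "fixed_rate",
    "rate.floating_index": "float_index",
}


def get_path(obj: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if a step is missing."""
    current: Any = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def to_physical_rows(swap: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one semantic swap into one physical row per payout."""
    rows = []
    for leg_no, payout in enumerate(swap["payouts"], start=1):
        row = {"trade_id": swap["trade_id"], "leg_no": leg_no}
        for semantic_path, column in PAYOUT_COLUMN_MAP.items():
            row[column] = get_path(payout, semantic_path)
        rows.append(row)
    return rows


def semantic_path_of(column: str) -> str:
    """Reverse lookup: which semantic attribute does a physical column hold?"""
    for semantic_path, mapped_column in PAYOUT_COLUMN_MAP.items():
        if mapped_column == column:
            return semantic_path
    raise KeyError(column)


# Illustrative swap in semantic form; all values are made up.
example_swap = {
    "trade_id": "T-0001",
    "payouts": [
        {"payout_type": "fixed", "payer": "PARTY_A", "receiver": "PARTY_B",
         "notional": 10_000_000, "currency": "USD", "rate": {"fixed_rate": 0.03}},
        {"payout_type": "floating", "payer": "PARTY_B", "receiver": "PARTY_A",
         "notional": 10_000_000, "currency": "USD", "rate": {"floating_index": "SOFR"}},
    ],
}

rows = to_physical_rows(example_swap)
for row in rows:
    print(row)
print(semantic_path_of("ccy"))
```

Keeping the mapping as data rather than burying it in transformation code means the same table can be read in both directions: from meaning to storage, and from a physical column back to the concept it represents.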

3. Scalable Data Infrastructure and Governance

The foundation also needs robust infrastructure.

Financial data is large, fast, messy, and highly contextual. The platform must support scalable storage, efficient processing, data quality control, observability, lineage, metadata management, and governance.

This is especially important for AI. If AI is going to generate transformations, mappings, validation rules, or analytical workflows, it needs a trusted execution environment. It needs to know what data exists, what each field means, where the data came from, whether it is complete, whether it is valid, and how it should be used.
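The questions in the paragraph above (what exists, what it means, where it came from, whether it is complete and valid) can be sketched as a small metadata catalog plus a check function. This is a minimal illustration under my own assumptions: the `FieldMeta` structure, the catalog entries, the source names, and the validity rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldMeta:
    meaning: str                      # what the field means in business terms
    source: str                       # lineage: where the value comes from
    required: bool                    # completeness expectation
    is_valid: Callable[[Any], bool]   # validity rule


def is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical catalog for a few payout fields.
CATALOG: Dict[str, FieldMeta] = {
    "notional": FieldMeta(
        meaning="amount on which interest is calculated",
        source="trade_capture",
        required=True,
        is_valid=lambda v: is_number(v) and v > 0,
    ),
    "currency": FieldMeta(
        meaning="three-letter currency code of the notional",
        source="trade_capture",
        required=True,
        is_valid=lambda v: isinstance(v, str) and len(v) == 3 and v.isupper(),
    ),
    "fixed_rate": FieldMeta(
        meaning="fixed rate as a decimal fraction, e.g. 0.03 for 3%",
        source="term_sheet",
        required=False,
        is_valid=lambda v: is_number(v) and -1 < v < 1,
    ),
}


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable issues; an empty list means the record passed."""
    issues = []
    for name, meta in CATALOG.items():
        value = record.get(name)
        if value is None:
            if meta.required:
                issues.append(f"{name}: missing (expected from {meta.source})")
            continue
        if not meta.is_valid(value):
            issues.append(f"{name}: invalid value {value!r} ({meta.meaning})")
    for name in record:
        if name not in CATALOG:
            issues.append(f"{name}: not described in the catalog")
    return issues


good = {"notional": 10_000_000, "currency": "USD", "fixed_rate": 0.03}
bad = {"notional": -5, "currency": "usd", "desk": "rates"}
print(check_record(good))
for issue in check_record(bad):
    print(issue)
```

Each issue message carries the field's meaning or source, so whoever reads it, human or AI, sees not only that a value failed but what it was supposed to represent.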

However, the ultimate purpose of the data foundation is not simply to help AI trust the data. AI will happily generate answers from whatever data it is given. The real challenge is helping humans trust the AI. That trust comes from lineage, governance, data quality, reproducibility, and explainability. Therefore, one of the most important responsibilities of the data infrastructure is to act as the trust layer between AI reasoning and business users.

Why I Write This Blog Series

I have been thinking about financial data foundations for quite a while, but mostly in a scattered and ad-hoc way. One of the main motivations for writing this series is to help me organise my thoughts systematically.

I plan to write this series with extensive depth and breadth, covering both financial domains and engineering. This will be a good opportunity to structure my knowledge system.

Another reason for writing this series is to protect myself from myself. I genuinely believe foundation work is the right thing to focus on, but I also know it will be a long journey and can be boring and frustrating from time to time. This series is partly my attempt to keep myself away from more exciting and glamorous stuff.

In addition, I want this blog series to be the companion to my new version of QuantFlow.

What Happens to QuantFlow?

The original vision was closer to a quant research and data infrastructure platform. But now, based on the reasons discussed above, I need to redesign QuantFlow into a deliverable AI-native financial data foundation. That means some components should be retired, some should remain, and some new components need to be added.

Things to Retire

It is painful to throw away something you have built and used to think was valuable. However, the reality is that AI can do some things far better. To make myself feel a little less sad, I keep reminding myself that this is not only me. Many people will have to face similar situations in the coming years.

Metadata-driven ETL
FeatureDAG and Feature Library
Research Engine

Things to Keep

DataInfra (the lakehouse design, storage architecture, data quality control, and observability)
Distributed computing module
State Engine (needs to change into a financial state representation layer)

Things to Add

Canonical financial data model
Semantic layer and AI-readable metadata

Summary

The conclusion is simple. The future value of QuantFlow is probably not in building more predefined features, more YAML configurations, or more fixed research workflows. AI is becoming very good at generating those.

The real value is in building the foundation that allows AI to do this correctly: a platform that helps AI understand, generate, validate, and operate on financial data with real business meaning.

And this blog series is my attempt to think through that direction.
