AI-Native Financial Data Foundation (2) – FICC Canonical Data Model

In the previous post, I talked about why my thinking around QuantFlow has changed. The short version is that I am starting to believe that the future users of many financial data platforms may not be human quants, analysts, or engineers directly. Increasingly, the real users may be AI agents.

However, there is one important condition. AI agents need semantic context. They need financial concepts to be represented in a form that is accessible, understandable, and consistent. Without that semantic foundation, AI may be able to generate mappings, transformations, SQL, and workflows by following patterns it has previously learnt, but it cannot reliably understand the specific financial meaning behind the data.

To provide that semantic context, I believe the foundation is a canonical data model. This is why, for me, the journey of building an AI-native financial data foundation has to start from the canonical data model. More specifically, I want to start from the FICC canonical data model.

Why Start from the FICC Canonical Data Model

There are several reasons. The first reason is personal. FICC data modelling is the domain that has cost me the most effort, especially in recent years. It is also the area where I probably have the most unfinished thoughts. At the same time, it is the area I find most intellectually interesting.

The second reason is usefulness. I can clearly see the value of having a well-defined FICC canonical data model. The value is not limited to AI. In fact, a good canonical model is already valuable in a completely human-native world. It helps organisations create a common understanding of products, trades, parties, market data, and lifecycle events. It reduces duplication, improves consistency, and makes integration significantly easier.

The third reason comes from recent experience. I had the opportunity to research and design a canonical FICC product and trade data model aligned with ideas from the ISDA Common Domain Model (CDM). That experience convinced me that the ISDA CDM approach, which models products based on their underlying economic structure, leads to a more appropriate representation of FICC business meaning.

What Do I Mean by FICC Canonical Data Model?

I think it is useful to break the phrase into three parts: FICC, Canonical, and Data Model.

FICC

FICC stands for Fixed Income, Currencies, and Commodities. If you are curious why these three areas are grouped together in the first place, my guess is that the term emerged because these products share many business, trading, risk, and operational characteristics, and are therefore often treated as a logical group distinct from Equities.

This grouping also makes sense from a data modelling perspective. The participants are often institutional organisations. The products are frequently contract-based rather than simple securities. The deals can be highly customised, the transaction sizes are often much larger, and the lifecycle management is significantly more complex. So although FICC originally emerged as a business grouping within investment banks, the same characteristics are what set it apart as a modelling problem.

For example, many equity products can be represented primarily as securities. A stock can often be described using a security identifier and a set of descriptive attributes. FICC products are different. Many FICC products are contractual structures whose meaning is derived from economic terms, cash flows, schedules, rights, obligations, and lifecycle events. Take the Interest Rate Swap (IRS) as an example. It cannot be represented in a simple, equity-like way, e.g., “Product Type = IRS”. The meaning of an IRS contract comes from the relationship between the fixed and floating cash flows, the notional, the index, the reset schedule, the payment schedule, and many other contractual terms. This structural nature is one of the reasons why FICC data modelling remains difficult, and why canonical models can deliver significant value.
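To make the contrast concrete, here is a minimal Python sketch of an IRS described by its legs rather than by a label. All class and field names here (`Leg`, `InterestRateSwap`, `structure`) are hypothetical and chosen for illustration only. They are not the ISDA CDM's actual types, and the rates, dates, and index names are made-up values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Leg:
    """One stream of cash flows (hypothetical, heavily simplified)."""
    payer: str
    notional: float
    currency: str
    payment_frequency_months: int
    fixed_rate: Optional[float] = None           # set only on a fixed leg
    floating_index: Optional[str] = None         # set only on a floating leg
    reset_frequency_months: Optional[int] = None

    @property
    def kind(self) -> str:
        return "fixed" if self.fixed_rate is not None else "floating"


@dataclass(frozen=True)
class InterestRateSwap:
    effective_date: date
    termination_date: date
    legs: Tuple[Leg, ...]

    def structure(self) -> str:
        # The economic shape is derived from the legs, not from a label.
        return "-".join(sorted(leg.kind for leg in self.legs))


vanilla = InterestRateSwap(
    effective_date=date(2025, 1, 15),
    termination_date=date(2030, 1, 15),
    legs=(
        Leg("PartyA", 10_000_000, "USD", 6, fixed_rate=0.035),
        Leg("PartyB", 10_000_000, "USD", 3,
            floating_index="SOFR", reset_frequency_months=3),
    ),
)

basis = InterestRateSwap(
    effective_date=date(2025, 1, 15),
    termination_date=date(2030, 1, 15),
    legs=(
        Leg("PartyA", 10_000_000, "USD", 3,
            floating_index="SOFR", reset_frequency_months=3),
        Leg("PartyB", 10_000_000, "USD", 3,
            floating_index="Fed Funds", reset_frequency_months=3),
    ),
)

print(vanilla.structure())  # fixed-floating
print(basis.structure())    # floating-floating
```

Both objects could carry the same “Product Type = IRS” label in a source system, yet their economics differ. That difference is only visible when the legs themselves are part of the model.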

Canonical

Canonical means standardised, common, and independent of individual source systems. In a large financial institution, the same business concept may appear in multiple systems with different names, structures, and levels of detail. One system may call something a trade, another may call it a deal, and a third may represent the same concept in a completely different way.

A canonical model acts as a common language. The goal is not to remove all differences between systems. The goal is to provide a stable semantic layer that allows those differences to be understood, mapped, and governed consistently.
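A small Python sketch of this idea follows, under stated assumptions: two hypothetical source systems, A and B, describe the same trade with different names, structures, and units, and each gets its own mapping into one canonical record. The field names (`deal_no`, `TradeRef`, and so on) and the `CanonicalTrade` shape are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalTrade:
    """A deliberately tiny canonical record (hypothetical shape)."""
    trade_id: str
    counterparty: str
    notional: float
    currency: str
    source_system: str


def from_system_a(record: dict) -> CanonicalTrade:
    # Hypothetical system A calls it a "deal" and stores notional in millions.
    return CanonicalTrade(
        trade_id=record["deal_no"],
        counterparty=record["cpty"],
        notional=record["notional_mm"] * 1_000_000,
        currency=record["ccy"],
        source_system="A",
    )


def from_system_b(record: dict) -> CanonicalTrade:
    # Hypothetical system B calls it a "trade" and nests the amount.
    return CanonicalTrade(
        trade_id=record["TradeRef"],
        counterparty=record["Counterparty"]["Name"],
        notional=float(record["Notional"]["Amount"]),
        currency=record["Notional"]["Currency"],
        source_system="B",
    )


a = from_system_a(
    {"deal_no": "D-1001", "cpty": "Bank X", "notional_mm": 25, "ccy": "USD"}
)
b = from_system_b(
    {
        "TradeRef": "T-77",
        "Counterparty": {"Name": "Bank X"},
        "Notional": {"Amount": "25000000", "Currency": "USD"},
    }
)

# Different source shapes, same canonical meaning.
print(a.notional == b.notional, a.counterparty == b.counterparty)
```

The source systems keep their differences. The mapping functions are where those differences are written down explicitly, which is what makes them governable.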

Data Model

When I say data model, I do not simply mean a database schema. I mean all three layers: the conceptual model, the logical model, and the physical model.

The conceptual model describes the business concepts and their meaning. The logical model describes how those concepts relate to each other. The physical model describes how they are implemented in a specific technology platform.

A FICC canonical data model should exist across all three levels. More importantly, it should represent financial meaning rather than simply defining tables and columns. It should capture products, trades, parties, observables, cash flows, schedules, lifecycle events, and the relationships between them.

Why Do We Need a Canonical Data Model?

Because financial data is fragmented. Different desks use different systems. Different vendors use different representations. Different applications optimise for different purposes. As a result, the same business reality is represented in many different ways across the organisation.

This causes real pain for organisations. One example is knowledge silos and duplication. Teams repeatedly solve the same mapping and modelling problems because there is no common representation of the underlying business concepts.

Another problem is the lack of visibility. Data becomes scattered across applications, departments, and vendors, and nobody has a complete view of the organisation’s financial data landscape.

The third problem is opportunity cost. Eventually the environment becomes so complex that nobody wants to touch it. New initiatives become expensive because understanding the existing systems becomes more difficult than building the new solution itself.

A canonical model does not eliminate complexity, but it provides a place where that complexity can be organised and understood.

Why Is FICC Modelling Difficult?

As mentioned above, FICC products are highly customised. Unlike many equity instruments, products within the same category can have significantly different structures and contractual terms.

FICC products are also highly composable. Many products are constructed from smaller economic building blocks. The final product is often the result of combining multiple contractual components together.
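As a rough sketch of composability, the Python below builds a swaption by wrapping an option around a swap, which is itself built from two legs. The type names (`FixedLeg`, `FloatingLeg`, `Option`, `Product`) are hypothetical simplifications of my own. They are only loosely inspired by the building-block idea and should not be read as the ISDA CDM's actual definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class FixedLeg:
    rate: float


@dataclass(frozen=True)
class FloatingLeg:
    index: str


@dataclass(frozen=True)
class Option:
    strike: float
    underlier: "Product"   # an option can wrap another whole product


Payout = Union[FixedLeg, FloatingLeg, Option]


@dataclass(frozen=True)
class Product:
    payouts: Tuple[Payout, ...]


def building_blocks(product: Product) -> List[str]:
    """List the building blocks of a product, descending into underliers."""
    names: List[str] = []
    for payout in product.payouts:
        names.append(type(payout).__name__)
        if isinstance(payout, Option):
            names.extend(building_blocks(payout.underlier))
    return names


swap = Product((FixedLeg(0.03), FloatingLeg("SOFR")))
swaption = Product((Option(strike=0.03, underlier=swap),))

print(building_blocks(swap))      # ['FixedLeg', 'FloatingLeg']
print(building_blocks(swaption))  # ['Option', 'FixedLeg', 'FloatingLeg']
```

No new “swaption” type was needed. The product is simply a new arrangement of blocks that already existed.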

In addition, the domain is full of specialised concepts such as notional, coupon, spread, fixing, reset, settlement, accrual, and day count conventions. These concepts carry significant business meaning and cannot simply be treated as generic attributes.

Finally, trades are not static. Lifecycle events continuously change the state and meaning of a trade. A trade can be amended, novated, compressed, terminated, corrected, or cancelled. A canonical model therefore needs to represent not only what a trade is, but also how it evolves over time.
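One way to picture this, as a minimal sketch rather than a recommended design, is to treat a trade as an initial state plus an ordered list of events, and to derive each later state by replaying those events. The names below (`TradeState`, `LifecycleEvent`, `replay`) and the small set of event types are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List


class EventType(Enum):
    AMENDMENT = "amendment"
    NOVATION = "novation"
    TERMINATION = "termination"


@dataclass(frozen=True)
class TradeState:
    counterparty: str
    notional: float
    status: str = "live"


@dataclass(frozen=True)
class LifecycleEvent:
    event_type: EventType
    changes: Dict[str, object]   # fields of TradeState that the event changes


def apply_event(state: TradeState, event: LifecycleEvent) -> TradeState:
    # Each event produces a new state; earlier states are never mutated.
    if event.event_type is EventType.TERMINATION:
        return replace(state, status="terminated")
    return replace(state, **event.changes)


def replay(initial: TradeState, events: List[LifecycleEvent]) -> List[TradeState]:
    history = [initial]
    for event in events:
        history.append(apply_event(history[-1], event))
    return history


history = replay(
    TradeState(counterparty="Bank X", notional=10_000_000),
    [
        LifecycleEvent(EventType.AMENDMENT, {"notional": 8_000_000}),
        LifecycleEvent(EventType.NOVATION, {"counterparty": "Bank Y"}),
        LifecycleEvent(EventType.TERMINATION, {}),
    ],
)

for state in history:
    print(state)
```

Keeping the full history, instead of overwriting a single row, is what allows the model to answer both “what is this trade now?” and “what was it before the novation?”.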

What Approaches Exist Today?

This section is based only on my own observations and previous research.

Some organisations use source-system-centric models and simply adopt the representation provided by a major trading or risk platform. This is often practical, but it can create long-term dependency on a specific application.

A more common approach is the reporting-centric model, such as wide reporting tables or dimensional models designed for specific reporting purposes. These approaches work well for certain analytics and reporting use cases, but they often compress or simplify the underlying business meaning.

Some organisations adopt vendor-provided models, such as security master platforms. These can provide useful standardisation, but they are naturally influenced by the vendor’s own modelling assumptions.

Last but not least, some organisations build their data models on an industry standard, such as the ISDA CDM. This is currently my preferred approach. It not only provides a strong semantic foundation for thinking about products, trades, economic terms, and lifecycle events in line with their financial meaning, but also brings the opportunities that come with using an industry standard.

Next

In the rest of this series, I will start by analysing FICC financial concepts from the perspective of their underlying nature and structure. I actually made a similar attempt a few years ago. You can find it in my old article, Buy-Side Financial Data Models (2) – Financial Instruments. Looking back, I now consider that attempt largely unsuccessful. The main reason is that the classification was primarily based on product labels rather than the underlying economic nature of the products themselves.

This time, I want to take a different approach. Instead of starting with product categories, I want to focus on the underlying nature and structure of financial products, using the ISDA Common Domain Model (CDM) as the primary reference.

In the next post, I want to start with a concept that sounds simple, but turns out not to be so simple once you start thinking about it carefully:

What is a financial product at all?
