Tag: Data Engineering

S3 + Parquet + Iceberg + Trino: A Poor Man’s Market Data Platform

Before I start talking about how effective this architecture can be at reducing infrastructure costs, I should first make the old point that there is really no free lunch. Compared with commercial cloud data platforms and warehouses such as Databricks, BigQuery, and Snowflake, an open lakehouse setup requires significantly more engineering effort to build, operate, … Continue reading S3 + Parquet + Iceberg + Trino: A Poor Man’s Market Data Platform →

How QuantFlow Handles Large-Scale Market Data

For many years, a large portion of systematic strategies relied on relatively low-frequency signals. These approaches worked well when they were under-explored, but over time they have been widely researched, increasingly arbitraged, and structurally compressed in edge. As a result, a growing share of remaining opportunity has shifted toward market microstructure — order flow dynamics, … Continue reading How QuantFlow Handles Large-Scale Market Data →

QuantFlow Fun – Build a Low-Latency Feature Monitor Dashboard

One of the reasons — actually, the core reason — I chose DolphinDB as the built-in streaming engine for QuantFlow's streaming execution layer is that it's really fast, even for the kind of complicated computation that requires chained steps. Thanks to that speed, and with QuantFlow's MarketState engine and FeatureDAG compiler on top, we can … Continue reading QuantFlow Fun – Build a Low-Latency Feature Monitor Dashboard →

Setup QuantLib C++ Dev Environment with VS Code on Linux

Since I really don't want to go back to Windows, with all its bulk and mess, I decided to set up my QuantLib C++ development environment on Ubuntu. It took a few extra steps compared to setting up Visual Studio on Windows, so I’m sharing the process in this blog post in case it helps anyone. Step … Continue reading Setup QuantLib C++ Dev Environment with VS Code on Linux →

Robust DolphinDB – How does DolphinDB Achieve Scalability, Reliability, Resilience, Consistency, and Monitorability

What makes me buy into DolphinDB: Friendly DolphinDB – Cross-Exchange Arbitraging Case Speedy DolphinDB – Why is DolphinDB so fast? Robust DolphinDB – Reliable, Scalable, Resilient, Consistent, and Monitorable Cost Effective DolphinDB – Worth the Money DolphinDB – An Integrated Financial Data Platform, Not Just a Time-Series Database As a high-performance database built for business-critical financial applications … Continue reading Robust DolphinDB – How does DolphinDB Achieve Scalability, Reliability, Resilience, Consistency, and Monitorability →

Buy-Side Financial Data Engineering (3) – Market Data Management

Buy-Side Financial Data Engineering (1) - Overview Buy-Side Financial Data Engineering (2) - Financial Instruments Buy-Side Financial Data Engineering (3) - Market Data Management As a data guy, two thoughts immediately come to mind when I hear the term "Financial Market Data": 1) it is bloody expensive; 2) what a chore to handle all … Continue reading Buy-Side Financial Data Engineering (3) – Market Data Management →

Buy-Side Financial Data Models (2) – Financial Instruments

Buy-Side Financial Data Engineering (1) - Overview Buy-Side Financial Data Engineering (2) - Financial Instruments Buy-Side Financial Data Engineering (3) – Market Data Management The second article of my "Buy-Side Financial Data Models" series focuses on the "Financial Instruments" data domain. Financial instruments data is complex and difficult to manage. At the same time, it is crucial to … Continue reading Buy-Side Financial Data Models (2) – Financial Instruments →

Buy-Side Financial Data Models (1) – Overview

Buy-Side Financial Data Engineering (1) - Overview Buy-Side Financial Data Engineering (2) - Financial Instruments Buy-Side Financial Data Engineering (3) – Market Data Management This is the first blog post of the "Buy-Side Financial Data Models" series I am planning to write. To kick off this blog series, this post provides a high-level overview of the … Continue reading Buy-Side Financial Data Models (1) – Overview →

Spark SQL Query Engine Deep Dive (20) – Adaptive Query Execution (Part 2)

In the previous blog post, we looked into how the Adaptive Query Execution (AQE) framework is implemented in Spark SQL. This blog post introduces the two core AQE optimizer rules, the CoalesceShufflePartitions rule and the OptimizeSkewedJoin rule, and how they are implemented under the hood. I will not repeat what I have covered in the previous … Continue reading Spark SQL Query Engine Deep Dive (20) – Adaptive Query Execution (Part 2) →

Spark SQL Query Engine Deep Dive (19) – Adaptive Query Execution (Part 1)

Cost-based optimisation (CBO) is not a new thing. It has been widely used in the RDBMS world for many years. However, the use of CBO in a distributed system with separated storage and compute, such as Spark, is an "extremely complex problem" (as claimed by the Spark guys at Databricks). It is challenging and expensive to collect and maintain a … Continue reading Spark SQL Query Engine Deep Dive (19) – Adaptive Query Execution (Part 1) →
