QuantFlow – Research and Online Microstructure Breakout Alpha

End-to-end walkthrough: detecting microstructure breakouts with multi-factor, multi-clock confirmation — from project setup through batch research to live streaming deployment.

1. Scenario

Assume you need to detect microstructure breakouts in BTCUSDT order flow — distinguishing institutionally driven price-level breakthroughs from noise. The strategy requires simultaneous confirmation across four signals (price velocity, volume surge, flow direction, and depth vacuum), each measured on its most informative bar clock.

With QuantFlow, you can quickly provision a project and define an MFP (Market Feature Package) pack by picking features from the existing feature library. If a feature you need does not exist, add it as a custom FeatureType — all features are then generated and ready. From there, iterate: add or drop features, change parameters, swap normalizations, adjust label horizons — and re-run in batch mode against historical data for immediate feedback. Review feature importance, backtest, refine, repeat. Once you have a validated model, switch the mode from batch to streaming: same YAML definitions, no code changes. Research feeds production, not a rewrite.

2. Project Setup

Install QuantFlow and scaffold a project from the crypto template:

pip install quantflow
qf init breakout_alpha --template crypto
cd breakout_alpha

Once inside the project directory, open .local_config.yml and quantflow_project.yml to configure credentials, data engines, symbols, bar types, labels, and MFP packs.

.local_config.yml — Credentials and engine connections:
feed_provider_credentials:
  - provider: cryptohftdata
    key: "your-api-key-here"

engine:
  - name: duckdb
    database: ".data/db/quantflow.duckdb"

  - name: dolphindb
    host: "localhost"
    port: 8848
    database: "quantflow_db"
    auth: password
    key:
      username: "admin"
      password: "123456"

  - name: kafka
    host: "localhost:9092"

local_cache:
  path: ".data/.local"

quantflow_project.yml

The template ships with sources, feed providers, and engine backends pre-configured — you can leave those as-is. The three sections you will actively work on are bars, labels, and MFP packs.

Bars — Define the bar types the State Engine will produce:

state_engine:
  force_refresh: false
  micro_batch_size: 200000
  bars:
    - type: dollar
      threshold: 50000
    - type: imbalance
      k: 10
    - type: tick
      count: 50
    - type: run
      window: 10
  snapshots:
    period_seconds: 0.5
    depth_levels: 20
  • dollar ($50k threshold) — the default clock: volume-standardized, robust to varying trade intensity
  • imbalance (k=10) — for order flow features: samples when information asymmetry arrives
  • tick (50 trades) — the fastest clock: for execution features that need fresh data
  • run (10-tick window) — for liquidity vacuum detection: captures sequential same-direction trades
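The dollar clock above can be sketched in a few lines of plain Python. This is a hypothetical illustration of the sampling rule only, not QuantFlow's State Engine (which runs a Numba fused kernel over micro-batches):

```python
def dollar_bars(trades, threshold=50_000):
    """Emit a bar each time cumulative traded dollar value crosses `threshold`.

    `trades` is an iterable of (price, size) tuples; each bar records
    open/high/low/close plus the dollar volume that triggered it.
    """
    bars, bucket, dollars = [], [], 0.0
    for price, size in trades:
        bucket.append(price)
        dollars += price * size
        if dollars >= threshold:          # clock fires: close the bar
            bars.append({
                "open": bucket[0], "high": max(bucket),
                "low": min(bucket), "close": bucket[-1],
                "dollar_volume": dollars,
            })
            bucket, dollars = [], 0.0     # reset for the next bar
    return bars

# One bar closes once ~$50k of notional has traded, regardless of wall time.
trades = [(100.0, 200), (101.0, 200), (99.0, 150), (100.0, 300)]
print(dollar_bars(trades, threshold=50_000))
```

The imbalance, tick, and run clocks follow the same pattern with different firing conditions: signed-flow asymmetry, a fixed trade count, or a streak of same-direction trades.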

Labels — Two triple-barrier definitions at different horizons:

label_engine:
  historical_label_engine: polars
  labels:
    - name: triple_barrier_20_10bp
      type: triple_barrier
      parameters:
        horizon: 20
        upper_barrier: 0.001
        lower_barrier: 0.001
        vertical_barrier: 20
      inputs:
        close: close
        high: high
        low: low
      dependencies:
        - cdm_dollar_bars
      bar_types: [dollar]

    - name: triple_barrier_50_5bp
      type: triple_barrier
      parameters:
        horizon: 50
        upper_barrier: 0.0005
        lower_barrier: 0.0005
        vertical_barrier: 50
      inputs:
        close: close
        high: high
        low: low
      dependencies:
        - cdm_dollar_bars
      bar_types: [dollar]
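The barrier logic behind both definitions can be sketched as follows. This simplified version checks close prices only, whereas the configured labels also take high/low inputs; the function name is illustrative:

```python
def triple_barrier_label(closes, start, horizon, upper, lower):
    """Label bar `start`: +1 if the upper barrier is hit first, -1 if the
    lower barrier is hit first, 0 if the vertical barrier (horizon) expires.

    `upper`/`lower` are fractional moves, e.g. 0.001 for 10 bp.
    """
    entry = closes[start]
    up, down = entry * (1 + upper), entry * (1 - lower)
    for price in closes[start + 1 : start + 1 + horizon]:
        if price >= up:
            return 1
        if price <= down:
            return -1
    return 0  # vertical barrier: nothing touched within `horizon` bars

closes = [100.0, 100.02, 100.05, 100.12, 100.2]
print(triple_barrier_label(closes, start=0, horizon=20, upper=0.001, lower=0.001))
```

The two configured horizons trade off label frequency against noise: 20 bars at 10 bp captures larger moves, 50 bars at 5 bp more of them.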

MFP Packs — Activate the microstructure_breakout pack:

feature_engine:
  ...

  mfp_packs:
    - name: microstructure_breakout
      base: microstructure_breakout
      bar: dollar_k_50000
Features that require a specific bar clock override this default in their own entries within the MFP pack definition.

3. The Microstructure Breakout MFP Pack

Now you will create the core of the strategy: the microstructure_breakout Market Feature Package. It bundles 15 features across five dimensions (signal, quality, regime, stability, and execution), each on its most informative bar clock.

Create .definitions/mfp/microstructure_breakout.yml:

Top-Level Configuration

name: microstructure_breakout
description: >
  Two-directional microstructure breakout detection. Each feature on its
  native bar clock — OFI + cumulative delta on imbalance bars, liquidity
  vacuum on run bars, spread/slippage on tick bars, all others on dollar bars.
pattern: breakout
horizon_type: intraday
mode: tick_to_bar
normalization:
  warmup_bars: 100

tick_to_bar mode computes features at tick resolution — catching every microstructure event — but emits values only when a bar completes on the assigned clock. Tick-level fidelity, bar-level decisions.
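The contract can be illustrated with a toy accumulator. Everything here (the class, its methods) is hypothetical; it only shows the compute-per-tick, emit-per-bar behavior together with a bar_aggregation choice:

```python
class TickToBarFeature:
    """Sketch of tick_to_bar mode: compute at tick resolution, emit on bar close.

    `update` runs on every tick and records the running value; `on_bar_close`
    emits one value per bar using the configured aggregation (here: "max" or
    "last"), then resets for the next bar.
    """
    def __init__(self, aggregation="max"):
        self.aggregation = aggregation
        self.values = []          # tick-level values within the current bar

    def update(self, tick_value):
        self.values.append(tick_value)

    def on_bar_close(self):
        if not self.values:
            return None           # no ticks arrived during this bar
        out = max(self.values) if self.aggregation == "max" else self.values[-1]
        self.values = []          # new bar, fresh accumulation
        return out

feat = TickToBarFeature(aggregation="max")
for v in [0.2, 1.7, 0.9]:         # three ticks inside one bar
    feat.update(v)
print(feat.on_bar_close())        # the bar emits the tick-level max
```

A "max" aggregation, as used by breakout_strength below, ensures a burst that peaks mid-bar is not averaged away by the time the bar closes.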

Signal Features — Seven directional prediction features form the breakout confirmation system:

signal_features:
  - name: breakout_strength
    type: price_velocity_volume_ratio
    params: { window: 50 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: rolling_zscore
      window: 100
      clip: [-5, 5]
    output_type: scalar
    bar_aggregation: max

  - name: breakout_volume_spike
    type: relative_volume
    params: { window: 100 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: rolling_zscore
      window: 100
      clip: [0, 20]
    output_type: scalar
    bar_aggregation: max

  - name: volatility_expansion
    type: realized_volatility
    params: { window: 20 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: minmax
      clip: [0, 1]
    output_type: scalar
    bar_aggregation: last

  - name: volatility_compression_score
    type: realized_volatility_percentile
    params: { short_window: 20, long_window: 300 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: minmax
      clip: [0, 1]
    output_type: scalar
    bar_aggregation: last

  - name: order_flow_imbalance
    type: ofi
    params: { decay: 0.95, levels: 5 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: rolling_zscore
      window: 50
      clip: [-5, 5]
    output_type: scalar
    bar_aggregation: mean
    bar: imbalance_k_10
    staleness:
      ttl_ms: 5000
      action: decay

  - name: cumulative_delta
    type: cumulative_volume_delta
    params: { window: 200 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: rolling_zscore
      window: 100
      clip: [-5, 5]
    output_type: scalar
    bar_aggregation: mean
    bar: imbalance_k_10
    staleness:
      ttl_ms: 5000
      action: decay

  - name: liquidity_vacuum_score
    type: depth_change_rate
    params: { window: 50, side_sensitive: true, levels: 5 }
    inputs: [cdm_trade_enriched]
    normalization:
      method: minmax
      clip: [-1, 1]
    output_type: scalar
    bar_aggregation: mean
    bar: run_w_10
    staleness:
      ttl_ms: 3000
      action: invalidate
  • breakout_strength (price_velocity_volume_ratio, window=50, dollar clock): Detects conviction-weighted price thrust — genuine breakouts move fast on real volume, not noise. Rolling z-score [-5, 5], bar aggregation: max.
  • breakout_volume_spike (relative_volume, window=100, dollar clock): Independent volume confirmation — institutional flow leaves a volume footprint. Asymmetric clip [0, 20] ignores low-volume noise. Bar aggregation: max.
  • volatility_expansion (realized_volatility, window=20, dollar clock): Breakouts are volatility events — short-window expansion confirms the regime shift is underway. Minmax [0, 1], bar aggregation: last.
  • volatility_compression_score (realized_volatility_percentile, short=20/long=300, dollar clock): Pre-breakout compression — low values signal a coiled, spring-loaded market ready to release. Minmax [0, 1].
  • order_flow_imbalance (ofi, decay=0.95, levels=5, imbalance_k_10 clock): Aggressive pressure at top of book — imbalance bars sample when information arrives. Rolling zscore [-5, 5], 5s staleness with decay.
  • cumulative_delta (cumulative_volume_delta, window=200, imbalance_k_10 clock): Net committed volume over 200 bars — confirms OFI with actual executed trades, not just quote changes. Rolling zscore [-5, 5], 5s staleness with decay.
  • liquidity_vacuum_score (depth_change_rate, window=50, side_sensitive, run_w_10 clock): Book thinning during breakout — sequential same-direction liquidity consumption. Minmax [-1, 1], 3s staleness with invalidate.
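The rolling_zscore normalization used by several of these features can be sketched like this. The class and method names are illustrative, not QuantFlow APIs:

```python
from collections import deque
from statistics import mean, pstdev

class RollingZScore:
    """Sketch of rolling z-score normalization with clipping: standardize
    each value against a trailing window, then clip to the configured range."""
    def __init__(self, window=100, clip=(-5.0, 5.0)):
        self.buf = deque(maxlen=window)   # trailing window of raw values
        self.lo, self.hi = clip

    def update(self, x):
        self.buf.append(x)
        if len(self.buf) < 2:
            return 0.0                    # not enough history: neutral value
        mu, sigma = mean(self.buf), pstdev(self.buf)
        if sigma == 0:
            return 0.0                    # flat window: no deviation to score
        return max(self.lo, min(self.hi, (x - mu) / sigma))

z = RollingZScore(window=5, clip=(-5, 5))
outs = [z.update(x) for x in [1, 1, 1, 1, 100]]
print(round(outs[-1], 2))  # the spike scores about +2σ against its own window
```

Clipping bounds the feature's range for the model, and the asymmetric clip on breakout_volume_spike ([0, 20]) discards the uninformative low-volume side entirely.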

Quality Features — Two features assess whether the breakout signals are reliable enough to trade:

  • breakout_snr (signal_to_noise_ratio, window=50): Measures how clean the breakout_strength signal is relative to its noise floor. When SNR is high (above 3), the signal stands clearly above market noise — actionable. When low, defer.
  • breakout_sharpe (rolling_sharpe_ratio, window=100): Tracks rolling risk-adjusted return quality. A positive, stable Sharpe over 100 bars indicates predictive consistency; declining or negative suggests parameters may need tuning.

Both take breakout_strength as input — a cross-feature dependency within the pack.
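A rough sketch of the idea behind breakout_snr: signal magnitude over its own noise floor. The helper below is hypothetical, and QuantFlow's exact estimator may differ:

```python
from statistics import mean, pstdev

def signal_to_noise(values):
    """Rough SNR for a feature stream over a window: |mean| of the signal
    divided by its standard deviation (the noise floor)."""
    sigma = pstdev(values)
    if sigma == 0:
        return float("inf")               # perfectly steady signal
    return abs(mean(values)) / sigma

steady = [2.0, 2.1, 1.9, 2.0, 2.05]   # clean, persistent signal
choppy = [2.0, -1.5, 1.8, -2.2, 0.1]  # same scale, no consistent direction
print(signal_to_noise(steady) > 3)     # actionable by the SNR > 3 rule of thumb
print(signal_to_noise(choppy) > 3)     # defer
```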

Regime Features — Market context for gating:

  • volatility_regime (realized_volatility, window=300): Gates whether breakouts are tradeable in current conditions
  • liquidity_regime (spread_regime_indicator, window=100): Monitors spread conditions — tight spreads favor breakout trading
  • momentum_10t (rate_of_change, window=10): Captures pre-breakout trend direction

Stability — Post-breakout persistence check:

  • momentum_autocorr (autocorrelation, lag=10): Positive autocorrelation confirms the breakout is sticking; negative signals choppy mean-reversion. Takes momentum_10t as input.

Execution Features:

  • spread_bps: Entry cost monitoring — unusually wide spreads signal market maker uncertainty
  • slippage_proxy (cumulative_depth, levels=5, weighted=true): Depth availability for breakout entry sizing

Multi-Clock Architecture

The pack uses four bar clocks:

  • dollar_k_50000 (default): breakout_strength, breakout_volume_spike, volatility_expansion, volatility_compression_score, breakout_snr, breakout_sharpe, volatility_regime, liquidity_regime, momentum_10t, momentum_autocorr — volume-standardized baseline, robust to varying trade intensity
  • imbalance_k_10: order_flow_imbalance, cumulative_delta — bars that sample when new information arrives via order flow asymmetry, best resolution for flow-based signals
  • run_w_10: liquidity_vacuum_score — bars triggered by sequential same-direction trades, captures the micro-dynamics of liquidity consumption
  • tick_k_50: spread_bps, slippage_proxy — fastest clock, execution conditions change on every trade, 500ms staleness TTL with invalidate

Staleness contracts bridge the clocks. When the decision clock (dollar bar) fires:

  • OFI and cumulative_delta from imbalance bars: decay toward zero after 5s without a new bar (signal fades, doesn’t freeze)
  • Liquidity vacuum from run bars: invalidate after 3s (discard entirely)
  • Spread and slippage from tick bars: invalidate after 500ms (execution data must be fresh)
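The two staleness actions can be sketched as a single lookup performed when the decision clock fires. The function, and the halving decay schedule, are illustrative assumptions; the pack specifies only the TTL and the action:

```python
def apply_staleness(value, age_ms, ttl_ms, action, decay_rate=0.5):
    """Staleness contract sketch: what the decision clock sees for a feature
    last updated `age_ms` ago on another clock.

    - "decay": past the TTL the value fades toward zero instead of freezing.
    - "invalidate": past the TTL the value is discarded (None).
    The per-TTL halving (`decay_rate`) is a hypothetical schedule.
    """
    if age_ms <= ttl_ms:
        return value                      # still fresh: use as-is
    if action == "invalidate":
        return None                       # too old: drop entirely
    periods = (age_ms - ttl_ms) / ttl_ms  # TTL intervals beyond expiry
    return value * (decay_rate ** periods)

print(apply_staleness(1.8, age_ms=4000, ttl_ms=5000, action="decay"))     # fresh
print(apply_staleness(1.8, age_ms=10000, ttl_ms=5000, action="decay"))    # faded
print(apply_staleness(0.7, age_ms=600, ttl_ms=500, action="invalidate"))  # None
```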

4. Creating a Custom FeatureType

One signal feature — cumulative_volume_delta — doesn’t ship with the library, so you’ll create it from scratch at .definitions/feature_types/signal/cumulative_volume_delta.yml:

name: cumulative_volume_delta
description: >
  Cumulative volume delta: rolling sum of signed volume (buy - sell).
  Positive = net buying pressure, negative = net selling pressure.
category: order_flow
version: v1.0
dimension: signal
status: active

required_inputs:
  - cdm_trade_enriched.buy_volume
  - cdm_trade_enriched.sell_volume

output_column: cumulative_delta
output_description: Rolling sum of net volume (buy_volume - sell_volume)

parameters:
  window:
    type: integer
    description: Rolling window size (bars)
    required: false
    default: 200
    constraints:
      min: 10
      max: 10000

formula: "rolling_sum((buy_volume - sell_volume), window)"

Key fields explained:

  • required_inputs: Fully qualified column references — cdm_trade_enriched.buy_volume and cdm_trade_enriched.sell_volume. The FeatureDAG compiler resolves these to CDM tables from the State Engine.
  • parameters.window: Single configurable integer with constraints. Default 200, range [10, 10000].
  • formula: rolling_sum((buy_volume - sell_volume), window) — net signed volume over N bars. FeatureDAG’s AST compiler turns this into an IR DAG, then lowers to Polars (batch) or DolphinDB (streaming). Same formula, both backends.

All FeatureTypes follow this schema. The formula string supports 40+ built-in functions.
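The formula is simple enough to sketch directly in Python, which also makes the rolling-window semantics concrete. The bar-dict representation is illustrative:

```python
from collections import deque

def cumulative_volume_delta(bars, window=200):
    """rolling_sum(buy_volume - sell_volume, window): the custom
    FeatureType's formula, sketched in plain Python over bar dicts."""
    out, buf = [], deque(maxlen=window)   # deque drops bars beyond the window
    for bar in bars:
        buf.append(bar["buy_volume"] - bar["sell_volume"])
        out.append(sum(buf))              # net signed volume over the window
    return out

bars = [
    {"buy_volume": 10, "sell_volume": 4},   # +6
    {"buy_volume": 3,  "sell_volume": 9},   # -6
    {"buy_volume": 8,  "sell_volume": 2},   # +6
]
print(cumulative_volume_delta(bars, window=2))  # [6, 0, 0]
```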

5. Running the Batch Pipeline via Dagster

Dagster provides asset lineage, run history, and per-stage retries. Start the UI from your project directory:

dagster dev -w dagster_workspace.yaml 

Open Dagit at http://localhost:3000. You’ll see the 5-stage asset graph:

ingest → dbt → state_engine → label_engine → feature_engine

Click Materialize All, specify the date range. Each stage runs in sequence:

  1. Ingest — downloads raw trades and LOB data from cryptohftdata, caches Parquet files locally
  2. dbt — staging models map raw columns to CDM schema (type casts, field mappings, venue prefixes), CDM models union across providers
  3. State Engine — Numba fused kernel processes events in micro-batches, producing per-type bar tables (cdm_dollar_bars, cdm_imbalance_bars, cdm_tick_bars, cdm_run_bars), enriched trades, LOB snapshots
  4. Label Engine — reads dollar bars, computes triple-barrier labels at both horizons
  5. Feature Engine — compiles all 15 features through the IR pipeline, resolves cross-feature dependencies (e.g., momentum_autocorr depends on momentum_10t), runs DAG on Polars
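The cross-feature dependency resolution in stage 5 amounts to a topological sort of the feature DAG. The sketch below uses Python's stdlib graphlib with the dependencies named in this pack; representing the DAG as a plain dict is an illustrative choice:

```python
from graphlib import TopologicalSorter

# Each feature maps to the features it consumes (subset of the pack shown;
# names match the YAML above, the dict structure itself is hypothetical).
deps = {
    "momentum_10t":      [],
    "momentum_autocorr": ["momentum_10t"],
    "breakout_strength": [],
    "breakout_snr":      ["breakout_strength"],
    "breakout_sharpe":   ["breakout_strength"],
}

# Dependencies must be evaluated before their dependents: a topological
# order over the feature DAG, which graphlib computes directly.
order = list(TopologicalSorter(deps).static_order())
print(order.index("momentum_10t") < order.index("momentum_autocorr"))
print(order.index("breakout_strength") < order.index("breakout_snr"))
```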

Via CLI

For quick CLI runs:

qf run --start-date 2026-04-15 --end-date 2026-04-20
qf run --engine feature --start-date 2026-04-15 --end-date 2026-04-20

6. Batch Results

After a successful run, open the DuckDB database. Tables produced:

  • breakout_alpha_cdm.cdm_trade_enriched — trades with L1 enrichment (mid, spread, micro-price, direction)
  • breakout_alpha_cdm.cdm_lob_snapshot — LOB snapshots every 500ms
  • breakout_alpha_cdm.cdm_dollar_bars — dollar bars at $50k threshold
  • breakout_alpha_cdm.cdm_imbalance_bars — imbalance bars at k=10
  • breakout_alpha_cdm.cdm_tick_bars — tick bars at 50 trades
  • breakout_alpha_cdm.cdm_run_bars — run bars at 10-tick window
  • breakout_alpha_cdm.cdm_labels — triple-barrier labels (both horizons)
  • breakout_alpha_feature.features — all 15 feature values, per bar clock

The breakout_alpha_feature.features table contains all 15 feature values keyed by (symbol, bar_clock, feature_name, feature_time). Join with cdm_labels on symbol and feature_time — features and labels are already time-aligned. From here you can train a model, run a backtest, or export to your ML pipeline.
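The join can be sketched as SQL. To keep the snippet self-contained it uses Python's stdlib sqlite3 with toy rows in place of the DuckDB file, but the join shape, on (symbol, feature_time), is the same; table and column names follow the text:

```python
import sqlite3

# Toy stand-ins for breakout_alpha_feature.features and cdm_labels.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE features (symbol TEXT, bar_clock TEXT, "
            "feature_name TEXT, feature_time INTEGER, value REAL)")
con.execute("CREATE TABLE cdm_labels (symbol TEXT, label_name TEXT, "
            "feature_time INTEGER, label INTEGER)")
con.executemany("INSERT INTO features VALUES (?,?,?,?,?)", [
    ("BTCUSDT", "dollar_k_50000", "breakout_strength", 1000, 2.1),
    ("BTCUSDT", "dollar_k_50000", "breakout_volume_spike", 1000, 3.4),
])
con.execute("INSERT INTO cdm_labels VALUES "
            "('BTCUSDT', 'triple_barrier_20_10bp', 1000, 1)")

# Features and labels are already time-aligned, so the join is a plain
# equi-join on symbol and feature_time.
rows = con.execute("""
    SELECT f.feature_name, f.value, l.label
    FROM features f
    JOIN cdm_labels l
      ON f.symbol = l.symbol AND f.feature_time = l.feature_time
    ORDER BY f.feature_name
""").fetchall()
print(rows)
```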

Now the real work begins. Tweak feature parameters, swap normalizations, add or drop features, adjust label horizons — each change is a quick re-run away. Batch mode turns days of trial-and-error into minutes. When the backtest tells you the signal is ready, flip the switch to streaming.

7. Streaming to Production

The exact same YAML definitions now deploy to streaming. No changes — only the execution backend switches from Polars/DuckDB to DolphinDB/Kafka.

Prerequisites: DolphinDB must be running (Community Edition at dolphindb.com). You already configured the connection in .local_config.yml.

The crypto template includes binance_spot_streaming — a WebSocket connection to wss://stream.binance.com:9443. Raw Binance trade messages (s, p, q, m fields) map to the CDM schema via field mappings.

Deploy

qf run --mode streaming

Three stages now running inside DolphinDB:

  1. Ingest — WebSocket client connects to Binance, subscribes to btcusdt@trade (real-time trades) and btcusdt@depth20@100ms (top-20 LOB levels every 100ms). Each JSON message is field-mapped to CDM: Binance’s s → symbol, p → price (double), q → size, m → is_buyer_maker. A venue literal (“binance”) and a processed_time timestamp are added. Events flow into DolphinDB stream tables.
  2. Process — State Engine consumes the raw stream, reconstructs the order book, enriches trades with L1 context (mid-price, spread, micro-price), and emits bars on all four clocks simultaneously. Snapshots fire every 500ms at 20 depth levels.
  3. Feature — Feature Engine listens to each bar clock independently, computes the 15 features, applies normalization (z-score, minmax), and publishes feature vectors to the Kafka sink.
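The field mapping in the Ingest stage can be sketched as a small transform. The helper is a simplified assumption (timestamp handling omitted); only the s/p/q/m mappings and the venue literal come from the text:

```python
import json

# Raw Binance trade field -> CDM column, per the mapping described above.
FIELD_MAP = {"s": "symbol", "p": "price", "q": "size", "m": "is_buyer_maker"}

def to_cdm(raw_msg: str, venue: str = "binance") -> dict:
    """Map one raw Binance trade message (JSON string) to a CDM record."""
    msg = json.loads(raw_msg)
    out = {cdm: msg[raw] for raw, cdm in FIELD_MAP.items()}
    out["price"] = float(out["price"])   # Binance sends numerics as strings
    out["size"] = float(out["size"])
    out["venue"] = venue                 # venue literal added at ingest
    return out

raw = '{"s": "BTCUSDT", "p": "97250.10", "q": "0.015", "m": false}'
print(to_cdm(raw))
```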

8. Streaming Results

In addition to the unified features stream table and the flattened mfp_breakout_alpha stream table for the MFP pack, the cdm_trade_enriched, cdm_lob_snapshot, and all deployed bar tables are available in DolphinDB for ad-hoc queries or custom monitoring.

For a real-time monitoring dashboard, see our blog post “Build a Low-Latency Monitor Dashboard” which walks through connecting DolphinDB stream tables to Grafana.
