Before I start talking about how effective this architecture can be at reducing infrastructure costs, I should first make the old point that there is really no free lunch. Compared with commercial cloud data platforms and warehouses such as Databricks, BigQuery, and Snowflake, an open lakehouse setup requires significantly more engineering effort to build, operate, … Continue reading S3 + Parquet + Iceberg + Trino: A Poor Man’s Market Data Platform
Tag: Data Engineering
Buy-Side Financial Data Engineering (3) – Market Data Management
As a data guy, two thoughts immediately come to my mind when I hear the term "Finance Market Data": 1) They are bloody expensive; 2) What a chore to handle all … Continue reading Buy-Side Financial Data Engineering (3) – Market Data Management
Buy-Side Financial Data Models (2) – Financial Instruments
The second article of my "Buy-Side Financial Data Models" series focuses on the "Financial Instruments" data domain. Financial instruments data is complex and difficult to manage. At the same time, it is crucial to … Continue reading Buy-Side Financial Data Models (2) – Financial Instruments
Buy-Side Financial Data Models (1) – Overview
This is the first blog post of the "Buy-Side Financial Data Models" series I am planning to write. To kick off this blog series, this post provides a high-level overview of the … Continue reading Buy-Side Financial Data Models (1) – Overview
Why I Prefer Hand-Coded Transformations over ADF Mapping Data Flow
Firstly, I need to clarify that this blog post is only about ADF Mapping Data Flow, not the whole ADF service. I am not going to challenge ADF’s role as the superb orchestration service in the Azure data ecosystem. In fact, I love ADF. At the control flow level, … Continue reading Why I Prefer Hand-Coded Transformations over ADF Mapping Data Flow
Configuration-Driven Azure Data Factory Pipelines
In this blog post, I will introduce two configuration-driven Azure Data Factory pipeline patterns I have used in my previous projects: the Source-Sink pattern and the Key-Value pattern. The Source-Sink pattern is primarily used for parameterising and configuring the data movement activities, with the source location and sink location of the data movement configured in a … Continue reading Configuration-Driven Azure Data Factory Pipelines
Execute R Scripts from Azure Data Factory (V2) through Azure Batch Service
Introduction: One requirement I have recently been working on is to run R scripts for some complex calculations in an ADF (V2) data processing pipeline. My first attempt was to run the R scripts using Azure Data Lake Analytics (ADLA) with the R extension. However, two limitations of the ADLA R extension stopped me from adopting this … Continue reading Execute R Scripts from Azure Data Factory (V2) through Azure Batch Service
The Tip for Installing R packages on Azure Batch
Problem: In one project I have recently been working on, I need to execute R scripts in Azure Batch. The compute nodes of the Azure Batch pool were provisioned with Data Science Virtual Machines, which already include common R packages. However, some packages required for the R scripts, such as tidyr and rAzureBatch, are missing … Continue reading The Tip for Installing R packages on Azure Batch
SSIS in Azure #3 – Schedule and Monitor SSIS Package Execution using ADF V2
*The source code created for this blog post can be found here. In the previous blog posts in the SSIS in Azure series, we created an SSIS package to periodically ingest data from Azure SQL Database to Azure Data Lake Store and deployed the package in the Azure-SSIS Integration Runtime. Up to this point, we have … Continue reading SSIS in Azure #3 – Schedule and Monitor SSIS Package Execution using ADF V2
SSIS in Azure #2 – Deploy SSIS Packages to Azure-SSIS Integration Runtime in ADF V2
In the first blog post of the SSIS in Azure series, I demonstrated how to create SSIS packages to move data in the cloud, using a common use case that periodically ingests data from Azure SQL Database to Azure Data Lake Store. In the pre-ADF V2 era, we could only deploy SSIS packages … Continue reading SSIS in Azure #2 – Deploy SSIS Packages to Azure-SSIS Integration Runtime in ADF V2



