Handling Embarrassing Parallel Workload with Databricks Notebook Workflows
Introduction: "Embarrassingly parallel" refers to problems that take little or no effort to split into parallel tasks, and whose tasks need no communication with or dependency on one another. Embarrassingly parallel problems are very common; typical examples include group-by analyses, simulations, optimisations, cross-validations and feature selections. Normally, an Embarrassing …
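To make the idea concrete, here is a minimal sketch of an embarrassingly parallel group-by analysis. It is not the Databricks Notebook Workflows approach the post goes on to describe; it swaps in a local thread pool from the Python standard library purely for illustration. The sample data and the names `records`, `split_by_group`, `summarise` and `run_in_parallel` are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical sample data: (group key, value) pairs.
records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("c", 10.0), ("b", 6.0)]


def split_by_group(rows):
    """Partition the rows by key; each partition becomes one independent task."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def summarise(item):
    """Process one group in isolation; it needs nothing from any other group."""
    key, values = item
    return key, mean(values)


def run_in_parallel(rows, max_workers=4):
    """Fan the groups out to a pool of workers and collect the results."""
    groups = split_by_group(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(summarise, groups.items()))


if __name__ == "__main__":
    print(run_in_parallel(records))
```

Because no task reads or writes another task's data, the only coordination is the split at the start and the collection of results at the end. That is what makes this kind of workload straightforward to scale out.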
Category: Data Platform & Lakehouse
Execute R Scripts from Azure Data Factory (V2) through Azure Batch Service
Introduction: One requirement I have recently been working on is to run R scripts for some complex calculations in an ADF (V2) data processing pipeline. My first attempt was to run the R scripts using Azure Data Lake Analytics (ADLA) with the R extension. However, two limitations of the ADLA R extension stopped me from adopting this …
Build a Power BI Knowledge Base Bot Using Microsoft Bot Framework and QnA Maker
The first question that popped into my head when I heard about Microsoft Bot Framework was how to build some sort of bot that could replace me for any of my responsibilities at work. Part of my Power BI consulting work is answering all sorts of Power BI-related questions from clients. …
SSIS in Azure #3 – Schedule and Monitor SSIS Package Execution using ADF V2
*The source code created for this blog post can be found here. In the previous blog posts in the SSIS in Azure series, we created an SSIS package that periodically ingests data from Azure SQL Database into Azure Data Lake Store, and deployed the package to the Azure-SSIS Integration Runtime. Up to this point, we have …
SSIS in Azure #2 – Deploy SSIS Packages to Azure-SSIS Integration Runtime in ADF V2
In the first blog post of the SSIS in Azure series, I demonstrated how to create SSIS packages to move data in the cloud, using a common use case that periodically ingests data from Azure SQL Database into Azure Data Lake Store. In the pre-ADF V2 era, we could only deploy SSIS packages …
Anomaly Detection with Azure Stream Analytics
Anomaly detection is a very common use case in IoT-related deployments. A new ANOMALYDETECTION operator has recently been added to Azure Stream Analytics and is currently in public preview. The ANOMALYDETECTION operator detects anomalies based on Exchangeability Martingales (EM), which support online testing of the exchangeability of a sequence of event values. When the distribution of the sequence …
SSIS in Azure #1 – Periodically Ingesting Data from SQL Database into Azure Data Lake using SSIS
*The source code created for this blog post is located here. The low cost, schema-less and large column attributes of Azure Data Lake Store, along with the large number of supported analytic engines (e.g., Azure Data Lake Analytics, Hive and Spark), make it a perfect store-everything repository for enterprise data. We can offline the copies of business …
Azure Stream Analytics Patterns & Implementations
Thanks to the increased popularity of IoT and social networks, streaming analytics has become a hot topic and attracted more and more attention in the data analytics community. Many people (e.g., this and this) believe streaming analytics is the future and will take over the use cases traditionally targeted by batch-oriented analytics. Azure …
End-to-End Azure Data Factory Pipeline for Star Schema ETL (Part 4)
This is the last part of the blog series demonstrating how to build an end-to-end ADF pipeline for data warehouse ELT.
1. Introduction & Preparation
2. Build ADF pipeline for dimension tables ELT
3. Build ADLA U-SQL job for incremental extraction of machine cycle data
4. Build ADF pipeline for fact table ELT
In the previous part we created …
End-to-End Azure Data Factory Pipeline for Star Schema ETL (Part 3)
This is the third part of the blog series demonstrating how to build an end-to-end ADF pipeline for data warehouse ELT. This part describes how to build an ADLA U-SQL job for incremental extraction of machine cycle data from Azure Data Lake Store and goes through the steps for scheduling and triggering the …