Author: Linxiao Ma

End-to-End Azure Data Factory Pipeline for Star Schema ETL (Part 2)

This is the second part of the blog series demonstrating how to build an end-to-end ADF pipeline that extracts data from Azure SQL DB/Azure Data Lake Store and loads it into a star-schema data warehouse database, with consideration of SCD (slowly changing dimensions) and incremental loading. Series parts: Introduction & Preparation; Build ADF pipeline for dimensional tables … Continue reading End-to-End Azure Data Factory Pipeline for Star Schema ETL (Part 2)

End-to-End Azure Data Factory Pipeline for Star Schema ETL (Part 1)

This blog series demonstrates how to build an end-to-end ADF pipeline that extracts data from Azure SQL DB/Azure Data Lake Store and loads it into a star-schema data warehouse database, with consideration of SCD (slowly changing dimensions) and incremental loading. The final pipeline will look like this: The machine cycle records will be loaded from the csv … Continue reading End-to-End Azure Data Factory Pipeline for Star Schema ETL (Part 1)

Things better to do When Working with Power BI

This is the third part in the blog series summarising my experience with Power BI from real-world projects. Series parts: Things Better to Know Before Implementing Power BI Service; Pain Points of Power BI; Things Better to Do When Working with Power BI. Create separate QA/UAT app workspace from Dev app workspace. Separate QA/UAT app workspace … Continue reading Things better to do When Working with Power BI

Pain Points of Power BI

This is the second post in the blog series summarising my experience with Power BI from real-world projects. Series parts: Things Better to Know Before Implementing Power BI Service; Pain Points of Power BI; Things Better to Do When Working with Power BI. This blog post covers some very interesting topics, namely the pain points of … Continue reading Pain Points of Power BI

Things Better to Know Before Implementing Power BI Service

I have loved Power BI, sincerely, since as early as the PowerPivot 1.0 era. However, I must admit Power BI does drive me crazy sometimes. It plays very well when you follow the rules Power BI expects you to follow; otherwise, it can cause you much pain. This blog series summarises my experience with Power … Continue reading Things Better to Know Before Implementing Power BI Service

R Visual – Building Component Cycle Timeline

One common approach to detecting machine exceptions is to monitor the correlated status of the machine's components. For example, under normal conditions, two or more components should be running at the same time, or some components should be running in sequential order. When the components are not running in the way as … Continue reading R Visual – Building Component Cycle Timeline

Power BI Fun – Ninjago Driver Scorecard for Fuel Saving

Just for fun, I created a Ninjago driver scorecard dashboard for fuel saving. The idea behind this dashboard actually comes from real-world IoT business cases for saving fuel by monitoring drivers' driving styles and behaviours.

Questions to Ask when Starting a Predictive Maintenance Project

One of the major use cases of industrial IoT is predictive maintenance, which continuously monitors the condition and performance of equipment during normal operation and predicts future equipment failure based on previous equipment failure and maintenance history. With accurate equipment failure prediction, organisations can reduce the cost of unplanned breakdowns and unnecessary preventive maintenance. Driven … Continue reading Questions to Ask when Starting a Predictive Maintenance Project

Evaluate Feature Importance using Tree-based Model

Tree-based models can be used to evaluate the importance of features. In this blog post I go through the steps of evaluating feature importance using the GBDT model in LightGBM. LightGBM is a gradient boosting framework released by Microsoft with high accuracy and speed (some tests show LightGBM can produce predictions as accurate as XGBoost … Continue reading Evaluate Feature Importance using Tree-based Model

Tuning Hyper-Parameters using Grid Search

Hyper-parameter tuning is a common but time-consuming task that aims to select the hyper-parameter values that maximise the accuracy of the model. Normally, cross validation is used to support hyper-parameter tuning: it splits the data set into a training set for training the learner and a validation set for testing the model. The Python scikit-learn package provides the GridSearchCV class … Continue reading Tuning Hyper-Parameters using Grid Search
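
To make the idea in the excerpt above concrete, here is a minimal standard-library sketch of grid search with k-fold cross validation. It is not the scikit-learn GridSearchCV implementation and not the code from the post: the toy 1-D data, the tiny k-nearest-neighbour classifier and all function names (k_fold_indices, knn_predict, grid_search) are assumptions made up for illustration.

```python
from collections import Counter
from itertools import product
from statistics import mean


def k_fold_indices(n, n_folds):
    """Yield (train_idx, val_idx) pairs; sample i goes to fold i % n_folds."""
    for fold in range(n_folds):
        val = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        yield train, val


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def grid_search(xs, ys, param_grid, n_folds=3):
    """Try every parameter combination and keep the best mean validation accuracy."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, val in k_fold_indices(len(xs), n_folds):
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            hits = sum(
                knn_predict(train_x, train_y, xs[i], **params) == ys[i]
                for i in val
            )
            fold_scores.append(hits / len(val))
        score = mean(fold_scores)
        if score > best_score:  # strict comparison keeps the first best combination
            best_params, best_score = params, score
    return best_params, best_score


# Toy data (hypothetical): label 0 for x < 6, label 1 otherwise.
xs = list(range(12))
ys = [0 if x < 6 else 1 for x in xs]

best_params, best_score = grid_search(xs, ys, {"k": [1, 3, 5]})
print(best_params, round(best_score, 3))
```

Each candidate value of k is scored on data the model was not trained on, which is the role the excerpt assigns to the validation set. The cost grows with the number of parameter combinations multiplied by the number of folds, which is why the task is time-consuming.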