The previous blog post introduces a list of basic data quality rules that have been developed for my R&D data quality improvement initiative. Those rules are fundamental and essential for detecting data quality problems. However, those rules have existed for a very long time and they are neither innovative nor exciting. More importantly, those … Continue reading dqops Data Quality Rules (Part 2) – CFD, Machine Learning
Tag: Machine Learning / Data Mining
Scaffolding Azure Machine Learning Experiments
*Please download the source code here. Microsoft has released the public preview of their newest data science service, Azure Machine Learning, which contains a collection of components to support end-to-end machine learning solutions. The Azure Machine Learning Workbench and the Azure Machine Learning Experimentation service are the two main components offered to machine learning practitioners … Continue reading Scaffolding Azure Machine Learning Experiments
Exploratory Data Analysis in Python
I have written a Jupyter notebook describing exploratory data analysis using Python, as shown below:
Questions to Ask when Starting a Predictive Maintenance Project
One of the major use cases of industrial IoT is predictive maintenance, which continuously monitors the condition and performance of equipment during normal operation and predicts future equipment failures based on previous equipment failures and maintenance history. With accurate equipment failure prediction, organisations can reduce the cost of unplanned breakdowns and unnecessary preventive maintenance. Driven … Continue reading Questions to Ask when Starting a Predictive Maintenance Project
Evaluate Feature Importance using Tree-based Model
Tree-based models can be used to evaluate the importance of features. In this blog post I go through the steps of evaluating feature importance using the GBDT model in LightGBM. LightGBM is the gradient boosting framework released by Microsoft with high accuracy and speed (some tests show LightGBM can produce predictions as accurate as XGBoost … Continue reading Evaluate Feature Importance using Tree-based Model
Tuning Hyper-Parameters using Grid Search
Hyper-parameter tuning is a common but time-consuming task that aims to select the hyper-parameter values that maximise the accuracy of the model. Normally, cross validation is used to support hyper-parameter tuning: it splits the data set into a training set for training the learner and a validation set for testing the model. The Python scikit-learn package provides the GridSearchCV class … Continue reading Tuning Hyper-Parameters using Grid Search
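The idea of pairing an exhaustive search over candidate values with cross validation can be sketched in plain Python. This is a minimal, hand-rolled illustration and not scikit-learn's GridSearchCV; the helper names (k_fold_indices, knn_predict, grid_search), the toy 1-D nearest-neighbour classifier and the sample data are all assumptions made for the example.

```python
from itertools import product


def k_fold_indices(n, k):
    """Split range(n) into k contiguous (train, validation) index pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def knn_predict(train_x, train_y, x, n_neighbors):
    """Toy 1-D classifier: majority label among the nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = [train_y[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)


def grid_search(xs, ys, param_grid, n_folds=3):
    """Try every parameter combination; keep the best mean validation accuracy."""
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, val in k_fold_indices(len(xs), n_folds):
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            correct = sum(
                knn_predict(train_x, train_y, xs[i], **params) == ys[i]
                for i in val
            )
            fold_scores.append(correct / len(val))
        score = sum(fold_scores) / len(fold_scores)
        if score > best_score:  # ties keep the first combination tried
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: label 0 for values below 0.5, label 1 above.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75, 0.35, 0.65]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

best_params, best_score = grid_search(xs, ys, {"n_neighbors": [1, 3, 5]})
print(best_params, best_score)
```

The cost grows with the number of parameter combinations multiplied by the number of folds, which is why the task is time-consuming on real data sets.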
Extracting Features from IoT Sensor Data using R
In my previous blog post I introduced the common patterns for extracting features from IoT sensor data using Python. Although R is not my primary machine learning language, it is becoming ubiquitous in Microsoft's data analytics ecosystem after they acquired Revolution Analytics, the major commercial distributor of R. Considering the increasing popularity of R on Microsoft … Continue reading Extracting Features from IoT Sensor Data using R
Extracting Features from IoT Sensor Data using Python
The previous blog post discusses three common patterns for extracting features from IoT sensor data: window-based descriptive statistics, seasonal pattern, and trend pattern. This blog post introduces how to implement those three patterns in Python. Window-based descriptive statistics: there are three main types of descriptive statistics based on what they describe: distribution (e.g., skewness and kurtosis), … Continue reading Extracting Features from IoT Sensor Data using Python
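As a rough illustration of the first pattern, window-based descriptive statistics can be computed with the Python standard library alone. This is a sketch under assumptions: the function name window_features, the choice of statistics and the sample readings are hypothetical and are not taken from the original post.

```python
from statistics import mean, pstdev


def window_features(readings, window):
    """Compute descriptive statistics over each sliding window of readings."""
    if window < 1 or window > len(readings):
        raise ValueError("window must be between 1 and len(readings)")
    features = []
    for start in range(len(readings) - window + 1):
        chunk = readings[start:start + window]
        mu = mean(chunk)
        sigma = pstdev(chunk)  # population standard deviation
        # Population skewness; set to 0.0 when the window is constant.
        if sigma == 0:
            skew = 0.0
        else:
            skew = sum(((x - mu) / sigma) ** 3 for x in chunk) / window
        features.append({
            "mean": mu,
            "std": sigma,
            "min": min(chunk),
            "max": max(chunk),
            "skewness": skew,
        })
    return features


# Hypothetical sensor readings with a spike at the end.
readings = [1.0, 2.0, 3.0, 4.0, 10.0]
feats = window_features(readings, window=3)
for row in feats:
    print(row)
```

Each window becomes one feature row, so a series of n readings with window size w yields n - w + 1 rows that can be fed to a machine learning algorithm.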
Feature Extraction of IoT Sensor Data
Feature extraction is an important step in IoT-related machine learning processes that transforms the temporal data of machine component state into a format supported by machine learning algorithms. The extracted features need to be informative, i.e. they need to carry information that can contribute to the prediction. Due to the temporal characteristic of IoT sensor … Continue reading Feature Extraction of IoT Sensor Data
