Category: Machine Learning / Data Mining

Tuning Hyper-Parameters using Grid Search

Hyper-parameter tuning is a common but time-consuming task that aims to select the hyper-parameter values that maximise the accuracy of the model. Normally, cross-validation is used to support hyper-parameter tuning: the data set is split into a training set for fitting the learner and a validation set for evaluating the model. The Python scikit-learn package provides the GridSearchCV class, which simplifies this task for machine learning practitioners.

This blog post introduces how to use the GridSearchCV class to tune hyper-parameters, using a predictive maintenance dataset as the example.

Firstly, we need to import the required packages. Apart from scikit-learn, we also need to import pandas for the data preprocessing and the LightGBM package for the GBDT model we are going to use.

import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV

Then, we need to load and preprocess the training data.

# read data (a raw string avoids backslash escaping in the Windows path)
train = pd.read_csv(r'E:\Data\predicitivemaintance_processed.csv')

# prepare the predictor set and target set
train = train.drop(['Date', 'FailureDate'], axis=1)
target = 'FailNextWeek'
feature_categorical = ['Model']
train = pd.get_dummies(train, columns=feature_categorical)
X = train.drop(target, axis=1)
y = train[target]

Next, we create a GBDT model with a set of baseline parameters.

model = lgb.LGBMClassifier(
    boosting_type="gbdt",
    is_unbalance=True,      # compensate for the imbalanced target classes
    random_state=10,
    n_estimators=50,        # baseline value, tuned below
    num_leaves=30,          # baseline value, tuned below
    max_depth=8,
    feature_fraction=0.5,   # LightGBM alias of colsample_bytree
    bagging_fraction=0.8,   # LightGBM alias of subsample
    bagging_freq=15,
    learning_rate=0.01,
)

Now we move to the key part: the hyper-parameter tuning. In this example, we will tune the hyper-parameters 'n_estimators' and 'num_leaves'. We need to configure the range of values to cover for each of them, and the grid search will cover the Cartesian product of those two sets of parameter values. For each parameter pair, 3-fold cross-validation is conducted with AUC used for scoring. In this example, 20 parameter pairs will be evaluated with 3 folds each, so 60 fits will be conducted in total.

params_opt = {'n_estimators': range(200, 600, 80), 'num_leaves': range(20, 60, 10)}
gridSearchCV = GridSearchCV(estimator=model,
    param_grid=params_opt,
    scoring='roc_auc',
    n_jobs=4,
    verbose=1,
    cv=3)
gridSearchCV.fit(X, y)
# grid_scores_ and the iid argument were removed in newer scikit-learn versions;
# cv_results_ holds the per-candidate scores instead
gridSearchCV.cv_results_['mean_test_score'], gridSearchCV.best_params_, gridSearchCV.best_score_

After the code is run, we get the mean AUC value for each parameter pair.

[Figure: mean cross-validated AUC for each parameter pair]

As we can see, the best value for 'num_leaves' is 30 and for 'n_estimators' is 360. We can then use those values to replace the baseline values in the GBDT model and continue to tune other hyper-parameters, as sketched below.
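As a minimal sketch of the next tuning round (the parameter names and ranges below are illustrative assumptions, not results from this post), we can fix the values found so far and search over another pair of hyper-parameters:

# fix the best values found in the first round
model.set_params(n_estimators=360, num_leaves=30)

# illustrative ranges for the next pair of hyper-parameters
params_opt_2 = {'learning_rate': [0.005, 0.01, 0.05], 'max_depth': range(4, 12, 2)}
gridSearchCV2 = GridSearchCV(estimator=model,
    param_grid=params_opt_2,
    scoring='roc_auc',
    n_jobs=4,
    verbose=1,
    cv=3)
gridSearchCV2.fit(X, y)
gridSearchCV2.best_params_, gridSearchCV2.best_score_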

Extracting Features from IoT Sensor Data using R

In my previous blog post I introduced the common patterns for extracting features from IoT sensor data using Python. Although R is not my primary machine learning language, it is becoming ubiquitous in Microsoft's data analytics ecosystem after the acquisition of Revolution Analytics, the major commercial distributor of R. Considering the increasing popularity of R on Microsoft data platforms, I will present the R version of the code for IoT data feature extraction in this blog post.

This blog post is also organised around the three common patterns for extracting features from IoT sensor data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

Also, the examples use the same IoT sample data, which stores the hourly readings from sensor A.

[Table: sample of hourly readings from sensor A]

  1. Window-based descriptive statistics

We can use the rollapply function from the zoo library to calculate descriptive statistics over a rolling window. As there is no skewness function in the core R packages, we have to use the e1071 library, which provides the skewness and kurtosis functions.

library(dplyr)
library(zoo)
library(e1071)

# rolling mean, standard deviation and skewness over 12-, 24- and 72-hour windows
data <- data %>%
        mutate(SensorA_Mean_12h=rollapply(SensorA, width=12, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_12h=rollapply(SensorA, width=12, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_12h=rollapply(SensorA, width=12, FUN=skewness, by=1, fill=NA, align='right'),
               SensorA_Mean_24h=rollapply(SensorA, width=24, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_24h=rollapply(SensorA, width=24, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_24h=rollapply(SensorA, width=24, FUN=skewness, by=1, fill=NA, align='right'),
               SensorA_Mean_72h=rollapply(SensorA, width=72, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_72h=rollapply(SensorA, width=72, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_72h=rollapply(SensorA, width=72, FUN=skewness, by=1, fill=NA, align='right')
               )
tail(data, 5)

The code above will generate the following features:

a1

  2. Seasonal pattern

A date-time is represented in R as an object of class POSIXct. Once we convert the Date column into POSIXct, we can easily extract the parts of the datetime.

# the second positional argument of as.POSIXct is tz, so format must be named
data$Date <- as.POSIXct(data$Date, format="%Y-%m-%dT%H:%M:%S", tz="UTC")

data$DayOfWeek <- as.numeric(format(data$Date, "%u"))
data$IsWeekend <- ifelse (data$DayOfWeek>5, 1, 0)
data$Hour <- as.numeric(format(data$Date, "%H"))
data$IsWorkingHour <- ifelse (data$Hour>=9 & data$Hour<=17, 1, 0)
data$Year <- as.numeric(format(data$Date, "%Y"))
data$Month <- as.numeric(format(data$Date, "%m"))
data$DayOfMonth <- as.numeric(format(data$Date, "%d"))
tail(data, 5)

[Table: extracted seasonal features]

  3. Trend pattern

In Python, we can use the shift function to extract features representing the trend pattern in a time-series dataset. In R, a similar function is slide, provided by the DataCombine library.

library(DataCombine)

# create 1- to 7-hour lag features of SensorA, one lag per call
for (lag in 1:7) {
  data <- slide(data, Var = "SensorA", NewVar = paste0("SensorA_lag_", lag, "h"), slideBy = -lag)
}
tail(data, 5)

We get the output as:

[Table: extracted lag features]

Extracting Features from IoT Sensor Data using Python

The previous blog post discussed three common patterns for extracting features from IoT sensor data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

This blog post introduces how to implement those three patterns in Python.

  1. Window-based descriptive statistics

There are three main types of descriptive statistics based on what they describe: distribution (e.g., skewness and kurtosis), central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, variance, and range). The Python pandas package provides functions for a comprehensive list of descriptive statistics. You can find the reference to those functions here.

The descriptive statistics need to be calculated within a time window context, e.g., the last 12, 24, or 72 hours. We can use the rolling method in pandas to get the rolling time window.

For example, we have the hourly reading data from sensor A:

[Table: sample of hourly readings from sensor A]

We can set the rolling window size to 12, 24, and 72 hours and calculate the mean, standard deviation, and skewness for each window size.

data['SensorA_mean_12h'] = data['SensorA'].rolling(12).mean()
data['SensorA_sd_12h'] = data['SensorA'].rolling(12).std()
data['SensorA_skew_12h'] = data['SensorA'].rolling(12).skew()
data['SensorA_mean_24h'] = data['SensorA'].rolling(24).mean()
data['SensorA_sd_24h'] = data['SensorA'].rolling(24).std()
data['SensorA_skew_24h'] = data['SensorA'].rolling(24).skew()
data['SensorA_mean_72h'] = data['SensorA'].rolling(72).mean()
data['SensorA_sd_72h'] = data['SensorA'].rolling(72).std()
data['SensorA_skew_72h'] = data['SensorA'].rolling(72).skew()
data.tail(5)

The Python code above will generate the features as:

[Table: window-based descriptive statistics features]
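The same rolling approach extends to the other descriptive statistics mentioned above. A minimal sketch (the new column names are illustrative):

# further rolling statistics over a 24-hour window (illustrative column names)
data['SensorA_median_24h'] = data['SensorA'].rolling(24).median()
data['SensorA_var_24h'] = data['SensorA'].rolling(24).var()
data['SensorA_kurt_24h'] = data['SensorA'].rolling(24).kurt()
data['SensorA_range_24h'] = data['SensorA'].rolling(24).max() - data['SensorA'].rolling(24).min()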

  2. Seasonal pattern

As discussed in the last blog post, the features representing the seasonal pattern can be extracted from the timestamp of the IoT sensor data, using the pandas datetime (dt) accessor, such as:

import numpy as np

data['DayOfWeek'] = data['DateTime'].dt.weekday
data['IsWeekend'] = np.where(data['DateTime'].dt.weekday > 4, 1, 0)  # Saturday=5, Sunday=6
data['IsWorkingHour'] = np.where((data['DateTime'].dt.hour >= 9) & (data['DateTime'].dt.hour <= 17), 1, 0)
data['Year'] = data['DateTime'].dt.year
data['Month'] = data['DateTime'].dt.month
data['DayOfMonth'] = data['DateTime'].dt.day
data.tail(5)

We can get the output as:

[Table: extracted seasonal features]

  3. Trend pattern

We can use the shift function to extract features representing the trend pattern in a time-series dataset.

data['SensorA_lag_1h'] = data['SensorA'].shift(1)
data['SensorA_lag_2h'] = data['SensorA'].shift(2)
data['SensorA_lag_3h'] = data['SensorA'].shift(3)
data['SensorA_lag_4h'] = data['SensorA'].shift(4)
data['SensorA_lag_5h'] = data['SensorA'].shift(5)
data['SensorA_lag_6h'] = data['SensorA'].shift(6)
data['SensorA_lag_7h'] = data['SensorA'].shift(7)
data.tail(5)

We get the output as:

[Table: extracted lag features]

Feature Extraction of IoT Sensor Data

Feature extraction is an important step in the IoT-related machine learning process that transforms the temporal data describing machine component states into a format supported by machine learning algorithms. The extracted features need to be informative, i.e., they need to carry information that can contribute to the prediction.

Due to the temporal characteristics of IoT sensor data, there are some common patterns for extracting features from IoT data. This blog post introduces three common types of feature extraction patterns for IoT data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

  1. Window-based descriptive statistics

A message from an IoT sensor carries information about the state of a machine component at a point in time. On its own, this single piece of information means little for a machine learning prediction. However, the descriptive statistics of a sequence of sensor readings within a time window can offer valuable information for the prediction. For example, the count of exceptions occurring on a machine component in the last 7 days can be an indicator of potential failure of the machine in the next 7 days.

The descriptive statistics can describe the distribution (e.g., skewness and kurtosis), central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, variance, and range) of the sensor measurements within a given time window. It is common that a combination of descriptive statistics provides more accurate information. For example, having both the mean and the standard deviation of the sensor measurements in a time window provides a more accurate picture of the state of a machine component than having only one of them.

Many window-based descriptive statistics can be candidate features. Domain knowledge is always useful for judging which descriptive statistics, over which window sizes, contribute most to the prediction. For example, the engineers of an industrial machine will know best which component, under which condition, is most likely to cause a machine failure. A sketch of the exception-count example above is shown below.
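As a minimal sketch of the exception-count example in pandas, assuming hourly records and a hypothetical IsException flag column:

# data is assumed to be a pandas DataFrame of hourly records;
# IsException is a hypothetical 0/1 flag marking hours with an exception
data['ExceptionCount_7d'] = data['IsException'].rolling(24 * 7).sum()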

  2. Seasonal pattern

IoT sensor data can show a seasonal pattern. For example, IoT data monitoring machine usage can show a low usage level at weekends and a high usage level on weekdays. The features representing the seasonal pattern can be extracted from the timestamp of the IoT sensor data. Some examples of seasonal pattern features are:

  • IsWorkingHour
  • IsWeekday
  • MonthOfYear
  • DayOfWeek

These features can be very useful for time-series forecasting types of machine learning requirements, e.g., machine usage forecasting and energy consumption forecasting, and can be derived as sketched below.
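As a minimal sketch in pandas, assuming a DataFrame with a DateTime column already parsed as timestamps:

data['IsWorkingHour'] = ((data['DateTime'].dt.hour >= 9) & (data['DateTime'].dt.hour <= 17)).astype(int)
data['IsWeekday'] = (data['DateTime'].dt.weekday < 5).astype(int)  # Monday=0 ... Sunday=6
data['MonthOfYear'] = data['DateTime'].dt.month
data['DayOfWeek'] = data['DateTime'].dt.weekday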

  3. Trend pattern

Like the seasonal pattern features, the features extracted from the trend pattern of the IoT sensor data can be useful for time-series forecasting requirements as well. The lag operator can be used to extract features representing the trend pattern: a sequence of lagged values for the previous X time periods carries the autocorrelation information of the time series, which can contribute to the prediction.
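As a minimal sketch in pandas, assuming an hourly SensorA series, lag features for the previous 3 hours can be built with shift:

# each shift(k) moves the series down by k rows, giving the value k hours earlier
data['SensorA_lag_1h'] = data['SensorA'].shift(1)
data['SensorA_lag_2h'] = data['SensorA'].shift(2)
data['SensorA_lag_3h'] = data['SensorA'].shift(3)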