Tag: Feature Engineering

Evaluate Feature Importance using Tree-based Model

Tree-based models can be used to evaluate the importance of features. In this blog post I go through the steps of evaluating feature importance using the GBDT model in LightGBM. LightGBM is a gradient boosting framework released by Microsoft that offers high accuracy and speed (some tests show that LightGBM can produce predictions as accurate as XGBoost's while running up to 25x faster).

First, we import the required packages: pandas for data preprocessing, LightGBM for the GBDT model, and matplotlib for building the feature importance bar chart.

import pandas as pd
import matplotlib.pyplot as plt
import lightgbm as lgb

Then, we need to load and preprocess the training data. In this example, we use a predictive maintenance dataset.

# read data
train = pd.read_csv(r'E:\Data\predicitivemaintance_processed.csv')  # raw string keeps the backslashes literal

# drop the columns that are not used for the model
train = train.drop(['Date', 'FailureDate'], axis=1)

# set the target column
target = 'FailNextWeek'

# One-hot encoding
feature_categorical = ['Model']
train = pd.get_dummies(train, columns=feature_categorical)
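To make the one-hot encoding step concrete, here is a minimal sketch on a made-up frame. The sample values are hypothetical and only stand in for the real dataset; the column names 'Model' and 'FailNextWeek' are taken from the code above.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is not shown in this post
sample = pd.DataFrame({
    'Model': ['A', 'B', 'A', 'C'],
    'FailNextWeek': [0, 1, 0, 0],
})

# Each distinct value of 'Model' becomes its own indicator column
encoded = pd.get_dummies(sample, columns=['Model'])
print(encoded.columns.tolist())
```

The original 'Model' column is replaced by one indicator column per category, while the other columns are left untouched.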

Next, we train the GBDT model with the training data:

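lgb_params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'num_leaves': 30,
    'num_round': 360,         # number of boosting iterations
    'max_depth': 8,
    'learning_rate': 0.01,
    'feature_fraction': 0.5,  # fraction of features sampled for each tree
    'bagging_fraction': 0.8,  # fraction of rows sampled when bagging
    'bagging_freq': 12        # re-sample the rows every 12 iterations
}
# all columns except the target are used as features
lgb_train = lgb.Dataset(train.drop(columns=[target]), train[target])
model = lgb.train(lgb_params, lgb_train)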

After the model is trained, we can pass it to LightGBM's plot_importance function to plot the importance of the features.

# plot_importance creates its own figure, so the size and title are passed to it directly
lgb.plot_importance(model, max_num_features=30, figsize=(12, 6), title="Feature importances")
plt.show()

[Figure: bar chart of the feature importances]

Extracting Features from IoT Sensor Data using R

In my previous blog post I introduced the common patterns for extracting features from IoT sensor data using Python. Although R is not my primary machine learning language, it has become ubiquitous in Microsoft’s data analytics ecosystem since Microsoft acquired Revolution Analytics, the major commercial distributor of R. Considering the increasing popularity of R on Microsoft data platforms, in this post I create the R version of the IoT feature extraction code.

This blog post is also organised around the three common patterns for extracting features from IoT sensor data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

Also, the examples use the same IoT sample data, which stores the hourly readings from sensor A.

[Figure: sample of the hourly readings from sensor A]

  1. Window-based descriptive statistics

We can use the rollapply function in the zoo library to calculate descriptive statistics in a rolling window. As there is no skewness function in the core R packages, we use the e1071 library, which provides the skewness and kurtosis functions.

library(dplyr)   # mutate and %>%
library(zoo)     # rollapply
library(e1071)   # skewness

data <- data %>%
        mutate(SensorA_Mean_12h=rollapply(SensorA, width=12, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_12h=rollapply(SensorA, width=12, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_12h=rollapply(SensorA, width=12, FUN=skewness, by=1, fill=NA, align='right'),
               SensorA_Mean_24h=rollapply(SensorA, width=24, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_24h=rollapply(SensorA, width=24, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_24h=rollapply(SensorA, width=24, FUN=skewness, by=1, fill=NA, align='right'),
               SensorA_Mean_72h=rollapply(SensorA, width=72, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_72h=rollapply(SensorA, width=72, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_72h=rollapply(SensorA, width=72, FUN=skewness, by=1, fill=NA, align='right')
               ) 
tail(data, 5)

The code above will generate the following features:

[Figure: last five rows of the data with the rolling-window feature columns]

  2. Seasonal pattern

In R, a date-time is represented as an object of class POSIXct. Once we convert the Date column into POSIXct, we can easily extract the parts of the date-time.

data$Date <- as.POSIXct(data$Date, format="%Y-%m-%dT%H:%M:%S", tz="UTC")

# %u gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday
data$DayOfWeek <- as.numeric(format(data$Date, "%u"))
data$IsWeekend <- ifelse(data$DayOfWeek>5, 1, 0)
data$Hour <- as.numeric(format(data$Date, "%H"))
data$IsWorkingHour <- ifelse(data$Hour>=9 & data$Hour<=17, 1, 0)
data$Year <- as.numeric(format(data$Date, "%Y"))
data$Month <- as.numeric(format(data$Date, "%m"))
data$DayOfMonth <- as.numeric(format(data$Date, "%d"))
tail(data, 5)

[Figure: last five rows of the data with the seasonal feature columns]

  3. Trend pattern

In Python, we can use the shift function to extract the features representing the trend pattern in a time-series dataset. In R, a similar function is slide, provided by the DataCombine library.

library(DataCombine)  # slide

# negative slideBy values lag the variable
data <- slide(data, Var = "SensorA", slideBy = -1:-7,
      NewVar=c('SensorA_lag_1h', 'SensorA_lag_2h', 'SensorA_lag_3h', 'SensorA_lag_4h',
               'SensorA_lag_5h', 'SensorA_lag_6h', 'SensorA_lag_7h')
               )
tail(data, 5)

We get the output:

[Figure: last five rows of the data with the lag feature columns]

Extracting Features from IoT Sensor Data using Python

The previous blog post discussed three common patterns for extracting features from IoT sensor data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

This blog post shows how to implement those three patterns in Python.

  1. Window-based descriptive statistics

There are three main types of descriptive statistics, based on what they describe: distribution (e.g., skewness and kurtosis), central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, variance, and range). The Python pandas package provides functions for a comprehensive list of descriptive statistics. You can find the reference for those functions in the pandas documentation.

The descriptive statistics need to be calculated within a time window, e.g., the last 12, 24, or 72 hours. We can use the rolling method in pandas to get the rolling time window.

For example, we have the hourly readings from sensor A:

[Figure: sample of the hourly readings from sensor A]

We can set the rolling window size to 12, 24, and 72 hours and calculate the mean, standard deviation, and skewness for each window size.

data['SensorA_mean_12h'] = data['SensorA'].rolling(12).mean()
data['SensorA_sd_12h'] = data['SensorA'].rolling(12).std()
data['SensorA_skew_12h'] = data['SensorA'].rolling(12).skew()
data['SensorA_mean_24h'] = data['SensorA'].rolling(24).mean()
data['SensorA_sd_24h'] = data['SensorA'].rolling(24).std()
data['SensorA_skew_24h'] = data['SensorA'].rolling(24).skew()
data['SensorA_mean_72h'] = data['SensorA'].rolling(72).mean()
data['SensorA_sd_72h'] = data['SensorA'].rolling(72).std()
data['SensorA_skew_72h'] = data['SensorA'].rolling(72).skew()
data.tail(5)

The Python code above generates the following features:

[Figure: last five rows of the data with the rolling-window feature columns]
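Note that rolling(12) counts rows, which equals 12 hours only when every hourly reading is present. As a hedged sketch of an alternative, pandas also accepts a time offset as the window when the frame has a DatetimeIndex. The small frame below is made up for illustration, with one reading removed to simulate a gap.

```python
import pandas as pd

# Hypothetical hourly readings with the 02:00 reading missing
idx = pd.date_range('2024-01-01', periods=6, freq='h')
data = pd.DataFrame({'SensorA': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)
data = data.drop(idx[2])

# Row-based window: always the last 3 rows, whatever time span they cover
data['SensorA_mean_3rows'] = data['SensorA'].rolling(3).mean()

# Time-based window: the readings that fall within the last 3 hours
data['SensorA_mean_3h'] = data['SensorA'].rolling('3h').mean()
print(data)
```

With a time offset, the window at 03:00 only contains the 01:00 and 03:00 readings, whereas the row-based window also reaches back to 00:00. The time-based window also starts producing values from the first row, because its default minimum number of observations is 1.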

  2. Seasonal pattern

As discussed in the last blog post, the features representing the seasonal pattern can be extracted from the timestamp of the IoT sensor data using the pandas .dt accessor on a datetime column, for example:

import numpy as np

# dt.weekday numbers the days from Monday=0 to Sunday=6
data['DayOfWeek'] = data['DateTime'].dt.weekday
data['IsWeekend'] = np.where(data['DateTime'].dt.weekday > 4, 1, 0)
data['IsWorkingHour'] = np.where((data['DateTime'].dt.hour >= 9) & (data['DateTime'].dt.hour <= 17), 1, 0)
data['Year'] = data['DateTime'].dt.year
data['Month'] = data['DateTime'].dt.month
data['DayOfMonth'] = data['DateTime'].dt.day
data.tail(5)

We get the output:

[Figure: last five rows of the data with the seasonal feature columns]
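The weekday numbering is easy to get wrong: pandas counts from Monday=0, while the "%u" format used in the R version counts from Monday=1, which is why the weekend test is > 4 here and > 5 in R. A minimal check on three known dates (a made-up frame, not the sensor data):

```python
import numpy as np
import pandas as pd

# 2024-01-05 is a Friday, so the three dates are Friday, Saturday, Sunday
demo = pd.DataFrame({
    'DateTime': pd.to_datetime(['2024-01-05', '2024-01-06', '2024-01-07'])
})
demo['DayOfWeek'] = demo['DateTime'].dt.weekday
demo['IsWeekend'] = np.where(demo['DayOfWeek'] > 4, 1, 0)
print(demo)
```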

  3. Trend pattern

We can use the shift method to extract the features representing the trend pattern in a time-series dataset.

data['SensorA_lag_1h'] = data['SensorA'].shift(1)
data['SensorA_lag_2h'] = data['SensorA'].shift(2)
data['SensorA_lag_3h'] = data['SensorA'].shift(3)
data['SensorA_lag_4h'] = data['SensorA'].shift(4)
data['SensorA_lag_5h'] = data['SensorA'].shift(5)
data['SensorA_lag_6h'] = data['SensorA'].shift(6)
data['SensorA_lag_7h'] = data['SensorA'].shift(7)
data.tail(5)

We get the output:

[Figure: last five rows of the data with the lag feature columns]
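The seven assignments can also be generated in a loop, which is handy when the number of lags changes. This is only a sketch of the same shift call on a made-up series:

```python
import pandas as pd

# Hypothetical readings standing in for the sensor A column
data = pd.DataFrame({'SensorA': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]})

# One lag column per hour, named like the columns above
for lag in range(1, 8):
    data[f'SensorA_lag_{lag}h'] = data['SensorA'].shift(lag)
print(data.tail(2))
```

The first rows of each lag column are NaN because there is no earlier reading to shift in, so those rows usually need to be dropped or filled before training.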

Feature Extraction of IoT Sensor Data

Feature extraction is an important step in the IoT-related machine learning process: it transforms the temporal data describing a machine component's state into a format supported by machine learning algorithms. The extracted features need to be informative, i.e., they need to carry information that can contribute to the prediction.

Due to the temporal nature of IoT sensor data, there are some common patterns for extracting features from it. This blog post introduces three common feature extraction patterns for IoT data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

  1. Window-based descriptive statistics

A single message from an IoT sensor carries information about the state of a machine component at one point in time. On its own, this single piece of information is of little use for machine learning prediction. However, the descriptive statistics of a sequence of sensor data within a time window can offer valuable information for the prediction. For example, the count of exceptions occurring on a machine component in the last 7 days can be an indicator of a potential failure of the machine in the next 7 days.
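As a rough sketch of the exception-count example, assuming a hypothetical daily log with an ExceptionCount column, a 7-day rolling sum in pandas could look like this:

```python
import pandas as pd

# Hypothetical daily exception counts for one machine component
days = pd.date_range('2024-01-01', periods=10, freq='D')
events = pd.DataFrame(
    {'ExceptionCount': [0, 1, 0, 0, 2, 0, 0, 0, 0, 1]},
    index=days,
)

# Number of exceptions in the 7 days up to and including each day
events['Exceptions_7d'] = events['ExceptionCount'].rolling('7D').sum()
print(events)
```

Each row then carries a summary of the recent history rather than a single point-in-time reading.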

The descriptive statistics can describe the distribution (e.g., skewness and kurtosis), central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, variance, and range) of the sensor measurements within a given time window. A combination of descriptive statistics often provides more accurate information. For example, having both the mean and the standard deviation of the sensor measurements in a time window gives more accurate state information about a machine component than having only one of them.
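A small illustration of why the mean alone can be misleading, using two made-up windows of readings that share the same mean:

```python
import numpy as np

# Two hypothetical windows of sensor readings with the same mean
steady = np.array([50.0, 50.0, 50.0, 50.0])
erratic = np.array([20.0, 80.0, 30.0, 70.0])

for name, window in (('steady', steady), ('erratic', erratic)):
    # ddof=1 gives the sample standard deviation
    print(name, window.mean(), window.std(ddof=1))
```

Both windows average 50, but only the standard deviation reveals that the second component is fluctuating widely.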

Many window-based descriptive statistics can be candidate features. Domain knowledge is always useful for judging which descriptive statistics, over which window size, contribute most to the prediction. For example, the engineers of an industrial machine will know more about which component, under which condition, is more likely to cause a machine failure.

  2. Seasonal pattern

IoT sensor data can show seasonal patterns. For example, IoT data monitoring a machine's usage can show low usage at weekends and high usage on weekdays. The features representing a seasonal pattern can be extracted from the timestamp of the IoT sensor data. Some examples of seasonal pattern features are:

  • IsWorkingHour
  • IsWeekday
  • MonthOfYear
  • DayOfWeek

These features can be very useful for time-series forecasting tasks, e.g., machine usage forecasting and energy consumption forecasting.

  3. Trend pattern

Like the seasonal pattern, the features extracted from the trend pattern of the IoT sensor data can be useful for time-series forecasting tasks as well. The lag operator can be used to extract the features representing the trend pattern. A sequence of lagged values for the previous X time periods carries the autocorrelation information of the time series, which can contribute to the prediction.
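To see how much information a lagged value carries, one option is to measure the autocorrelation at that lag. The sketch below uses a synthetic hourly series with a daily cycle (an assumption for illustration, not real sensor data) and the pandas Series.autocorr method:

```python
import numpy as np
import pandas as pd

# Synthetic hourly usage: a 24-hour cycle plus a little noise, over two weeks
hours = np.arange(24 * 14)
rng = np.random.default_rng(0)
usage = pd.Series(
    np.sin(2 * np.pi * hours / 24) + rng.normal(scale=0.1, size=hours.size)
)

# Correlation between the series and itself shifted by 1, 12 and 24 hours
for lag in (1, 12, 24):
    print(lag, round(usage.autocorr(lag=lag), 2))
```

For this series the 24-hour lag correlates strongly and positively with the current value, while the 12-hour lag correlates strongly and negatively, so both would be informative lag features.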