Feature extraction is an important step in the IoT-related machine learning process: it transforms the temporal data describing the state of machine components into a format that machine learning algorithms can consume. The extracted features need to be informative, i.e., they need to carry information that can contribute to the prediction.
Because IoT sensor data is temporal, some common patterns for extracting features from it have emerged. This blog post introduces three types of common feature extraction patterns for IoT data:
- Window-based descriptive statistics
- Seasonal pattern
- Trend pattern
- Window-based descriptive statistics
A single message from an IoT sensor carries information about the state of a machine component at one point in time. On its own, this single piece of information is of little use for a machine learning prediction. However, the descriptive statistics of a sequence of sensor data within a time window can offer valuable information for the prediction. For example, the count of exceptions occurring on a machine component in the last 7 days can be an indicator of a potential failure of the machine in the next 7 days.
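The exception-count example can be sketched in a few lines of Python. This is only an illustration: the function name `count_events_in_window`, the timestamps, and the choice of a window that is open at the start and closed at the end are all assumptions, not something prescribed by a particular IoT platform.

```python
from datetime import datetime, timedelta

def count_events_in_window(event_times, as_of, window=timedelta(days=7)):
    """Count events t with as_of - window < t <= as_of."""
    start = as_of - window
    return sum(1 for t in event_times if start < t <= as_of)

# Hypothetical exception timestamps for one machine component.
exception_times = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 3, 14, 30),
    datetime(2024, 1, 6, 9, 15),
    datetime(2024, 1, 9, 22, 5),
]

# Feature value as of midnight on 10 January: exceptions in the trailing 7 days.
as_of = datetime(2024, 1, 10)
exceptions_last_7d = count_events_in_window(exception_times, as_of)
print(exceptions_last_7d)
```

Computing the feature "as of" a given time, using only events up to that time, keeps information from the prediction period out of the feature.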
Descriptive statistics can describe the shape of the distribution (e.g., skewness and kurtosis), the central tendency (e.g., mean, median, and mode), and the dispersion (e.g., standard deviation, variance, and range) of the sensor measurements within a given time window. A combination of descriptive statistics often provides a more complete picture than any single one. For example, having both the mean and the standard deviation of the sensor measurements in a time window describes the state of a machine component more accurately than having only one of them.
Many window-based descriptive statistics are candidate features. Domain knowledge helps to judge which descriptive statistics, over which window sizes, contribute most to the prediction. For example, the engineers of an industrial machine will know best which component, under which conditions, is most likely to cause a machine failure.
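A minimal sketch of window-based statistics, using only the Python standard library, might look like the following. The function name `rolling_stats`, the window size of 3, and the readings are made up for illustration. Skewness and kurtosis are left out because the `statistics` module does not provide them.

```python
import statistics
from collections import deque

def rolling_stats(values, window_size):
    """Return one dict of descriptive statistics per full trailing window."""
    if window_size < 2:
        raise ValueError("window_size must be at least 2 to compute a standard deviation")
    window = deque(maxlen=window_size)  # oldest reading drops out automatically
    features = []
    for v in values:
        window.append(v)
        if len(window) < window_size:
            continue  # skip until the first full window is available
        features.append({
            "mean": statistics.mean(window),
            "median": statistics.median(window),
            "std": statistics.stdev(window),  # sample standard deviation
            "range": max(window) - min(window),
        })
    return features

# Hypothetical temperature readings from one sensor.
readings = [70.0, 71.0, 69.0, 70.0, 90.0, 50.0]
features = rolling_stats(readings, window_size=3)
for row in features:
    print(row)
```

In this made-up series the first and last windows have the same mean (70.0) but very different standard deviations, which is the kind of difference that a single statistic would hide.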
- Seasonal pattern
IoT sensor data can show seasonal patterns. For example, IoT data monitoring machine usage can show a low usage level on weekends and a high usage level on weekdays. Features representing a seasonal pattern can be extracted from the timestamps of the IoT sensor data. Some examples of seasonal pattern features are:
- IsWorkingHour
- IsWeekday
- MonthOfYear
- DayOfWeek
These features can be very useful for time-series forecasting tasks, e.g., machine usage forecasting and energy consumption forecasting.
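The four example features listed above can be derived from a timestamp roughly as follows. The function name `seasonal_features` and the definition of working hours (09:00 to 17:00 on weekdays) are assumptions for this sketch; the right definition depends on the site being monitored.

```python
from datetime import datetime

def seasonal_features(ts, work_start_hour=9, work_end_hour=17):
    """Derive calendar-based features from one sensor timestamp."""
    is_weekday = ts.weekday() < 5  # Monday=0 ... Sunday=6
    return {
        # Assumed working hours: weekdays, start hour inclusive, end hour exclusive.
        "IsWorkingHour": is_weekday and work_start_hour <= ts.hour < work_end_hour,
        "IsWeekday": is_weekday,
        "MonthOfYear": ts.month,      # 1..12
        "DayOfWeek": ts.weekday(),    # 0..6, Monday=0
    }

wednesday_morning = seasonal_features(datetime(2024, 1, 3, 10, 30))
saturday_morning = seasonal_features(datetime(2024, 1, 6, 10, 30))
print(wednesday_morning)
print(saturday_morning)
```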
- Trend pattern
Like seasonal pattern features, features extracted from the trend pattern of IoT sensor data can also be useful for time-series forecasting tasks. The lag operator can be used to extract features that represent the trend pattern. A sequence of lagged values for the previous X time periods carries the autocorrelation information of the time series, which can contribute to the prediction.
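As a rough illustration of lag features, the sketch below turns a series into rows in which the previous `n_lags` values are the inputs and the current value is the prediction target. The function name `lag_features`, the `lag_k` and `target` column names, and the usage numbers are hypothetical.

```python
def lag_features(series, n_lags):
    """Build one row per time step t >= n_lags with lag_1..lag_n and the target."""
    rows = []
    for t in range(n_lags, len(series)):
        # lag_k is the value observed k time periods before t.
        row = {f"lag_{k}": series[t - k] for k in range(1, n_lags + 1)}
        row["target"] = series[t]
        rows.append(row)
    return rows

# Hypothetical machine usage per day.
usage = [10, 12, 13, 15, 18]
rows = lag_features(usage, n_lags=2)
for row in rows:
    print(row)
```

The first `n_lags` time steps produce no rows because their lagged values are not available, so the number of lags trades off history per row against the number of usable rows.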