Tag: Azure Stream Analytics

Anomaly Detection with Azure Stream Analytics

Anomaly detection is a very common use case in IoT related deployments. A new ANOMALYDETECTION operator has been recently added into Azure Stream Analytics and is currently at public preview.

ANOMALYDETECTION operator detects anomalies based on Exchangeability Martingales (EM) that supports online test of the exchangeability of a sequence of event values. When the distribution of the sequence of event values is invariant, this sequence of event values is exchangeable. If the distribution of the sequence of event values is changed, a potential anomaly occurs.

This is the syntax of ANOMALYDETECTION operator, which check whether the current event value is anomaly against a sliding window of time period defined by the OVER clause.

ANOMALYDETECTION(\) OVER ([PARTITION BY \] LIMIT DURATION(\, \) [WHEN boolean_expression])

ANOMALYDETECTION operator returns three scores (BiLevelChangeScore, SlowPosTrendScore, and SlowNegTrendScore) corresponding to the three types of anomalies:

  • Bidirectional level change
  • Slow positive trend
  • Slow negative trend

This blog post gives a demo on the ANOMALYDETECTION operator with an example that detect anomalies in a temperature sensor events flow. The sensor data will be generated by the Raspberry Pi Azure Iot Online Simulator , and sent to the IoT hub.

2

An Azure Stream Analytics input will be created to consume the temperature data from the IoT hub, and a Power BI output will be created to output the temperature anomaly alerts.

4

A prerequisite for ANOMALYDETECTION operator to work is that the input time series needs to be uniform. We can use tumbling window to uniform the time series by averaging the temperature within n seconds window.

1t1

1t1

To fill the window with no sensor data flowing in, we can use the last window where sensor data is available.

2

2

We can then use the ANOMALYDETECTION operator to compute the anomaly scores within the time window of last n minutes/hours and extract the BiLevelChangeSocore, SlowPosTrendScore, and SlowNegTrendScore.

3

Finally, we can check the scores against the threshold set for alert. The recommended range of the threshold from Microsoft is between 3.25 and 5.

The full code can be found here:

WITH AggregationStep AS
(
SELECT
System.Timestamp as tumblingWindowEnd,
AVG(temperature) as avgTemperature
FROM TemperatureSensor
GROUP BY TumblingWindow(second, 5)
),
FillInMissingValuesStep AS
(
SELECT
TopOne() OVER (ORDER BY tumblingWindowEnd DESC) AS lastEvent
FROM AggregationStep
GROUP BY HOPPINGWINDOW(second, 300, 5)
),
AnomalyDetectionStep AS
(
SELECT
lastEvent.tumblingWindowEnd as anomalyTime,
system.timestamp as anomalyDetectedTime,
lastEvent.avgTemperature as avgTemperature,
ANOMALYDETECTION(lastEvent.avgTemperature) OVER (LIMIT DURATION(minute, 10)) as scores
FROM FillInMissingValuesStep
),
OutputSet AS
(
SELECT
anomalyTime,
anomalyDetectedTime,
avgTemperature,
CAST(GetRecordPropertyValue(scores, 'BiLevelChangeScore') as float) as [Bi Level Change],
CAST(GetRecordPropertyValue(scores, 'SlowPosTrendScore') as float) as [Slow Postive Trend],
CAST(GetRecordPropertyValue(scores, 'SlowNegTrendScore') as float) as [Slow Negative Trend]
FROM AnomalyDetectionStep
)
SELECT
anomalyTime,
anomalyDetectedTime,
avgTemperature,
[Bi Level Change],
[Slow Postive Trend],
[Slow Negative Trend],
CASE
WHEN [Bi Level Change]>3.25 AND [Bi Level Change]> [Slow Postive Trend] AND [Bi Level Change]> [Slow Negative Trend]
THEN 'Bi Level Change'
WHEN [Slow Postive Trend]>3.25 AND [Bi Level Change]< [Slow Postive Trend] AND [Slow Postive Trend]> [Slow Negative Trend]
THEN 'Slow Postive Trend'
WHEN [Slow Postive Trend]>3.25 AND [Bi Level Change]< [Slow Negative Trend] AND [Slow Postive Trend]< [Slow Negative Trend]
THEN 'Slow Negative Trend'
ELSE ''
END AS anomalyType
INTO TelemetryAlert
FROM OutputSet

https://gist.github.com/malinxiao/f9e00fc6e805bc48adef947815907da1#file-anomalydetection-asaql

After start the stream analytics job, the temperature measure data with the anomaly scores will flow into Power BI.
1t1.PNG

We can create anomalies through changing the temperature value generated by the simulator.

3

1t1

Azure Stream Analytics Patterns & Implementations

Thanks to the increased popularity of IoT and social networks, steaming analytics has become a hot topic and attracted more and more attentions in the data analytics community. Many people (e.g., this and this) believe streaming analytics is the future that will take over the use cases that are traditionally targeted by batch-oriented analytics.

Azure Stream Analytics is Microsoft’s offer of real-time analytics tool which is one major service in Azure Cortana Intelligence Suite. When designing data analytics solutions on Azure platform, we need to know what is the role Azure Stream Analytics can play in our solutions and how we can use Azure Stream Analytics in what use scenarios. Dr. Srinath Perera, an expert on CEP and streaming analytic, has summarised 13 patterns for streaming real-time analytics. Those patterns can be a very useful guide for us to make design decisions in our data analytics solutions.

In this blog post, I will discuss those patterns in Azure Stream Analytics context, evaluate Azure Stream Analytics’ strengths and weaknesses for those patterns,  and explore how to  implement those patterns using Azure Stream Analytics coupled with the supports from other Azure services (e.g., Event Hub, Azure Functions, and Azure Machine Learning).

Firstly, I am going to give a summary of Dr. Srinath Perera’s 13 streaming real-time analytics patterns and then discuss the Azure Stream Analytics implementation for each patterns. In addition, I am going to add an additional pattern, Edge analytics, onto the list, that is specific for Azure Stream Analytics.

Dr. Perera’s 13 stream analytics patterns

  • Pattern 1 – Preprocessing
  • Pattern 2 – Alerts and Thresholds
  • Pattern 3 – Simple Counting and Counting with Windows
  • Pattern 4 – Joining Event Streams
  • Pattern 5 – Data Correlation, Missing Events, and Erroneous Data
  • Pattern 6 – Interacting with Databases
  • Pattern 7 – Detecting Temporal Event Sequence Patterns
  • Pattern 8 – Tracking
  • Pattern 9 – Detecting Trends
  • Pattern 10 – Running the same Query in Batch and Realtime Pipelines
  • Pattern 11 – Detecting and switching to Detailed Analysis
  • Pattern 12 – Using a Model
  • Pattern 13 – Online Control
  • Pattern 14 (additional) – Edge Analytics

Pattern 1 – Preprocessing

One basic and common task for streaming analytics is data preprocessing that filters, reshapes, splits/combines and transforms incoming raw data into a format suitable for further processing and analysis.

Azure Stream Analytics provides a good support for data preprocessing tasks. The Stream Analytics Query Language is a sql-like language using a subset of T-SQL syntax. The developers with T-SQL skills can easily create scripts for those common data preprocessing task in Azure Stream Analytics with the SQL knowledge they already have. The Stream Analytics Query Language allows them to preprocess streaming data just in the same way as they preprocess batch-oriented data.

7*This snapshot is from Microsoft

Pattern 2 – Alerts and Thresholds

This pattern is a very common streaming analytics pattern, especially in many industrial IoT uses cases. In this pattern, the streaming analytics program detects the abnormal condition based on a pre-defined threshold and generates alerts based on the condition.

Anomaly detection using “WHERE” clause

We can use the “WHERE” clause of Stream Analytics Query Language in Azure Stream Analytics to detect the abnormal condition, and then output the queried event in the abnormal condition to a “Alert” output port,  e.g.,

SELECT  DeviceID, Temperature, "Over Temperature"  AS ErrorStatus
INTO AlertOutput
FROM TelemetryInput
WHERE Temperature >100

Anomaly detection using “ANOMALYDETECTION” Operator

The machine learning-based “ANOMALYDETECTION” Operator is a new feature recently added in Azure Stream Analytics and is currently under Preview release. This operator takes advantage of machine learning algorithm to detect events or observations that do not conform to the expected patterns.

The “ANOMALYDETECTION” Operator is very easy to use, similar to the way how LAG Operator is used.

3

you can find more details about the “ANOMALYDETECTION” Operator here.

Handling Alerts

When an abnormal condition is detected and output to the AlertOutput stream, We can handle the alert output in a number for ways on the Azure platform.

  1. Output the alert output into a live dashboard
  2. Send alert notifications
  3. Automatically handling the alert by adjust the setting of equipment

Azure Stream Analytics support the output of stream to real-time Power BI dashboard. With this feature we can show the real-time alerts on the Power BI dashboard monitored by the maintenance engineers.

6

The alert can also be send to the maintenance engineers in the push mode. Thanks to the recently added Azure Functions output target in Azure Stream Analytics, it is much easier for developers to send out the alerts through email or notifications without the need to first output the stream to service bus queues and then access Azure Functions from there. The developers can now directly egress the alert stream to Azure Functions where they can implement the logic for alert delivery.

1

When combined with Azure IoT Hub, we can also make the monitored equipment to automatically adjust settings based on the alerts. For example, Microsoft has created a real-time data processing solution for KingwayTek that takes advantage of Azure Stream Analytics, Azure Functions and Azure IoT Hub to proactively raise an alert on the vehicle status and the alert will trigger vehicle reconfiguration.

4

*This snapshot is from Microsoft

Pattern 3 – Simple Counting and Counting with Windows

In this pattern, the raw, atomic stream events will be aggregated in a time window to reveal the potential patterns and behaviours. For example, the raw message of a single website visit event may not provide us much meaningful insight but the average view counts per hour or per day can reveal the pattern of the website visits, e.g., the website has more visits in the evening than the morning.

To implement this pattern, the streaming analytics service need to support two types of functions, aggregation and time windows. Azure Stream Analytics provides good supports for both functions.

The Stream Analytics Query Language provides a list of built-in aggregate functions that can cover most of common aggregation requirements.

1.PNG

In addition, Azure Stream Analytics supports user-defined aggregates (UDA) written in Javascript that gives developers the extra power and flexibility to implement complicated aggregate rules.

Azure Stream Analytics also provide good supports on time windows. Three time window functions are supported by Azure Stream Analytics, including Tumbling window, Hopping window and Sliding window.

The tumbling window function, TumblingWindow,  segments a data stream into the repeated, non-overlap, and distinct time windows.

1t1*This image is from Microsoft

The hopping window function, HoppingWindow,  generates time windows that hops forward in time by a fixed period. Compared to the tumbling windows, the hopping windows can overlap with others so same events may fall in more than on hopping windows.

2*This image is from Microsoft

The sliding window function, SlidingWindow, generates time window when an event occurs. The time window ends at the time when the event happens and the start of the time window is defined by the period parameters specified in the SlidingWindow function.

stream-analytics-window-functions-sliding-intro*This image is from Microsoft

Pattern 4 – Joining Event Streams

This pattern is used for the scenarios where multiple data streams need to be processed to create a new event stream. For example, we may have multiple sensors that collect data for different aspects of an object or event.

Azure Streaming Analytics supports multiple inputs from a variety of stream data sources.

1t1.PNG*This image is from Microsoft

After the inputs are defined in Azure Streaming Analytics you can reference the inputs by name using Stream Analytics Query Language.

Pattern 5 – Data Correlation, Missing Events, and Erroneous Data

This pattern correlates the data from different streams or within the same stream. Dr. Perera has give some use cases of this pattern in his article, such as matching up two data streams that send events in different speeds, detecting a customer request that has not been responded within one hour, and detecting failed sensors by comparing a set of sensors that monitor overlapping regions.

In Azure Stream Analytics we can take advantage of the T-SQL syntax of the Stream Analytics Query Language to implement the pattern. For example, we can use Join clause to join different streams on the id of monitored object (e.g., the id of a machine where different sensors are installed on) and use the operators provided by T-SQL to find the correction.

Pattern 6 – Interacting with Databases

In many use cases the streaming data alone is not enough for us to dig out meaningful insight for the businesses. The data from the streaming source can only become useful when combined with historical, businesses oriented data. The streaming analytics service need to be able to fetch data from other business databases and combine with streaming data. For example, we need to check the blacklists when processing a real-time service request.

Azure Stream Analytics do provides the supports of reference data join in the Stream Analytics Query Language. To use this feature, we need to create a Reference type input that fetch the reference data from Azure Blob storage.1t1

Up until to the point, only Azure Blob storage is support as the reference data source for Azure Stream Analytics. We need to use Azure Data Factory to move the reference data from where they are originally stored into a Azure Blob storage instance. The reference data is modeled as a sequence of blobs in ascending order by the datatime specified in the blob name.

As most of reference data is slow changing type of data, the streaming analytics solutions also need to ensure the reference data they combined with the streaming data is up-to-date. Azure Stream Analytics do provides an approach to support slow changing reference data but has some limitations.

Firstly, the reference data blob stored in the Azure Blob storage cannot be updated as that would cause the Stream Analytics jobs to fail. Therefore, we can only add a new blob to store the updated reference data using the same container and path pattern defined in the job input with a date/time greater than the one specified in the last blob in the sequence. Secondly, the old reference data blobs must not be altered or removed.

Pattern 7 – Detecting Temporal Event Sequence Patterns

In this pattern, the streaming analytics is used to detect the temporal event sequence patterns. For example, a machine may fail to work after showing a sequence of status in a certain order. The streaming analytics solution need to be able to detect the sequence pattern so that an alert can be sent to engineers when the pattern occurs.

In the example provided by Dr. Perera, he used Storm and Siddhi (a CEP engien) to detect the temporal event sequence patterns. We can use the Stream Analytics Query Language in Azure Stream Analytics to implement the example. However, I think a better solution that can cope with more complicated use cases is to use machine learning algorithm to detect the pattern and make the prediction. Azure Stream Analytics provides good supports to the Azure Machine Learning. I will provide more details about the Azure Stream Analytics and Azure Machine Learning integration when discussing the Pattern 12.

Pattern 8 – Tracking

This pattern refers to the streaming analytics use cases on tracking something over space and time in one or more given conditions. Those use cases are often combined with IoT use cases that monitoring the real-time status or movements with something. For example, tracking the movement of missing airline luggage.

Azure Stream Analytics comes with real-time geospatial analytics capability that provides native functions for geospatial operations such as computing geospatial data as points, lines, polygons and also supports the join of multiple geospatial data streams to solve more complicated use cases.

Pattern 9 – Detecting Trends

This pattern detects the trend over time series data, e.g., usage increases and drops, peaks, outliers etc. Same as Pattern 8, this pattern is often used in the IoT use cases.

In Azure Stream Analytics, for simple use cases, we can use Stream Analytics Query Language to query the peak (MAX) value, outliers (ANOMALYDETECTION), and start value and end value in a time window for computing the trends . When combined with Power BI  dashboard, we can provide the time series based charts to visualise the trends.

For more complicated use cases, we may need to use some other functions outside of the Stream Analytics Query Language (e.g., is_monotonic_decreasing /is_monotonic_increasing in Python) or we may need time-series analysis model (e.g., ARIMA) for forecasting use cases.  At this moment, Azure Stream Analytics does not support Python or R. However, we can take a workaround that implements the algorithm in Azure Machine Learning studio with Python or R scripts and publish it as a rest api and then integrate it with the Azure Stream Analytics.

Pattern 10 – Running the same Query in Batch and Realtime Pipelines

I found the title of this Pattern “Running the same Query in Batch and Realtime Pipelines” is a bit of confusing, but from Dr. Perera’s explanation, this pattern refers to the Lambda Architecture which is the most popular data analytics architecture used in IoT use cases at this moment.

Lambda Architecture separates the IoT data analytics into two paths, hot path (in other name, speed layer) and cold path (batch layer). The hot path refers to the stream data processing path and the cold path refers to the batch-oriented data processing path. Microsoft Azure Cortana Intelligence suite provides good supports to the Lambda Architecture. More details can be found here.

1t1.PNG

*This snapshot is from Microsoft

Pattern 11 – Detecting and switching to Detailed Analysis

This pattern is used for the use cases where an anomaly or behaviour can be identified by the streaming analytics and further detailed analysis is required against the historical data. I think this pattern can be viewed as a sub-pattern of Pattern 10.

This pattern can be supported on Azure platform using Lambda Architecture as introduced above.

Pattern 12 – Using a Model

This pattern refers to use machine learning model in stream analytics. I have mentioned some use cases in previous patterns where machine learning model need to be used.

Azure Stream Analytics provide a Azure ML type function to support the integration with Azure Machine Learning.

1t1

The machine learning developers can implement the model using Azure Machine Learning studio and publish as a rest api. An Azure Stream Analytics job can call the api using the Azure ML function.

1t1.PNG

*This snapshot is from Microsoft

Pattern 13 – Online Control

This pattern refers to AI-related use cases such as autopilot, self-driving and robotics. Dr. Perera does not provide much details about this pattern in his article and presentation slides. I think Azure Stream Analytics is not designed for this type of application.

Pattern 14 (additional) – Edge Analytics

I have added this pattern to Dr. Perera’s list as Edge computing has become more and more important in IoT use cases and Azure Stream Analytics along with Azure Machine Learning are the main component in Microsoft’s Edge computing offer.

With Azure Stream Analytics on IoT Edge, the real-time analytics intelligence can be deployed close to IoT devices to achieve low latency, resiliency, efficient use of bandwidth and compliance.

2.PNG

*This snapshot is from Microsoft

 

Issues with Azure Streaming Analytics + Power BI Real-Time Streaming for IoT Hot-Path Analytics

Issues with Azure Streaming Analytics + Power BI Real-Time Streaming for IoT Hot-Path Analytics

Event Hub+Azure Streaming Analytics+Power BI Real-Time Streaming is the recommended approach from Microsoft for IoT hot-path analytics. The combination of those techniques provides a simple and efficient way to implement streaming analytics. However, I did meet some issues with this approach when designing hot-path analytics solutions for IoT projects.

  1. Azure Streaming Analytics does not support dynamic reference data join

Azure Streaming Analytics can only join static reference data stored in Azure Blob storage. The reference data file is load when an Azure Streaming Analytics job started. The update of the reference data is through an ETL process that periodically transformed and copied reference data to Azure Blob storage. That could cause big problems with some IoT projects that require frequently update of reference data. For example, for equipment hiring or industrial vehicle rent business that can charge customers based on the equipment or vehicle usage monitored by IoT devices, the equipment or vehicle can be transferred from customer to customer frequently. If the streaming analytics solution cannot pick up the change of customer reference data timely, the business cannot get accurate usage measure for each customer.

2. Limitations with Power BI real-time streaming

Azure Stream Analytics outputs data stream to Power BI stream dataset through Power BI REST APIs and allows report authors to build real-time dashboards. However, the limitations with Power BI stream dataset could prevent it to be adopted for many use cases. First, if Azure Stream Analytics produce rapid output to Power BI (Microsoft define the “rapid” as once or twice per second), the output will be batched into a single request that may cause the request size to exceed the streaming tile limit. Second, the default retentionPolicy set for Power BI is basicFIFO that supports 200,000 rows data size. When the 200,000 rows limit is reached, rows are dropped in the FIFO fashion (I found there is a Power BI Rest API endpoint for changing the setting to none, and I have given it a try but not work for me)

POST https://api.powerbi.com/v1.0/myorg/datasets?defaultRetentionPolicy={None| basicFIFO}

The limitations with data output rate and stream dataset size may not be a big problem for a PoC deployment with small number of IoT devices. However, in real-world IoT projects, thousands equipment/machines can be managed, and each of them may be equipped with tens of sensors. In this case, there will be only a dozen of rows for each sensor stored in the dataset that are far less enough for building most of types charts in Power BI.

3. Power BI dashboard does not support filtering

Dynamic streaming flow (animation) is the selling point of the Azure Streaming Analytics+Power BI real time streaming approach, and for many projects, it is the must-to-have. However, dynamic streaming flow can only be visualised on Power BI dashboards, and this feature is not supported for Power BI reports. However, the problem is that filtering is not support by Power BI dashboards. That means either you have to create one dashboards for each machine or you have to add all machines into one dashboard if you want to have the dynamic streaming flow visualisation. Obviously, it is not practical for most real-world IoT projects that need manage over one hundred machines.

4. Only a limited set of Power BI visuals supports streaming dataset

At this moment, only a very limited set of Power BI visuals (Card, Line chart, Clustered bar chart, Clustered column chart and Gauge) supports streaming dataset.

12.PNG