Azure Stream Analytics Patterns & Implementations

Thanks to the increased popularity of IoT and social networks, steaming analytics has become a hot topic and attracted more and more attentions in the data analytics community. Many people (e.g., this and this) believe streaming analytics is the future that will take over the use cases that are traditionally targeted by batch-oriented analytics.

Azure Stream Analytics is Microsoft’s offer of real-time analytics tool which is one major service in Azure Cortana Intelligence Suite. When designing data analytics solutions on Azure platform, we need to know what is the role Azure Stream Analytics can play in our solutions and how we can use Azure Stream Analytics in what use scenarios. Dr. Srinath Perera, an expert on CEP and streaming analytic, has summarised 13 patterns for streaming real-time analytics. Those patterns can be a very useful guide for us to make design decisions in our data analytics solutions.

In this blog post, I will discuss those patterns in Azure Stream Analytics context, evaluate Azure Stream Analytics’ strengths and weaknesses for those patterns,  and explore how to  implement those patterns using Azure Stream Analytics coupled with the supports from other Azure services (e.g., Event Hub, Azure Functions, and Azure Machine Learning).

Firstly, I am going to give a summary of Dr. Srinath Perera’s 13 streaming real-time analytics patterns and then discuss the Azure Stream Analytics implementation for each patterns. In addition, I am going to add an additional pattern, Edge analytics, onto the list, that is specific for Azure Stream Analytics.

Dr. Perera’s 13 stream analytics patterns

  • Pattern 1 – Preprocessing
  • Pattern 2 – Alerts and Thresholds
  • Pattern 3 – Simple Counting and Counting with Windows
  • Pattern 4 – Joining Event Streams
  • Pattern 5 – Data Correlation, Missing Events, and Erroneous Data
  • Pattern 6 – Interacting with Databases
  • Pattern 7 – Detecting Temporal Event Sequence Patterns
  • Pattern 8 – Tracking
  • Pattern 9 – Detecting Trends
  • Pattern 10 – Running the same Query in Batch and Realtime Pipelines
  • Pattern 11 – Detecting and switching to Detailed Analysis
  • Pattern 12 – Using a Model
  • Pattern 13 – Online Control
  • Pattern 14 (additional) – Edge Analytics

Pattern 1 – Preprocessing

One basic and common task for streaming analytics is data preprocessing that filters, reshapes, splits/combines and transforms incoming raw data into a format suitable for further processing and analysis.

Azure Stream Analytics provides a good support for data preprocessing tasks. The Stream Analytics Query Language is a sql-like language using a subset of T-SQL syntax. The developers with T-SQL skills can easily create scripts for those common data preprocessing task in Azure Stream Analytics with the SQL knowledge they already have. The Stream Analytics Query Language allows them to preprocess streaming data just in the same way as they preprocess batch-oriented data.

7*This snapshot is from Microsoft

Pattern 2 – Alerts and Thresholds

This pattern is a very common streaming analytics pattern, especially in many industrial IoT uses cases. In this pattern, the streaming analytics program detects the abnormal condition based on a pre-defined threshold and generates alerts based on the condition.

Anomaly detection using “WHERE” clause

We can use the “WHERE” clause of Stream Analytics Query Language in Azure Stream Analytics to detect the abnormal condition, and then output the queried event in the abnormal condition to a “Alert” output port,  e.g.,

SELECT  DeviceID, Temperature, "Over Temperature"  AS ErrorStatus
INTO AlertOutput
FROM TelemetryInput
WHERE Temperature >100

Anomaly detection using “ANOMALYDETECTION” Operator

The machine learning-based “ANOMALYDETECTION” Operator is a new feature recently added in Azure Stream Analytics and is currently under Preview release. This operator takes advantage of machine learning algorithm to detect events or observations that do not conform to the expected patterns.

The “ANOMALYDETECTION” Operator is very easy to use, similar to the way how LAG Operator is used.

3

you can find more details about the “ANOMALYDETECTION” Operator here.

Handling Alerts

When an abnormal condition is detected and output to the AlertOutput stream, We can handle the alert output in a number for ways on the Azure platform.

  1. Output the alert output into a live dashboard
  2. Send alert notifications
  3. Automatically handling the alert by adjust the setting of equipment

Azure Stream Analytics support the output of stream to real-time Power BI dashboard. With this feature we can show the real-time alerts on the Power BI dashboard monitored by the maintenance engineers.

6

The alert can also be send to the maintenance engineers in the push mode. Thanks to the recently added Azure Functions output target in Azure Stream Analytics, it is much easier for developers to send out the alerts through email or notifications without the need to first output the stream to service bus queues and then access Azure Functions from there. The developers can now directly egress the alert stream to Azure Functions where they can implement the logic for alert delivery.

1

When combined with Azure IoT Hub, we can also make the monitored equipment to automatically adjust settings based on the alerts. For example, Microsoft has created a real-time data processing solution for KingwayTek that takes advantage of Azure Stream Analytics, Azure Functions and Azure IoT Hub to proactively raise an alert on the vehicle status and the alert will trigger vehicle reconfiguration.

4

*This snapshot is from Microsoft

Pattern 3 – Simple Counting and Counting with Windows

In this pattern, the raw, atomic stream events will be aggregated in a time window to reveal the potential patterns and behaviours. For example, the raw message of a single website visit event may not provide us much meaningful insight but the average view counts per hour or per day can reveal the pattern of the website visits, e.g., the website has more visits in the evening than the morning.

To implement this pattern, the streaming analytics service need to support two types of functions, aggregation and time windows. Azure Stream Analytics provides good supports for both functions.

The Stream Analytics Query Language provides a list of built-in aggregate functions that can cover most of common aggregation requirements.

1.PNG

In addition, Azure Stream Analytics supports user-defined aggregates (UDA) written in Javascript that gives developers the extra power and flexibility to implement complicated aggregate rules.

Azure Stream Analytics also provide good supports on time windows. Three time window functions are supported by Azure Stream Analytics, including Tumbling window, Hopping window and Sliding window.

The tumbling window function, TumblingWindow,  segments a data stream into the repeated, non-overlap, and distinct time windows.

1t1*This image is from Microsoft

The hopping window function, HoppingWindow,  generates time windows that hops forward in time by a fixed period. Compared to the tumbling windows, the hopping windows can overlap with others so same events may fall in more than on hopping windows.

2*This image is from Microsoft

The sliding window function, SlidingWindow, generates time window when an event occurs. The time window ends at the time when the event happens and the start of the time window is defined by the period parameters specified in the SlidingWindow function.

stream-analytics-window-functions-sliding-intro*This image is from Microsoft

Pattern 4 – Joining Event Streams

This pattern is used for the scenarios where multiple data streams need to be processed to create a new event stream. For example, we may have multiple sensors that collect data for different aspects of an object or event.

Azure Streaming Analytics supports multiple inputs from a variety of stream data sources.

1t1.PNG*This image is from Microsoft

After the inputs are defined in Azure Streaming Analytics you can reference the inputs by name using Stream Analytics Query Language.

Pattern 5 – Data Correlation, Missing Events, and Erroneous Data

This pattern correlates the data from different streams or within the same stream. Dr. Perera has give some use cases of this pattern in his article, such as matching up two data streams that send events in different speeds, detecting a customer request that has not been responded within one hour, and detecting failed sensors by comparing a set of sensors that monitor overlapping regions.

In Azure Stream Analytics we can take advantage of the T-SQL syntax of the Stream Analytics Query Language to implement the pattern. For example, we can use Join clause to join different streams on the id of monitored object (e.g., the id of a machine where different sensors are installed on) and use the operators provided by T-SQL to find the correction.

Pattern 6 – Interacting with Databases

In many use cases the streaming data alone is not enough for us to dig out meaningful insight for the businesses. The data from the streaming source can only become useful when combined with historical, businesses oriented data. The streaming analytics service need to be able to fetch data from other business databases and combine with streaming data. For example, we need to check the blacklists when processing a real-time service request.

Azure Stream Analytics do provides the supports of reference data join in the Stream Analytics Query Language. To use this feature, we need to create a Reference type input that fetch the reference data from Azure Blob storage.1t1

Up until to the point, only Azure Blob storage is support as the reference data source for Azure Stream Analytics. We need to use Azure Data Factory to move the reference data from where they are originally stored into a Azure Blob storage instance. The reference data is modeled as a sequence of blobs in ascending order by the datatime specified in the blob name.

As most of reference data is slow changing type of data, the streaming analytics solutions also need to ensure the reference data they combined with the streaming data is up-to-date. Azure Stream Analytics do provides an approach to support slow changing reference data but has some limitations.

Firstly, the reference data blob stored in the Azure Blob storage cannot be updated as that would cause the Stream Analytics jobs to fail. Therefore, we can only add a new blob to store the updated reference data using the same container and path pattern defined in the job input with a date/time greater than the one specified in the last blob in the sequence. Secondly, the old reference data blobs must not be altered or removed.

Pattern 7 – Detecting Temporal Event Sequence Patterns

In this pattern, the streaming analytics is used to detect the temporal event sequence patterns. For example, a machine may fail to work after showing a sequence of status in a certain order. The streaming analytics solution need to be able to detect the sequence pattern so that an alert can be sent to engineers when the pattern occurs.

In the example provided by Dr. Perera, he used Storm and Siddhi (a CEP engien) to detect the temporal event sequence patterns. We can use the Stream Analytics Query Language in Azure Stream Analytics to implement the example. However, I think a better solution that can cope with more complicated use cases is to use machine learning algorithm to detect the pattern and make the prediction. Azure Stream Analytics provides good supports to the Azure Machine Learning. I will provide more details about the Azure Stream Analytics and Azure Machine Learning integration when discussing the Pattern 12.

Pattern 8 – Tracking

This pattern refers to the streaming analytics use cases on tracking something over space and time in one or more given conditions. Those use cases are often combined with IoT use cases that monitoring the real-time status or movements with something. For example, tracking the movement of missing airline luggage.

Azure Stream Analytics comes with real-time geospatial analytics capability that provides native functions for geospatial operations such as computing geospatial data as points, lines, polygons and also supports the join of multiple geospatial data streams to solve more complicated use cases.

Pattern 9 – Detecting Trends

This pattern detects the trend over time series data, e.g., usage increases and drops, peaks, outliers etc. Same as Pattern 8, this pattern is often used in the IoT use cases.

In Azure Stream Analytics, for simple use cases, we can use Stream Analytics Query Language to query the peak (MAX) value, outliers (ANOMALYDETECTION), and start value and end value in a time window for computing the trends . When combined with Power BI  dashboard, we can provide the time series based charts to visualise the trends.

For more complicated use cases, we may need to use some other functions outside of the Stream Analytics Query Language (e.g., is_monotonic_decreasing /is_monotonic_increasing in Python) or we may need time-series analysis model (e.g., ARIMA) for forecasting use cases.  At this moment, Azure Stream Analytics does not support Python or R. However, we can take a workaround that implements the algorithm in Azure Machine Learning studio with Python or R scripts and publish it as a rest api and then integrate it with the Azure Stream Analytics.

Pattern 10 – Running the same Query in Batch and Realtime Pipelines

I found the title of this Pattern “Running the same Query in Batch and Realtime Pipelines” is a bit of confusing, but from Dr. Perera’s explanation, this pattern refers to the Lambda Architecture which is the most popular data analytics architecture used in IoT use cases at this moment.

Lambda Architecture separates the IoT data analytics into two paths, hot path (in other name, speed layer) and cold path (batch layer). The hot path refers to the stream data processing path and the cold path refers to the batch-oriented data processing path. Microsoft Azure Cortana Intelligence suite provides good supports to the Lambda Architecture. More details can be found here.

1t1.PNG

*This snapshot is from Microsoft

Pattern 11 – Detecting and switching to Detailed Analysis

This pattern is used for the use cases where an anomaly or behaviour can be identified by the streaming analytics and further detailed analysis is required against the historical data. I think this pattern can be viewed as a sub-pattern of Pattern 10.

This pattern can be supported on Azure platform using Lambda Architecture as introduced above.

Pattern 12 – Using a Model

This pattern refers to use machine learning model in stream analytics. I have mentioned some use cases in previous patterns where machine learning model need to be used.

Azure Stream Analytics provide a Azure ML type function to support the integration with Azure Machine Learning.

1t1

The machine learning developers can implement the model using Azure Machine Learning studio and publish as a rest api. An Azure Stream Analytics job can call the api using the Azure ML function.

1t1.PNG

*This snapshot is from Microsoft

Pattern 13 – Online Control

This pattern refers to AI-related use cases such as autopilot, self-driving and robotics. Dr. Perera does not provide much details about this pattern in his article and presentation slides. I think Azure Stream Analytics is not designed for this type of application.

Pattern 14 (additional) – Edge Analytics

I have added this pattern to Dr. Perera’s list as Edge computing has become more and more important in IoT use cases and Azure Stream Analytics along with Azure Machine Learning are the main component in Microsoft’s Edge computing offer.

With Azure Stream Analytics on IoT Edge, the real-time analytics intelligence can be deployed close to IoT devices to achieve low latency, resiliency, efficient use of bandwidth and compliance.

2.PNG

*This snapshot is from Microsoft

 

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s