Azure Stream Analytics Patterns & Implementations

Thanks to the increased popularity of IoT and social networks, streaming analytics has become a hot topic and attracted more and more attention in the data analytics community. Many people (e.g., this and this) believe streaming analytics is the future that will take over the use cases traditionally targeted by batch-oriented analytics.

Azure Stream Analytics is Microsoft’s real-time analytics offering and one of the major services in the Azure Cortana Intelligence Suite. When designing data analytics solutions on the Azure platform, we need to know what role Azure Stream Analytics can play in our solutions and in which scenarios we can use it. Dr. Srinath Perera, an expert on CEP and streaming analytics, has summarised 13 patterns for streaming real-time analytics. Those patterns are a very useful guide for making design decisions in our data analytics solutions.

In this blog post, I will discuss those patterns in the Azure Stream Analytics context, evaluate Azure Stream Analytics’ strengths and weaknesses for each of them, and explore how to implement them using Azure Stream Analytics coupled with support from other Azure services (e.g., Event Hubs, Azure Functions, and Azure Machine Learning).

Firstly, I am going to give a summary of Dr. Srinath Perera’s 13 streaming real-time analytics patterns and then discuss the Azure Stream Analytics implementation for each pattern. In addition, I am going to add one more pattern, Edge Analytics, to the list, which is specific to Azure Stream Analytics.

Dr. Perera’s 13 stream analytics patterns

  • Pattern 1 – Preprocessing
  • Pattern 2 – Alerts and Thresholds
  • Pattern 3 – Simple Counting and Counting with Windows
  • Pattern 4 – Joining Event Streams
  • Pattern 5 – Data Correlation, Missing Events, and Erroneous Data
  • Pattern 6 – Interacting with Databases
  • Pattern 7 – Detecting Temporal Event Sequence Patterns
  • Pattern 8 – Tracking
  • Pattern 9 – Detecting Trends
  • Pattern 10 – Running the same Query in Batch and Realtime Pipelines
  • Pattern 11 – Detecting and switching to Detailed Analysis
  • Pattern 12 – Using a Model
  • Pattern 13 – Online Control
  • Pattern 14 (additional) – Edge Analytics

Pattern 1 – Preprocessing

One basic and common task in streaming analytics is data preprocessing, which filters, reshapes, splits/combines and transforms incoming raw data into a format suitable for further processing and analysis.

Azure Stream Analytics provides good support for data preprocessing tasks. The Stream Analytics Query Language is a SQL-like language using a subset of the T-SQL syntax. Developers with T-SQL skills can easily create scripts for those common data preprocessing tasks in Azure Stream Analytics with the SQL knowledge they already have. The Stream Analytics Query Language allows them to preprocess streaming data in the same way as they preprocess batch-oriented data.

*This snapshot is from Microsoft
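As a minimal sketch of what such a preprocessing query can look like (the input, output and field names below are hypothetical), the following filters out incomplete readings and casts the fields to the expected types:

-- A hypothetical preprocessing query: filter, cast and reshape raw events.
-- TelemetryInput, CleanTelemetryOutput and the field names are placeholders.
SELECT
    DeviceID,
    CAST(Temperature AS float) AS Temperature,
    CAST(Humidity AS float) AS Humidity,
    System.Timestamp AS EventTime
INTO CleanTelemetryOutput
FROM TelemetryInput
WHERE DeviceID IS NOT NULL
  AND Temperature IS NOT NULL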

Pattern 2 – Alerts and Thresholds

This is a very common streaming analytics pattern, especially in industrial IoT use cases. In this pattern, the streaming analytics program detects abnormal conditions based on pre-defined thresholds and generates alerts accordingly.

Anomaly detection using “WHERE” clause

We can use the “WHERE” clause of the Stream Analytics Query Language to detect the abnormal condition and then output the matching events to an “Alert” output, e.g.,

SELECT DeviceID, Temperature, 'Over Temperature' AS ErrorStatus
INTO AlertOutput
FROM TelemetryInput
WHERE Temperature > 100

Anomaly detection using “ANOMALYDETECTION” Operator

The machine learning-based “ANOMALYDETECTION” operator is a new feature recently added to Azure Stream Analytics and is currently in preview. The operator uses a machine learning algorithm to detect events or observations that do not conform to the expected patterns.

The “ANOMALYDETECTION” operator is very easy to use, in a similar way to the LAG operator.

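As a rough sketch of the usage (the input, output and field names below are placeholders, and the exact output schema of the preview operator should be confirmed in the documentation), the operator scores each value against the recent history of the same device:

-- A sketch only: score each temperature reading against the last two hours
-- of readings from the same device. The operator returns a record of scores.
WITH AnomalyScores AS (
    SELECT
        DeviceID,
        Temperature,
        ANOMALYDETECTION(Temperature) OVER (PARTITION BY DeviceID LIMIT DURATION(hour, 2)) AS Scores
    FROM TelemetryInput
)
SELECT DeviceID, Temperature, Scores
INTO AlertOutput
FROM AnomalyScores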

You can find more details about the “ANOMALYDETECTION” operator here.

Handling Alerts

When an abnormal condition is detected and output to the AlertOutput stream, we can handle the alert output in a number of ways on the Azure platform:

  1. Output the alerts to a live dashboard
  2. Send alert notifications
  3. Automatically handle the alerts by adjusting the settings of the equipment

Azure Stream Analytics supports outputting a stream to a real-time Power BI dashboard. With this feature we can show the real-time alerts on a Power BI dashboard monitored by the maintenance engineers.


The alerts can also be sent to the maintenance engineers in push mode. Thanks to the recently added Azure Functions output in Azure Stream Analytics, it is much easier for developers to send out alerts through emails or notifications without having to first output the stream to a Service Bus queue and access Azure Functions from there. Developers can now egress the alert stream directly to Azure Functions, where they can implement the logic for alert delivery.


When combined with Azure IoT Hub, we can also make the monitored equipment automatically adjust its settings based on the alerts. For example, Microsoft has created a real-time data processing solution for KingwayTek that takes advantage of Azure Stream Analytics, Azure Functions and Azure IoT Hub to proactively raise alerts on vehicle status, and the alerts trigger vehicle reconfiguration.


*This snapshot is from Microsoft

Pattern 3 – Simple Counting and Counting with Windows

In this pattern, the raw, atomic stream events are aggregated over a time window to reveal potential patterns and behaviours. For example, the raw message of a single website visit event may not provide much meaningful insight, but the average view counts per hour or per day can reveal the pattern of the website visits, e.g., the website has more visits in the evening than in the morning.

To implement this pattern, the streaming analytics service needs to support two types of functions: aggregation and time windows. Azure Stream Analytics provides good support for both.

The Stream Analytics Query Language provides a list of built-in aggregate functions that cover most common aggregation requirements.


In addition, Azure Stream Analytics supports user-defined aggregates (UDA) written in JavaScript, which gives developers extra power and flexibility to implement complicated aggregation rules.

Azure Stream Analytics also provides good support for time windows. Three window functions are supported: tumbling window, hopping window and sliding window.

The tumbling window function, TumblingWindow, segments a data stream into repeated, non-overlapping, distinct time windows.

*This image is from Microsoft

The hopping window function, HoppingWindow, generates time windows that hop forward in time by a fixed period. Unlike tumbling windows, hopping windows can overlap with each other, so the same events may fall into more than one hopping window.

*This image is from Microsoft

The sliding window function, SlidingWindow, generates a time window whenever an event occurs. The window ends at the time the event happens, and its start is defined by the duration parameter specified in the SlidingWindow function.

*This image is from Microsoft
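For example, a minimal counting query with a tumbling window might look like the sketch below (the input, output and field names are assumptions):

-- Count page views per site every hour using a one-hour tumbling window.
-- ClickStreamInput, HourlyViewCounts and the field names are placeholders.
SELECT
    SiteName,
    COUNT(*) AS ViewCount,
    System.Timestamp AS WindowEnd
INTO HourlyViewCounts
FROM ClickStreamInput
GROUP BY SiteName, TumblingWindow(hour, 1)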

Pattern 4 – Joining Event Streams

This pattern is used in scenarios where multiple data streams need to be processed and combined into a new event stream. For example, we may have multiple sensors that collect data on different aspects of the same object or event.

Azure Stream Analytics supports multiple inputs from a variety of stream data sources.

*This image is from Microsoft

After the inputs are defined in Azure Stream Analytics, you can reference them by name in the Stream Analytics Query Language.
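A stream-to-stream join must specify how far apart in time the matching events can be, using DATEDIFF in the ON clause. A minimal sketch (the input, output and field names are assumptions):

-- Join temperature and pressure readings from the same device that arrive
-- within 30 seconds of each other. Input, output and field names are placeholders.
SELECT
    t.DeviceID,
    t.Temperature,
    p.Pressure
INTO CombinedTelemetry
FROM TemperatureInput t
JOIN PressureInput p
  ON t.DeviceID = p.DeviceID
 AND DATEDIFF(second, t, p) BETWEEN 0 AND 30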

Pattern 5 – Data Correlation, Missing Events, and Erroneous Data

This pattern correlates data from different streams or within the same stream. Dr. Perera has given some use cases of this pattern in his article, such as matching up two data streams that send events at different speeds, detecting a customer request that has not been responded to within one hour, and detecting failed sensors by comparing sets of sensors that monitor overlapping regions.

In Azure Stream Analytics we can take advantage of the T-SQL syntax of the Stream Analytics Query Language to implement this pattern. For example, we can use the JOIN clause to join different streams on the ID of the monitored object (e.g., the ID of a machine on which different sensors are installed) and use the operators provided by T-SQL to find the correlation.
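For instance, the “request not responded to within one hour” case can be sketched with a LEFT OUTER JOIN over a time bound, keeping only the requests that have no matching response (the input, output and field names are assumptions):

-- Detect requests that have not been matched by a response within 60 minutes.
-- RequestInput, ResponseInput, UnansweredRequests and the fields are placeholders.
SELECT r.RequestID, r.CustomerID
INTO UnansweredRequests
FROM RequestInput r
LEFT OUTER JOIN ResponseInput p
  ON r.RequestID = p.RequestID
 AND DATEDIFF(minute, r, p) BETWEEN 0 AND 60
WHERE p.RequestID IS NULL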

Pattern 6 – Interacting with Databases

In many use cases the streaming data alone is not enough for us to dig out meaningful insights for the business. The data from the streaming source only becomes useful when combined with historical, business-oriented data. The streaming analytics service therefore needs to be able to fetch data from other business databases and combine it with the streaming data. For example, we may need to check a blacklist when processing a real-time service request.

Azure Stream Analytics does support reference data joins in the Stream Analytics Query Language. To use this feature, we need to create a Reference type input that fetches the reference data from Azure Blob storage.
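Once the reference input is defined, it can be joined to the stream like a normal table; unlike stream-to-stream joins, no DATEDIFF time bound is required. A minimal sketch of the blacklist example (the input, output and field names are assumptions):

-- Flag service requests whose caller appears in the blacklist reference data.
-- ServiceRequestInput (stream), Blacklist (reference input) and FlaggedRequests
-- are placeholders.
SELECT s.RequestID, s.CallerID, b.Reason
INTO FlaggedRequests
FROM ServiceRequestInput s
JOIN Blacklist b
  ON s.CallerID = b.CallerID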

At this point, only Azure Blob storage is supported as the reference data source for Azure Stream Analytics. We need to use Azure Data Factory to move the reference data from where it is originally stored into an Azure Blob storage instance. The reference data is modelled as a sequence of blobs in ascending order of the date/time specified in the blob name.

As most reference data is slowly changing, the streaming analytics solution also needs to ensure that the reference data combined with the streaming data is up to date. Azure Stream Analytics does provide an approach to support slowly changing reference data, but with some limitations.

Firstly, a reference data blob stored in Azure Blob storage cannot be updated, as that would cause the Stream Analytics jobs to fail. Therefore, we can only add a new blob to store the updated reference data, using the same container and path pattern defined in the job input and a date/time greater than the one specified by the last blob in the sequence. Secondly, the old reference data blobs must not be altered or removed.

Pattern 7 – Detecting Temporal Event Sequence Patterns

In this pattern, the streaming analytics is used to detect temporal event sequence patterns. For example, a machine may fail to work after showing a sequence of statuses in a certain order. The streaming analytics solution needs to be able to detect that sequence pattern so that an alert can be sent to the engineers when it occurs.

In the example provided by Dr. Perera, he used Storm and Siddhi (a CEP engine) to detect the temporal event sequence patterns. We can use the Stream Analytics Query Language in Azure Stream Analytics to implement the example. However, I think a better solution that can cope with more complicated use cases is to use a machine learning algorithm to detect the pattern and make the prediction. Azure Stream Analytics provides good support for Azure Machine Learning. I will provide more details about the Azure Stream Analytics and Azure Machine Learning integration when discussing Pattern 12.
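For a simple two-step sequence, the LAG operator in the Stream Analytics Query Language can look back at the previous event of the same device. The sketch below (with assumed input, output, field names and status values) raises an event when a device reports 'Failure' right after 'Overheat' within a five-minute window:

-- Detect the status sequence 'Overheat' followed by 'Failure' on the same device.
-- TelemetryInput, SequenceAlerts and the status values are placeholders.
SELECT
    DeviceID,
    Status,
    System.Timestamp AS DetectedAt
INTO SequenceAlerts
FROM TelemetryInput
WHERE Status = 'Failure'
  AND LAG(Status, 1) OVER (PARTITION BY DeviceID LIMIT DURATION(minute, 5)) = 'Overheat'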

Pattern 8 – Tracking

This pattern refers to streaming analytics use cases that track something over space and time under one or more given conditions. They are often IoT use cases that monitor the real-time status or movements of something, for example, tracking the movement of missing airline luggage.

Azure Stream Analytics comes with real-time geospatial analytics capability that provides native functions for geospatial operations, such as representing geospatial data as points, lines and polygons, and also supports joining multiple geospatial data streams to solve more complicated use cases.
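As a minimal sketch of a geospatial query (the input, output, field names and coordinates are assumptions), the following computes how far each tracked item currently is from a fixed reference point, so that, for example, luggage moving away from its expected hub can be flagged:

-- Compute the distance of each tracked item from a fixed reference point (in metres).
-- LuggageTrackingInput, LuggageDistanceOutput, the fields and the coordinates
-- are placeholders.
SELECT
    LuggageID,
    ST_DISTANCE(
        CreatePoint(Latitude, Longitude),
        CreatePoint(51.47, -0.45)
    ) AS MetresFromHub
INTO LuggageDistanceOutput
FROM LuggageTrackingInput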

Pattern 9 – Detecting Trends

This pattern detects trends over time series data, e.g., usage increases and drops, peaks, outliers, etc. As with Pattern 8, this pattern is often used in IoT use cases.

In Azure Stream Analytics, for simple use cases, we can use the Stream Analytics Query Language to query the peak (MAX) value, outliers (ANOMALYDETECTION), and the start and end values in a time window for computing trends. When combined with a Power BI dashboard, we can provide time series based charts to visualise the trends.
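For example, a sketch of a simple trend query over a one-hour tumbling window (the input, output and field names are assumptions):

-- Summarise the temperature trend of each device per hour: peak value and range.
-- TelemetryInput, HourlyTrendOutput and the field names are placeholders.
SELECT
    DeviceID,
    MAX(Temperature) AS PeakTemperature,
    MIN(Temperature) AS LowestTemperature,
    MAX(Temperature) - MIN(Temperature) AS TemperatureRange,
    System.Timestamp AS WindowEnd
INTO HourlyTrendOutput
FROM TelemetryInput
GROUP BY DeviceID, TumblingWindow(hour, 1)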

For more complicated use cases, we may need functions outside of the Stream Analytics Query Language (e.g., is_monotonic_decreasing/is_monotonic_increasing in Python) or a time-series analysis model (e.g., ARIMA) for forecasting use cases. At this moment, Azure Stream Analytics does not support Python or R. However, we can take a workaround: implement the algorithm in Azure Machine Learning Studio with Python or R scripts, publish it as a REST API, and then integrate it with Azure Stream Analytics.

Pattern 10 – Running the same Query in Batch and Realtime Pipelines

I find the title of this pattern, “Running the same Query in Batch and Realtime Pipelines”, a bit confusing, but from Dr. Perera’s explanation it refers to the Lambda Architecture, which is the most popular data analytics architecture used in IoT use cases at this moment.

The Lambda Architecture separates IoT data analytics into two paths: the hot path (also known as the speed layer) and the cold path (the batch layer). The hot path refers to the stream data processing path, and the cold path refers to the batch-oriented data processing path. The Microsoft Azure Cortana Intelligence suite provides good support for the Lambda Architecture. More details can be found here.


*This snapshot is from Microsoft

Pattern 11 – Detecting and switching to Detailed Analysis

This pattern is used for the use cases where an anomaly or behaviour is identified by the streaming analytics and further detailed analysis is required against the historical data. I think this pattern can be viewed as a sub-pattern of Pattern 10.

This pattern can be supported on the Azure platform using the Lambda Architecture, as introduced above.

Pattern 12 – Using a Model

This pattern refers to using a machine learning model in stream analytics. I have mentioned some use cases in the previous patterns where a machine learning model needs to be used.

Azure Stream Analytics provides an Azure ML function type to support the integration with Azure Machine Learning.


Machine learning developers can implement the model in Azure Machine Learning Studio and publish it as a REST API. An Azure Stream Analytics job can then call the API using an Azure ML function.


*This snapshot is from Microsoft
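As a sketch of how such a call can look in a query (the function alias scoreAnomaly, its arguments and its output field are hypothetical; the alias would be defined as an Azure ML function on the Stream Analytics job):

-- scoreAnomaly is a hypothetical Azure ML function alias defined on the job;
-- it returns a record containing the web service's output columns (here assumed
-- to include a field called Score).
WITH ScoredTelemetry AS (
    SELECT
        DeviceID,
        Temperature,
        scoreAnomaly(DeviceID, Temperature) AS MLResult
    FROM TelemetryInput
)
SELECT DeviceID, Temperature, MLResult.Score AS AnomalyScore
INTO ScoredOutput
FROM ScoredTelemetry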

Pattern 13 – Online Control

This pattern refers to AI-related use cases such as autopilot, self-driving and robotics. Dr. Perera does not provide many details about this pattern in his article and presentation slides. I think Azure Stream Analytics is not designed for this type of application.

Pattern 14 (additional) – Edge Analytics

I have added this pattern to Dr. Perera’s list as edge computing has become more and more important in IoT use cases, and Azure Stream Analytics, along with Azure Machine Learning, is a main component in Microsoft’s edge computing offering.

With Azure Stream Analytics on IoT Edge, the real-time analytics intelligence can be deployed close to the IoT devices to achieve low latency, resiliency, efficient use of bandwidth and compliance.


*This snapshot is from Microsoft

 


SSIS in Azure #3 – Schedule and Monitor SSIS Package Execution using ADF V2

*The source code created for this blog post can be found here.

In the previous blog posts in the SSIS in Azure series, we created an SSIS package that periodically ingests data from an Azure SQL database into Azure Data Lake Store and deployed the package to the Azure-SSIS Integration Runtime. Up to this point, we have achieved two goals in the SSIS in Azure series:

  • Using SSIS to move data between cloud storage services
  • Hosting and running SSIS packages in the cloud

The last goal we need to achieve is to schedule the execution of the SSIS package in the cloud. Traditionally, we would schedule SSIS package execution by creating a SQL Server Agent job. However, we can only do that through an on-premises SQL Server instance or by provisioning a SQL Server VM in Azure. Thanks to the support of the Stored Procedure activity in ADF, we can now schedule the SSIS package execution using a cloud-based ADF pipeline. This blog post will walk through the steps to achieve that.

You can find the source code created for this blog post in my GitHub. The ningago.demo.adf.ssis.adfv2 project contains the JSON and PowerShell files to create the ADF pipeline.


 

The key to this solution is to call the sp_executesql stored procedure from the RunSSISPackagePipeline pipeline, executing the SQL script that triggers the SSIS package execution. We can define a Stored Procedure activity in the pipeline that calls sp_executesql and passes in, as a parameter, the SQL script for triggering the SSIS package execution.


We don’t have to author the SQL script manually from scratch; instead, we can generate it using SSMS. First, we connect to the SSISDB catalog in our Azure-SSIS integration runtime (please refer to SSIS in Azure #2 for how to do that), select the SSIS project or package we want to execute, and click the Execute button to open the Execute Package dialog window.


On the Execute Package dialog window, we select the package to execute and also set a value (it can be any value) for the package parameter (the reason for setting a value is to ensure that the parameter-setting statement is generated in the SQL script).


We then click the “Script” button at the top left, which generates the SQL script for executing the selected SSIS package.


We can then copy the script into the Stored Procedure activity defined in the RunSSISPackagePipeline pipeline.

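The generated script looks roughly like the sketch below: it creates an execution for the package in the SSISDB catalog, sets the package parameter and starts the execution (the folder, project and package names and the parameter value here are placeholders; use the ones generated by SSMS for your own deployment):

-- A rough sketch of the SSMS-generated script. Folder, project and package
-- names and the parameter value are placeholders.
DECLARE @execution_id BIGINT;

EXEC [SSISDB].[catalog].[create_execution]
     @folder_name = N'MyFolder',
     @project_name = N'MyProject',
     @package_name = N'MyPackage.dtsx',
     @use32bitruntime = False,
     @execution_id = @execution_id OUTPUT;

-- @object_type = 30 indicates a package-level parameter.
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id = @execution_id,
     @object_type = 30,
     @parameter_name = N'DateScliceToLoad',
     @parameter_value = N'2018-01-01';

EXEC [SSISDB].[catalog].[start_execution] @execution_id;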

As you may have noticed, the SSIS package parameter, DateScliceToLoad, has been manually set by us with a random value. The next step is to pass the date of the pipeline run into the SQL script so that the SSIS package only moves the data for the given day. Firstly, we need to define the “scheduledRunTime” parameter in the RunSSISPackagePipeline pipeline, which will receive the scheduled pipeline run time from the pipeline trigger we will create later.


In the SQL script we just generated to trigger the SSIS package execution, we replace the date we set manually with the ADF expression that references the scheduledRunTime parameter of the pipeline.


Next, we need to create the trigger JSON file to schedule the execution of the pipeline. After we specify the pipeline to execute in the trigger file, we need to pass the scheduledTime of the trigger, which is a system variable, to the scheduledRunTime parameter defined in the pipeline.

{
  "properties": {
    "name": "RunSSISPackageScheduler",
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2018-01-07T00:00:00-22:00"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "type": "PipelineReference",
          "referenceName": "RunSSISPackagePipeline"
        },
        "parameters": {
          "scheduledRunTime": "@trigger().scheduledTime"
        }
      }
    ]
  }
}

After we have created the JSON files that define the linked service, pipeline and trigger, we need to deploy them into our Data Factory instance. As the V2 version of ADF does not support deployment through the UI yet, we need to create a deployment script using PowerShell.

$DataFactoryName = "ninjago3843adf"
$ResourceGroupName = "SSIS"

Set-AzureRmDataFactoryV2LinkedService -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -Name "SSISDBLinkedService" -File ".\SSISDBLinkedService.json"
Set-AzureRmDataFactoryV2Pipeline -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -Name "RunSSISPackagePipeline" -DefinitionFile ".\RunSSISPackagePipeline.json"

Stop-AzureRmDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name "RunSSISPackageScheduler"
Set-AzureRmDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name "RunSSISPackageScheduler" -DefinitionFile ".\RunSSISPackageScheduler.json"
Start-AzureRmDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name "RunSSISPackageScheduler"

After the pipeline is deployed and scheduled by the trigger, the pipeline will execute our SSIS package according to the schedule.


 

SSIS in Azure #2 – Deploy SSIS Packages to Azure-SSIS Integration Runtime in ADF V2

In the first blog post of the SSIS in Azure series, I gave a demonstration on how to create SSIS packages to move data in the cloud, using a common use case that periodically ingests data from an Azure SQL database into Azure Data Lake Store. In the pre-ADF V2 era, we could only deploy SSIS packages to on-premises SQL Servers or SQL Servers in Azure VMs. Thanks to the Azure-SSIS integration runtime, which is available for public preview in Azure Data Factory V2, we can now directly deploy and execute our SSIS packages in Azure without needing to provision and manage an Azure SQL Server VM (one thing to note: under the hood, the Azure-SSIS integration runtime is itself a cluster of Azure VMs dedicated to running SSIS packages).

This blog post walks through the steps to deploy the SSIS package we created in the previous blog post to the Azure-SSIS integration runtime, and to execute and monitor it there.

First of all, we need to have an Azure-SSIS integration runtime created in our Azure tenant. This can be achieved using the PowerShell scripts provided by Microsoft.

I ran into an issue when running the scripts; here is a hint in case you hit the same one. If the Set-AzureRmDataFactoryV2 and Set-AzureRmDataFactoryV2IntegrationRuntime cmdlets in the PowerShell scripts fail with the “HTTP Status Code: NotFound” error even after you have installed the Azure PowerShell modules (we can see the AzureRM.DataFactoryV2 module is installed by checking the C:\Program Files\WindowsPowerShell\Modules folder), it may be caused by incompatible versions between AzureRM modules. As the “Install-Module AzureRM -AllowClobber” cmdlet only installs the Azure modules that are not already installed on your computer, the Azure modules on which the AzureRM.DataFactoryV2 module depends are not updated. The simplest solution is to remove all AzureRM modules from the C:\Program Files\WindowsPowerShell\Modules folder and then rerun the Install-Module AzureRM -AllowClobber cmdlet.

After the PowerShell scripts run successfully, an Azure Data Factory V2 instance and a SSISDB Azure SQL database are created in the resource group we specified in the scripts.


The SSISDB is the Azure-hosted SSIS catalog where you can deploy your SSIS packages. We can use SSMS to connect to the SSISDB. On the Connect to Server dialog in SSMS, we need to specify the server name as the Azure SQL server where the SSISDB database is hosted.


After we have specified the server name and the login details, don’t click the Connect button; instead, click the Options button to open the Connection Properties tab. On the Connection Properties tab, set the “Connect to database” field to the SSISDB database. This setting will allow you to connect to the SSISDB SSIS catalog after clicking the Connect button.


We can deploy our SSIS package into the SSISDB catalog either by using the Deploy Project option in SSMS


or by using the Deploy option in your SSIS SSDT project in Visual Studio.


Both options launch the Integration Services Deployment Wizard dialog, which guides us through the steps of the SSIS package deployment.


We can execute the deployed SSIS packages in SSMS by right-clicking an SSIS project or SSIS package to launch the Execute Package dialog. The SSIS package we created to periodically load data from the Azure SQL database to Azure Data Lake requires a DateSclieToLoad parameter that specifies which day of data to move. We can manually set the value of this parameter in the Execute Package dialog.


We can also monitor the SSIS package executions in SSMS through the SSIS catalog reports.


Up to this point, we have created the Azure-SSIS Integration Runtime and deployed our SSIS package into the runtime. We can execute the package and monitor the execution results manually in SSMS. In the next blog post, I will walk through the steps to schedule the SSIS package execution using an ADF V2 pipeline. The ADF V2 pipeline will be scheduled to execute every day. The date of the current pipeline execution will be passed into the SSIS package, and the SSIS package will move the data for the given date.

 

 

Power BI – GMROI Measure

GMROI (Gross Margin Return On Investment) is one of the most popular metrics, commonly used in the retail industry, for inventory management. Retailers are short-term investors, and the “buy and hold” strategy does not work in retail. Instead of having cash frozen in inventory, it is crucial to keep cash flowing to continually purchase, mark up, and sell in order to generate profits and expand the business. GMROI is a profitability metric that helps a retailer analyse how efficiently inventory is being converted into cash.

GMROI is defined as the gross profit a retailer makes in return for their investment in inventory. A common formula used to calculate GMROI is to divide the gross profit by the average inventory cost.

GMROI = Gross Profit / Average Inventory Cost

Gross profit is calculated by subtracting the COGS (Cost of Goods Sold) from the revenue:

Gross Profit = Revenue – COGS

Inventory cost is a semi-additive measure as it is not additive on the date dimension. Average inventory cost is calculated by dividing the sum of the inventory cost over a specified period by the total number of days in the period:

Average Inventory Cost = Sum of Inventory Cost over a Period / Number of Days in the Period

Due to the semi-additive nature of the inventory measures, GMROI analysis in a BI solution is normally conducted on a periodic snapshot data model, including an inventory fact table, a date dimension table, and a number of other dimension tables applicable to the analysis, such as vendors, stores, and products.

The snapshot below shows a sample data schema for GMROI analysis, created in Power BI. The design of the data schema can vary, depending on the retailer’s specific business rules and LOB database design. You can download the demo pbix file here.


The Inventory table in the sample data schema records the daily snapshot of the inventory level, the quantity of sold products, and the inventory cost and retail price of the products over the vendor, store and product dimensions.

To create the Gross Profit measure, we calculate the gross profit for each sold item and multiply it by the quantity of items sold in a day. We then use the SUMX function to roll up the total gross profit, depending on the evaluation context.

Gross Profit = SUMX('Inventory',
                    'Inventory'[Quantity Sold]*
                       ('Inventory'[Retail Price] - 'Inventory'[Cost]) 
                )

To create the Average Inventory Cost measure, we sum up the inventory cost of all days in a period and divide it by the number of days in the period.

Average Inventory Cost = 
    DIVIDE(
        SUMX('Inventory',
             'Inventory'[Inventory Level]*'Inventory'[Cost]),
        COUNTROWS('Inventory')
    )

After the Gross Profit measure and the Average Inventory Cost measure are created, we can simply calculate the GMROI measure by dividing the Gross Profit measure by the Average Inventory Cost measure.

GMROI = DIVIDE([Gross Profit], [Average Inventory Cost])

The GMROI measure can be used in different evaluation contexts depending on the specific requirements of the GMROI analysis. For example, we can use the measure to calculate the annual GMROI of the products from Vendor A sold at Store X.

DAX – Find the Items Ranked in Top n for Multiple Periods (with Dynamic Slicing)

One of my previous blog posts introduces how to find the items that are ranked in the top n for multiple periods, using the INTERSECT and TOPN functions. However, that approach needs to hard-code the periods and the number of top items in the DAX scripts. This blog post introduces an approach that allows users to dynamically specify the periods and the number of top items to evaluate, using interactive dashboard slicers.


In this blog post, we will again use the Eurovision dataset as the example, which contains the rows of country-to-country votes for each year.


We will create four measures: “Rank”, “In Top N (This Year)”, “In Top N (All Selected Years)”, and “All Selected Years in Top N”. These measures will be used in an evaluation context made of the combination of each year and each country. To build the evaluation context, we can use a Power BI table visual and add the “Year” and “ToCountry” columns from the Eurovision dataset to the table. The four measures will be added to the table later to evaluate the rank of each country in each year and whether it is in the top n.


A “Year” slicer will be added to the dashboard to allow users to filter the table by the selected years. Any number of years can be selected, and the selected years can be consecutive or nonconsecutive.


Measure – Rank

The first measure to create is the “Rank” measure, which computes the rank of each country in each selected year.

Rank = RANKX(ALL(data[Country]), CALCULATE(SUM(data[Points])))

Measure – In Top N (This Year)

Based on the “Rank” measure, we will create the “In Top N (This Year)” measure, which computes whether the current country is ranked in the top n in the current year-country evaluation context. Here we need to allow users to dynamically specify the N (the number of top items) to evaluate. We can achieve that using a disconnected parameter table that defines the options for the N.


In the DAX measure, we get the user-selected N value using the VALUES function, which is compared to the “Rank” measure we created earlier to evaluate whether the current country is in the top N in the current year context.

In Top N (This Year) = 
    IF([Rank]<=
        IF(HASONEVALUE('TopN'[Top N ]),
            VALUES('TopN'[Top N ]),
            10
        ), 1, 0)

We will then filter the table using the “In Top N (This Year)” measure so that it only keeps the countries ranked in the top N in at least one of the selected years.


Measure – In Top N (All Selected Years)

After we apply the filter on the “In Top N (This Year)” measure, the table only contains the rows of countries ranked in the top N in at least one selected year. If we count the rows in the filtered table by country, we get the number of selected years in which that country is ranked in the top N. This is what the “In Top N (All Selected Years)” measure does.

In Top N (All Selected Years) = 
   CALCULATE(
        DISTINCTCOUNT(data[Year]),
        ALLSELECTED(data[Year])
    )


Measure – All Selected Years in Top N

Now that we have the “In Top N (All Selected Years)” measure, which tells us in how many of the selected years a country is ranked in the top N, we can calculate the total number of selected years and compare it to the “In Top N (All Selected Years)” measure. If the value of the “In Top N (All Selected Years)” measure is equal to the total number of selected years, that means the country is ranked in the top N in all the selected years.

All Selected Years in Top N = 
    VAR NumberOfSelectedYears = 
        CALCULATE(
            DISTINCTCOUNT(data[Year]),
            ALLSELECTED(data[Year]),
            ALLSELECTED(data[Country])
        )
    RETURN
        [In Top N (All Selected Years)] = NumberOfSelectedYears


Please find the pbix file here.

R Visual – Create Gartner Magic Quadrant-Like Charts in Power BI using ggplot2

In this blog post, I am going to create an R visual that renders Gartner magic quadrant-like charts in Power BI using the ggplot2 package.


A dummy dataset will be created, including three columns: the “Company” column holding the names of the companies to be ranked in the quadrant chart, and the “ExcutionScore” and “VisionScore” columns corresponding to the “Ability to Execute” and “Completeness of Vision” metrics in the Gartner magic quadrant assessment. In the dummy dataset, the “ExcutionScore” and “VisionScore” are scaled from 0 to 100.


We drag an R visual onto the Power BI editor canvas and add the three columns from the dummy dataset. We can bind the RStudio IDE to Power BI and use it to author and test the R scripts.

In the R script editor, we first reference the “ggplot2” library and the “grid” library. The “grid” library is used to draw custom annotations outside of the main ggplot2 panel.

library(ggplot2)
library(grid)

We then create a ggplot2 object using the dataset referenced in the R visual, assigning the “VisionScore” value to the x-axis and the “ExcutionScore” value to the y-axis.

p <- ggplot(dataset, aes(VisionScore, ExcutionScore))
p <- p + scale_x_continuous(expand = c(0, 0), limits = c(0, 100)) 
p <- p + scale_y_continuous(expand = c(0, 0), limits = c(0, 100))


We now have our base panel and can start building the Gartner magic quadrant-like chart.

First of all, we set the x-axis label to “COMPLETENESS OF VISION” and the y-axis label to “ABILITY TO EXECUTE”, and align them to the left. We then remove the axis ticks and text from the plot. We will also add a title to the top of the plot.

p <- p + labs(x="COMPLETENESS OF VISION",y="ABILITY TO EXECUTE")
p <- p + theme(axis.title.x = element_text(hjust = 0, vjust=4, colour="darkgrey",size=10,face="bold"))
p <- p + theme(axis.title.y = element_text(hjust = 0, vjust=0, colour="darkgrey",size=10,face="bold"))

p <- p + theme(
          axis.ticks.x=element_blank(), 
          axis.text.x=element_blank(),
          axis.ticks.y=element_blank(),
          axis.text.y=element_blank()
        )

p <- p + ggtitle("Gartner Magic Quadrant - Created for Power BI using ggplot2") 

Those steps will progress our chart to something like this:


We then add four rectangle annotations to fill the four quadrant areas using the Gartner magic quadrant colour scheme. We also need to create a border and split lines for the quadrant chart.

p <- p +
      annotate("rect", xmin = 50, xmax = 100, ymin = 50, ymax = 100, fill= "#F8F9F9")  + 
      annotate("rect", xmin = 0, xmax = 50, ymin = 0, ymax = 50 , fill= "#F8F9F9") + 
      annotate("rect", xmin = 50, xmax = 100, ymin = 0, ymax = 50, fill= "white") + 
      annotate("rect", xmin = 0, xmax = 50, ymin = 50, ymax = 100, fill= "white")

p <- p + theme(panel.border = element_rect(colour = "lightgrey", fill=NA, size=4))
p <- p + geom_hline(yintercept=50, color = "lightgrey", size=1.5)
p <- p + geom_vline(xintercept=50, color = "lightgrey", size=1.5)


We also need to add a label to each quadrant area:

p <- p + geom_label(aes(x = 25, y = 97, label = "CHALLENGERS"), 
                    label.padding = unit(2, "mm"),  fill = "lightgrey", color="white")
p <- p + geom_label(aes(x = 75, y = 97, label = "LEADERS"), 
                    label.padding = unit(2, "mm"), fill = "lightgrey", color="white")
p <- p + geom_label(aes(x = 25, y = 3, label = "NICHE PLAYERS"), 
                    label.padding = unit(2, "mm"),  fill = "lightgrey", color="white")
p <- p + geom_label(aes(x = 75, y = 3, label = "VISIONARIES"), 
                    label.padding = unit(2, "mm"), fill = "lightgrey", color="white")


Up to this point, our chart starts to look like the Gartner magic quadrant. Next, we need to draw the company points on the chart, with positions corresponding to their “Ability to Execute” and “Completeness of Vision” values.

p <- p + geom_point(colour = "#2896BA", size = 5) 
p <- p  + geom_text(aes(label=Company),colour="#2896BA", hjust=-0.3, vjust=0.25, size=3.2)


Our quadrant chart is nearly done; just one part is missing: the arrows next to the “Ability to Execute” and “Completeness of Vision” axis labels.


As the arrows need to be located outside of the main panel, we need to create custom annotations (annotation_custom) with linesGrob to draw a straight line with an arrow at the far end of the line. To make the arrows visible outside of the main panel, we need to turn off the clip attribute of the main panel.

p <- p + annotation_custom(
            grob = linesGrob(arrow=arrow(type="open", ends="last", length=unit(2,"mm")), 
                   gp=gpar(col="lightgrey", lwd=4)), 
            xmin = -2, xmax = -2, ymin = 25, ymax = 40
          )
p <- p + annotation_custom(
  grob = linesGrob(arrow=arrow(type="open", ends="last", length=unit(2,"mm")), 
                   gp=gpar(col="lightgrey", lwd=4)), 
  xmin = 28, xmax = 43, ymin = -3, ymax = -3
)

gt = ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name=="panel"] = "off"
grid.draw(gt)

We now have our completed quadrant chart.


You can find the complete source code here.

Please find the pbix file here.