Azure Stream Analytics Patterns & Implementations

Thanks to the increased popularity of IoT and social networks, streaming analytics has become a hot topic and has attracted increasing attention in the data analytics community. Many people believe streaming analytics is the future that will take over the use cases traditionally targeted by batch-oriented analytics.

Azure Stream Analytics is Microsoft’s real-time analytics offering and a major service in the Azure Cortana Intelligence Suite. When designing data analytics solutions on the Azure platform, we need to know what role Azure Stream Analytics can play in our solutions and in which scenarios we should use it. Dr. Srinath Perera, an expert on CEP and streaming analytics, has summarised 13 patterns for streaming real-time analytics. Those patterns are a very useful guide for making design decisions in our data analytics solutions.

In this blog post, I will discuss those patterns in the Azure Stream Analytics context, evaluate Azure Stream Analytics’ strengths and weaknesses for each pattern, and explore how to implement the patterns using Azure Stream Analytics together with other Azure services (e.g., Event Hubs, Azure Functions, and Azure Machine Learning).

Firstly, I am going to give a summary of Dr. Srinath Perera’s 13 streaming real-time analytics patterns and then discuss the Azure Stream Analytics implementation for each pattern. In addition, I am going to add one more pattern, Edge Analytics, to the list, which is specific to Azure Stream Analytics.

Dr. Perera’s 13 stream analytics patterns

  • Pattern 1 – Preprocessing
  • Pattern 2 – Alerts and Thresholds
  • Pattern 3 – Simple Counting and Counting with Windows
  • Pattern 4 – Joining Event Streams
  • Pattern 5 – Data Correlation, Missing Events, and Erroneous Data
  • Pattern 6 – Interacting with Databases
  • Pattern 7 – Detecting Temporal Event Sequence Patterns
  • Pattern 8 – Tracking
  • Pattern 9 – Detecting Trends
  • Pattern 10 – Running the same Query in Batch and Realtime Pipelines
  • Pattern 11 – Detecting and switching to Detailed Analysis
  • Pattern 12 – Using a Model
  • Pattern 13 – Online Control
  • Pattern 14 (additional) – Edge Analytics

Pattern 1 – Preprocessing

One basic and common task in streaming analytics is data preprocessing, which filters, reshapes, splits/combines and transforms incoming raw data into a format suitable for further processing and analysis.

Azure Stream Analytics provides good support for data preprocessing tasks. The Stream Analytics Query Language is a SQL-like language using a subset of T-SQL syntax, so developers with T-SQL skills can easily create scripts for those common data preprocessing tasks with the SQL knowledge they already have. The Stream Analytics Query Language allows them to preprocess streaming data in the same way as they preprocess batch-oriented data.

*This snapshot is from Microsoft
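As a rough illustration, the sketch below uses the Stream Analytics Query Language to filter out malformed readings and reshape the remaining events; the input, output and field names are hypothetical.

-- Hypothetical preprocessing query: filter out bad readings, reshape fields
-- and derive a new column before passing the events downstream.
SELECT
    DeviceId,
    System.Timestamp AS EventTime,
    CAST(Temperature AS float) AS TemperatureC,
    CAST(Temperature AS float) * 9.0 / 5.0 + 32.0 AS TemperatureF
INTO PreprocessedOutput
FROM TelemetryInput TIMESTAMP BY EventEnqueuedUtcTime
WHERE Temperature IS NOT NULL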

Pattern 2 – Alerts and Thresholds

This is a very common streaming analytics pattern, especially in industrial IoT use cases. In this pattern, the streaming analytics job detects an abnormal condition based on a pre-defined threshold and generates alerts accordingly.

Anomaly detection using the “WHERE” clause

We can use the “WHERE” clause of the Stream Analytics Query Language in Azure Stream Analytics to detect the abnormal condition, and then route the matching events to an “Alert” output, e.g.,

SELECT DeviceID, Temperature, "Over Temperature" AS ErrorStatus
INTO AlertOutput
FROM TelemetryInput
WHERE Temperature > 100

Anomaly detection using the “ANOMALYDETECTION” operator

The machine learning-based “ANOMALYDETECTION” operator is a new feature recently added to Azure Stream Analytics and is currently in preview. This operator takes advantage of a machine learning algorithm to detect events or observations that do not conform to the expected patterns.

The “ANOMALYDETECTION” operator is very easy to use, similar to how the LAG operator is used.

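A minimal sketch of how the operator might be applied is shown below, based on the preview syntax; the input name, the two-hour history window and the partitioning are assumptions, and the operator returns a record of anomaly scores that you would inspect or threshold in a follow-up step (check the preview documentation for the exact score fields).

-- Hedged sketch of the preview ANOMALYDETECTION syntax: the operator is applied
-- over a limited history per device, much like LAG, and returns a record of
-- anomaly scores for each event.
SELECT
    DeviceId,
    Temperature,
    ANOMALYDETECTION(Temperature) OVER (PARTITION BY DeviceId LIMIT DURATION(hour, 2)) AS AnomalyScores
FROM TelemetryInput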

You can find more details about the “ANOMALYDETECTION” operator here.

Handling Alerts

When an abnormal condition is detected and output to the AlertOutput stream, we can handle the alert output in a number of ways on the Azure platform:

  1. Output the alerts to a live dashboard
  2. Send alert notifications
  3. Automatically handle the alerts by adjusting the equipment settings

Azure Stream Analytics supports outputting a stream to a real-time Power BI dashboard. With this feature we can show the real-time alerts on a Power BI dashboard monitored by the maintenance engineers.

The alerts can also be sent to the maintenance engineers in push mode. Thanks to the recently added Azure Functions output target in Azure Stream Analytics, it is much easier for developers to send out alerts through email or notifications without first outputting the stream to Service Bus queues and accessing Azure Functions from there. Developers can now egress the alert stream directly to Azure Functions, where they can implement the logic for alert delivery.

When combined with Azure IoT Hub, we can also make the monitored equipment adjust its settings automatically based on the alerts. For example, Microsoft has created a real-time data processing solution for KingwayTek that takes advantage of Azure Stream Analytics, Azure Functions and Azure IoT Hub to proactively raise an alert on the vehicle status, and the alert triggers vehicle reconfiguration.

*This snapshot is from Microsoft

Pattern 3 – Simple Counting and Counting with Windows

In this pattern, the raw, atomic stream events are aggregated over a time window to reveal potential patterns and behaviours. For example, the raw message of a single website visit event may not provide much meaningful insight, but the average view count per hour or per day can reveal the pattern of the website visits, e.g., the website has more visits in the evening than in the morning.

To implement this pattern, the streaming analytics service needs to support two types of functions: aggregation and time windows. Azure Stream Analytics provides good support for both.

The Stream Analytics Query Language provides a list of built-in aggregate functions that cover most common aggregation requirements.

In addition, Azure Stream Analytics supports user-defined aggregates (UDA) written in JavaScript, which gives developers extra power and flexibility to implement complicated aggregation rules.

Azure Stream Analytics also provides good support for time windows. Three time window functions are supported: Tumbling window, Hopping window and Sliding window.

The tumbling window function, TumblingWindow, segments a data stream into repeated, non-overlapping, distinct time windows.

*This image is from Microsoft

The hopping window function, HoppingWindow, generates time windows that hop forward in time by a fixed period. Compared to tumbling windows, hopping windows can overlap with each other, so the same events may fall into more than one hopping window.

*This image is from Microsoft

The sliding window function, SlidingWindow, generates a time window when an event occurs. The time window ends at the time the event happens, and the start of the window is defined by the period parameter specified in the SlidingWindow function.

*This image is from Microsoft
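For example, counting with a tumbling window could be sketched as below (the input, output and field names are hypothetical); switching to HoppingWindow or SlidingWindow only changes the GROUP BY clause.

-- Count website visits per device over repeated, non-overlapping 1-hour windows.
SELECT
    DeviceId,
    COUNT(*) AS VisitCount,
    System.Timestamp AS WindowEnd   -- end time of each tumbling window
INTO CountOutput
FROM ClickStreamInput TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(hour, 1)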

Pattern 4 – Joining Event Streams

This pattern is used in scenarios where multiple data streams need to be processed to create a new event stream. For example, we may have multiple sensors that collect data for different aspects of an object or event.

Azure Stream Analytics supports multiple inputs from a variety of stream data sources.

*This image is from Microsoft

After the inputs are defined in Azure Stream Analytics, you can reference the inputs by name in the Stream Analytics Query Language.
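A minimal sketch of joining two hypothetical sensor streams is shown below; as with the documented stream-to-stream joins, the ON clause must bound how far apart in time the matching events may occur.

-- Combine temperature and vibration readings for the same device
-- when they arrive within 5 seconds of each other.
SELECT
    t.DeviceId,
    t.Temperature,
    v.Vibration
INTO CombinedOutput
FROM TemperatureInput t TIMESTAMP BY EventTime
JOIN VibrationInput v TIMESTAMP BY EventTime
    ON t.DeviceId = v.DeviceId
    AND DATEDIFF(second, t, v) BETWEEN 0 AND 5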

Pattern 5 – Data Correlation, Missing Events, and Erroneous Data

This pattern correlates the data from different streams or within the same stream. Dr. Perera gives some use cases of this pattern in his article, such as matching up two data streams that send events at different speeds, detecting a customer request that has not been responded to within one hour, and detecting failed sensors by comparing a set of sensors that monitor overlapping regions.

In Azure Stream Analytics we can take advantage of the T-SQL syntax of the Stream Analytics Query Language to implement this pattern. For example, we can use the JOIN clause to join different streams on the id of the monitored object (e.g., the id of a machine on which different sensors are installed) and use the operators provided by T-SQL to find the correlation.
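For instance, the "request not responded to within one hour" case can be sketched with a LEFT OUTER JOIN plus an IS NULL filter; the stream and field names below are hypothetical.

-- Requests that have no matching response within 60 minutes.
SELECT
    r.RequestId,
    r.CustomerId
INTO UnansweredRequests
FROM Requests r TIMESTAMP BY RequestTime
LEFT OUTER JOIN Responses p TIMESTAMP BY ResponseTime
    ON r.RequestId = p.RequestId
    AND DATEDIFF(minute, r, p) BETWEEN 0 AND 60
WHERE p.RequestId IS NULL   -- no response arrived within the window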

Pattern 6 – Interacting with Databases

In many use cases the streaming data alone is not enough to dig out meaningful insights for the business. The data from the streaming source only becomes useful when combined with historical, business-oriented data, so the streaming analytics service needs to be able to fetch data from other business databases and combine it with the streaming data. For example, we may need to check a blacklist when processing real-time service requests.

Azure Stream Analytics does provide support for reference data joins in the Stream Analytics Query Language. To use this feature, we need to create a Reference type input that fetches the reference data from Azure Blob storage.
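Once the Reference input is defined, it can be joined like any other input. A hedged sketch of the blacklist example follows; the input and field names are hypothetical.

-- Flag incoming requests whose user appears in the blacklist reference data.
SELECT
    s.RequestId,
    s.UserId,
    b.Reason
INTO FlaggedRequests
FROM RequestStream s TIMESTAMP BY RequestTime
JOIN BlacklistReference b        -- Reference type input backed by blob storage
    ON s.UserId = b.UserId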

Up to this point, only Azure Blob storage is supported as the reference data source for Azure Stream Analytics, so we need to use Azure Data Factory to move the reference data from where it is originally stored into an Azure Blob storage instance. The reference data is modelled as a sequence of blobs in ascending order of the datetime specified in the blob name.

As most reference data is slowly changing, the streaming analytics solution also needs to ensure the reference data combined with the streaming data is up to date. Azure Stream Analytics does provide an approach to support slowly changing reference data, but with some limitations.

Firstly, a reference data blob stored in Azure Blob storage cannot be updated, as that would cause the Stream Analytics job to fail. Therefore, we can only add a new blob to store the updated reference data, using the same container and path pattern defined in the job input and with a date/time greater than the one specified by the last blob in the sequence. Secondly, the old reference data blobs must not be altered or removed.

Pattern 7 – Detecting Temporal Event Sequence Patterns

In this pattern, streaming analytics is used to detect temporal event sequence patterns. For example, a machine may fail after showing a sequence of statuses in a certain order. The streaming analytics solution needs to be able to detect the sequence pattern so that an alert can be sent to engineers when the pattern occurs.

In the example provided by Dr. Perera, he used Storm and Siddhi (a CEP engine) to detect the temporal event sequence patterns. We can use the Stream Analytics Query Language in Azure Stream Analytics to implement the example. However, I think a better solution that can cope with more complicated use cases is to use a machine learning algorithm to detect the pattern and make the prediction. Azure Stream Analytics provides good support for integration with Azure Machine Learning; I will provide more details about that integration when discussing Pattern 12.

Pattern 8 – Tracking

This pattern refers to streaming analytics use cases that track something over space and time under one or more given conditions. Those use cases are often IoT scenarios that monitor the real-time status or movement of something, for example tracking the movement of missing airline luggage.

Azure Stream Analytics comes with real-time geospatial analytics capabilities that provide native functions for geospatial operations on points, lines and polygons, and it supports joining multiple geospatial data streams to solve more complicated use cases.

Pattern 9 – Detecting Trends

This pattern detects trends over time series data, e.g., usage increases and drops, peaks, outliers etc. As with Pattern 8, this pattern is often used in IoT use cases.

In Azure Stream Analytics, for simple use cases, we can use the Stream Analytics Query Language to query the peak (MAX) value, outliers (ANOMALYDETECTION), and the start and end values in a time window for computing trends. When combined with a Power BI dashboard, we can provide time series based charts to visualise the trends.
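As a rough sketch, a sudden usage drop could be flagged by comparing each reading with the previous one using LAG; the input name, field names and the 50% threshold below are illustrative assumptions.

-- Flag readings that drop to less than half of the previous reading per device.
WITH WithPrevious AS (
    SELECT
        DeviceId,
        Reading,
        LAG(Reading) OVER (PARTITION BY DeviceId LIMIT DURATION(minute, 10)) AS PreviousReading
    FROM UsageInput TIMESTAMP BY EventTime
)
SELECT DeviceId, Reading, PreviousReading
INTO TrendAlerts
FROM WithPrevious
WHERE Reading < 0.5 * PreviousReading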

For more complicated use cases, we may need functions outside of the Stream Analytics Query Language (e.g., is_monotonic_decreasing/is_monotonic_increasing in Python), or a time-series analysis model (e.g., ARIMA) for forecasting use cases. At the moment, Azure Stream Analytics does not support Python or R. However, we can take a workaround that implements the algorithm in Azure Machine Learning Studio with Python or R scripts, publishes it as a REST API, and then integrates it with Azure Stream Analytics.

Pattern 10 – Running the same Query in Batch and Realtime Pipelines

I found the title of this pattern, “Running the same Query in Batch and Realtime Pipelines”, a bit confusing, but from Dr. Perera’s explanation this pattern refers to the Lambda Architecture, which is the most popular data analytics architecture used in IoT use cases at the moment.

Lambda Architecture separates IoT data analytics into two paths: the hot path (also known as the speed layer) and the cold path (the batch layer). The hot path refers to the stream data processing path and the cold path refers to the batch-oriented data processing path. The Microsoft Azure Cortana Intelligence suite provides good support for the Lambda Architecture. More details can be found here.

*This snapshot is from Microsoft

Pattern 11 – Detecting and switching to Detailed Analysis

This pattern is used for use cases where an anomaly or behaviour is identified by the streaming analytics and further detailed analysis is required against the historical data. I think this pattern can be viewed as a sub-pattern of Pattern 10.

This pattern can be supported on the Azure platform using the Lambda Architecture as introduced above.

Pattern 12 – Using a Model

This pattern refers to using a machine learning model in stream analytics. I have mentioned some use cases in previous patterns where a machine learning model needs to be used.

Azure Stream Analytics provides an Azure ML function type to support the integration with Azure Machine Learning.

The machine learning developers can implement the model using Azure Machine Learning Studio and publish it as a REST API. An Azure Stream Analytics job can then call the API using the Azure ML function.


*This snapshot is from Microsoft
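A hedged sketch of calling such a function from a query is shown below; the function name predictFailure and the output column Scored Labels are assumptions that depend on how the Azure ML web service and the job function are defined.

-- Score each event with a (hypothetical) Azure ML function added to the job.
WITH Scored AS (
    SELECT
        DeviceId,
        Temperature,
        predictFailure(DeviceId, Temperature) AS Result   -- Azure ML function call
    FROM TelemetryInput
)
SELECT
    DeviceId,
    Temperature,
    Result.[Scored Labels] AS PredictedLabel
INTO PredictionOutput
FROM Scored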

Pattern 13 – Online Control

This pattern refers to AI-related use cases such as autopilot, self-driving and robotics. Dr. Perera does not provide much detail about this pattern in his article and presentation slides. I think Azure Stream Analytics is not designed for this type of application.

Pattern 14 (additional) – Edge Analytics

I have added this pattern to Dr. Perera’s list as edge computing has become more and more important in IoT use cases, and Azure Stream Analytics, along with Azure Machine Learning, is a main component of Microsoft’s edge computing offering.

With Azure Stream Analytics on IoT Edge, the real-time analytics intelligence can be deployed close to the IoT devices to achieve low latency, resiliency, efficient use of bandwidth and compliance.

*This snapshot is from Microsoft

 


Execute R Scripts from Azure Data Factory (V2) through Azure Batch Service

One requirement I have recently been working on is to run R scripts for some complex calculations in an ADF (V2) data processing pipeline. My first attempt was to run the R scripts using Azure Data Lake Analytics (ADLA) with the R extension. However, two limitations of the ADLA R extension stopped me from adopting this approach. Firstly, at the time of writing, the ADLA R extension supports only one input dataframe and one output dataframe, while the requirement I am working on needs the R scripts to take multiple dataframes as input and output multiple result dataframes. Secondly, the total size of the input and output is limited to 500 MB.

Another attempt I made was to create an ADF custom activity to execute the R scripts in Azure Batch Service. This option turned out to be a pretty flexible approach that is easy to implement and manage. This blog post highlights the key steps of this approach and also mentions the lessons I have learnt.

Step 1 – Preparing Azure Batch Service

Firstly, we need to add an Azure Batch pool in an Azure Batch service instance. If the Azure Batch service instance doesn’t exist yet, a new instance needs to be provisioned. Please refer to the Microsoft official docs for the details on creating an Azure Batch service and pools.

While adding the Azure Batch pool, we need to specify the VM image to provision as the compute nodes in the pool. We can use the Data Science Virtual Machine image, which ships with most of the common R packages.

Step 2 – Creating Container in Azure Blob Storage to Host R Source Files and Input/Output Data

Create a Blob storage container in your Azure Storage account and then create an Input folder and an Output folder within the container root folder.

The R source files will be deployed into the container root folder. The input data files are the output of the upstream processing activities in the ADF pipeline and are pushed (copied) into the Input folder. The R source files and the input data files will be submitted for execution in the Azure Batch service (by the Azure Batch custom activity which will be created later in the ADF pipeline), and the output will be written into the Output folder in the Azure Blob storage.

Step 3 – Authoring R Scripts to Communicate with the Azure Blob Storage

When the Azure Batch custom activity is triggered in the ADF pipeline, the R source files and the input data files are submitted into the working directory created for the submitted task on the Azure Batch compute nodes. The R scripts load the data from the data files into dataframes, run the calculations and transformations, and finally write the results to output data files stored in the working directory. The output data files are then written into the Output folder in the Azure Blob storage using the blob operation functions provided by the rAzureBatch package. There is a good sample from the doAzureParallel GitHub site on the blob operations with rAzureBatch.

Basically, we first need to create a rAzureBatch StorageServiceClient with the Azure Storage account credentials.

storageCredentials <- rAzureBatch::SharedKeyCredentials$new(
    name = "{name of the Azure Storage account}",
    key = "{access key of the Azure storage account }"
)

storageClient <- rAzureBatch::StorageServiceClient$new(
   authentication = storageCredentials,
   url = "{url of the Azure Blob storage}"
)

Then, we need to create a SAS token with write permission on the Output folder:

writeSasToken <- storageClient$generateSasToken(permission = "w", "c", path = {the path of the Output folder})

Lastly, we can save the output file into the Output folder on Azure Blob storage with the uploadBlob function.

response <- storageClient$blobOperations$uploadBlob(
    containerName,
    fileDirectory = "{output file name}",
    sasToken = writeSasToken,
    accountName = storageAccountName)
if (response[['status_code']]!=201) {
    stop('Failed to save the output file.')
}

It is important to explicitly check the response status_code and throw an error when the save action fails. Otherwise, the ADF pipeline will not be able to capture the error, but will instead treat the custom activity as having run successfully and move on to the downstream activities.

Step 4 – Setup Azure Batch Custom Activity in ADF Pipeline

After the Azure Batch service part of the work is done, we need to add and configure the Azure Batch custom activity in the ADF pipeline. This step is pretty straightforward; please refer to the Microsoft official docs for more details. The only parts to note are the “Command” and “Folder path” settings of the Azure Batch custom activity. The “Command” should be “RScript {the entry R file you want to run}”, and the “Folder path” should be the container we created earlier to host the R source files.

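For reference, below is a hedged sketch of what the custom activity definition might look like in the pipeline JSON; the activity, linked service and container names are hypothetical, and the exact property set should be checked against the ADF V2 Custom activity documentation.

{
  "name": "RunRScripts",
  "type": "Custom",
  "linkedServiceName": {
    "referenceName": "AzureBatchLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "command": "RScript main.R",
    "folderPath": "rscripts",
    "resourceLinkedService": {
      "referenceName": "AzureStorageLinkedService",
      "type": "LinkedServiceReference"
    }
  }
}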

Those are the four main steps to set up the execution of R scripts in ADF pipelines.

Below are a few tips that might be helpful:

If you need to install additional R packages for your scripts, specify the lib path with the environment variable AZ_BATCH_TASK_WORKING_DIR in order to install the packages into the working directory of the current task. Please refer to my previous blog post for further explanation.

 install.packages("tidyselect", lib=Sys.getenv("AZ_BATCH_TASK_WORKING_DIR"))

If the Azure Batch custom activity throws a UserError with the message “Hit unexpected expection and execution failed.”, you can find the detailed error message in the stderr.txt file in the working directory of the failed task.

To access the working directory, you need to go to the Azure Batch account –> Jobs –> select the job –> select the failed task, and then select “Files on node”.

Then you should be able to see all the files in the working directory of that task, including the stderr error output file.

 

The Tip for Installing R packages on Azure Batch

In one project I have recently been working on, I needed to execute R scripts in Azure Batch. The compute nodes of the Azure Batch pool were provisioned with Data Science Virtual Machines, which already include the common R packages. However, some packages required by the R scripts, such as tidyr and rAzureBatch, were missing and needed to be installed.

My first attempt to install a package, by running install.packages("tidyselect"), failed. The error message revealed that I didn’t have write permission to install the package. It actually makes sense that Azure Batch doesn’t allow the jobs submitted by consumers to change the global environment of the compute nodes. Therefore, R packages can only be installed somewhere on the compute nodes where the submitted jobs have write permission.

After a bit of research, it turns out that Azure Batch organises the submitted jobs in a specific directory structure.

Azure Batch exposes a portion of the file system on a compute node for the submitted job to access. One file directory is created for each job and for each task associated with the job. Within each task directory, a working directory, wd, is created which provides read/write access to the task. As the task has full permission to create, update and delete content within this directory, this is the place where we can install the R packages. Azure Batch provides an environment variable, AZ_BATCH_TASK_WORKING_DIR, that holds the path of this directory for the current task.

With the knowledge of the task working directory, we can install the R packages with the specified installation target directory.

install.packages("tidyselect", lib=Sys.getenv("AZ_BATCH_TASK_WORKING_DIR"))

When attaching the installed package, we need to explicitly specify the directory as well:

library(tidyselect, lib.loc = Sys.getenv("AZ_BATCH_TASK_WORKING_DIR"))

Why Bother to Use Pandas “Categorical” Type in Python

When we process data using the Pandas library in Python, we normally convert the string type of categorical variables to the Categorical data type offered by the Pandas library. Why do we bother to do that, considering there is actually no difference in the output results whether we use the Pandas Categorical type or the string type? To answer this question, let’s first take a simple test.

In this test, we create a data frame with two columns, “Category” and “Value”, and generate 50 million rows in this data frame. The values of the “Category” column are generated from a list of six predefined categories, [‘category1’, ‘category2’ … ‘category6’], and the values of the “Value” column are generated from the list of integers [0 … 9].

We first time the execution of a group-by operation against the “Category” column while it is in the string type and also observe the memory usage of the “Category” column.

We then convert the “Category” column to the Pandas Categorical type, repeat the execution of the same group-by operation, and observe the memory usage of the “Category” column in the Pandas Categorical type.

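Since the test code is only shown in the screenshots, here is a minimal sketch of the kind of comparison described; the timing helper is my own and the absolute numbers will vary by machine.

import time

import numpy as np
import pandas as pd

n = 50_000_000
categories = ['category{}'.format(i) for i in range(1, 7)]

# Build the test frame with a string "Category" column and an integer "Value" column.
df = pd.DataFrame({
    'Category': np.random.choice(categories, size=n),
    'Value': np.random.randint(0, 10, size=n),
})

def time_groupby(frame):
    """Time a simple group-by aggregation on the Category column."""
    start = time.time()
    frame.groupby('Category')['Value'].sum()
    return time.time() - start

# String column: run time and memory usage of the column.
print(time_groupby(df), df['Category'].memory_usage(deep=True))

# Convert to the Pandas Categorical type and repeat the same measurement.
df['Category'] = df['Category'].astype('category')
print(time_groupby(df), df['Category'].memory_usage(deep=True))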

The result reveals a five times improvement in running speed and one eighth of the memory usage when the “Category” column is converted to the Pandas Categorical data type.


This test result answers our original question: the reason to use the Pandas Categorical data type is the optimised memory usage and improved data processing speed. Then why does the Categorical data type have such magic? The answer is pretty simple, i.e. dictionary encoding.

If we open the source code of the Pandas Categorical class, we can see that this class contains two properties, “categories” and “codes”.

If we print the two properties for the “Category” column used in our test, we can see that the “categories” property stores the dictionary of the six categories available for the “Category” column, while the actual category information of the “Category” column for all the rows in the data frame is stored in the “codes” property as integer numbers that point to the position of the corresponding category in the “categories” property.
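With the test frame from the earlier sketch, the two properties can be inspected through the .cat accessor, for example:

# The dictionary of distinct labels backing the Categorical column...
print(df['Category'].cat.categories)

# ...and the integer codes stored per row, which index into that dictionary.
print(df['Category'].cat.codes.head())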


In this way, the Pandas Categorical data type takes much less memory space to store the category information as integers than storing it in the original string type. The query operations on the category column scan less memory space and therefore the time used on the query is shortened.

Dictionary encoding is a common technique used for data compression. For example, Azure Analysis Services and Power BI also use dictionary encoding in their VertiPaq engine to compress data, reducing memory usage and increasing query speed.

*This figure is from the book “Definitive Guide to DAX” authored by Alberto Ferrari and Marco Russo

 

Build a Power BI Knowledge Base Bot Using Microsoft Bot Framework and QnA Maker

The first question that popped out of my head when I heard about the Microsoft Bot Framework was how to build some sort of bot capable of taking over any of my responsibilities at work. Part of my Power BI consulting responsibilities is to answer all sorts of Power BI related questions from clients. It would be pretty cool to have a bot take those responsibilities from me, on the basis that there is no risk to my job security :-).

After a quick research, I found it is very easy to create a Power BI knowledge base bot using the Microsoft Bot Framework and the QnA Maker service. This blog post walks through the steps to create the bot.

There are four major steps to build a Power BI knowledge base bot:

  • Step 1 – Create QnA maker and train the knowledge base
  • Step 2 – Create a bot with the Bot Builder SDK and deploy to an Azure App Service
  • Step 3 – Register the created bot with Azure Bot Service
  • Step 4 – Connect the bot to channels

Step 1 – Create QnA maker and train the knowledge base

QnA Maker is one of the Microsoft Cognitive Services that enables developers or business users to build, train and publish a QnA type of bot based on FAQ URLs, structured documents or editorial content. You can log on to the QnA Maker service website through this link, https://qnamaker.ai/, where you can create, manage, test and publish your QnA bot.

To create a QnA maker, you need to click the “Create new service” link on the top navigation bar of the QnA Maker service website to launch the “Create a QnA service” form.

You need to provide the name of the QnA maker and to specify the URLs of the knowledge base pages or to upload the files that contain your questions and answers.

After the QnA maker is created, you can access it from the “My services” page. You can select to edit your QnA maker, which brings you to the maker edit page where you can edit, retrain, test and publish your QnA maker.

 

On the knowledge base tab of the maker edit page, all the questions and answers from the web pages of the knowledge base you specified, or from the documents you manually uploaded, will be extracted into a table, where you can manually delete the imported question/answer pairs or manually add and edit new pairs.

After you have edited your QnA maker, you can save and retrain it using the green “Save and retrain” button at the top right of the maker edit page, and then test the QnA maker on the “Test” tab.

After you are happy with your QnA maker, you can publish it as a service using the blue “Publish” button at the top right of the maker edit page. After the maker is successfully published, you will be provided with a sample HTTP request for building your bot. You need to note down the QnA knowledge base id and the subscription key, as they will be used later when creating the bot.

Step 2 – Create a bot with the Bot Builder SDK and deploy to an Azure App Service

There are a couple of ways to create a bot. In this blog post, I will use the Bot Builder SDK for .NET to create the bot in Visual Studio 2017.

Before you can use Visual Studio 2017 to create a bot with the Microsoft Bot Framework, there are some prerequisite tasks you need to do, such as installing the project and item templates. You can follow this doc for those prerequisite tasks: https://docs.microsoft.com/en-us/bot-framework/dotnet/bot-builder-dotnet-quickstart

After you have done all the prerequisite tasks, you will see a new “Bot Application” project template in the “New Project” dialog.

After you create a project using the “Bot Application” template, you will have the predefined controller and dialog files, and the predefined configuration settings in the web.config file.

Open the web.config file and, under the appSettings section, add two keys for the QnA subscription key and the knowledge base id that you noted down earlier when you published your QnA maker service.

Under the Dialogs folder in your project, create a new dialog file using the dialog item template and name it PowerBIQnADialog. Add a reference to Microsoft.Bot.Builder.CognitiveServices.QnAMaker and change the dialog class to inherit from QnAMakerDialog. Call the base constructor of the class and pass in the QnA maker attributes, including the QnA subscription key and QnA knowledge base id (specified in the web.config file), the bot message text for when there is no answer matching the asked question, and the matching threshold for deciding whether an answer matches the question.

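Below is a minimal sketch of such a dialog class, assuming the two appSettings keys are named QnASubscriptionKey and QnAKnowledgebaseId; the no-match message and the 0.5 threshold are illustrative values.

using System;
using System.Configuration;
using Microsoft.Bot.Builder.CognitiveServices.QnAMaker;

namespace PowerBIQnABot.Dialogs
{
    [Serializable]
    public class PowerBIQnADialog : QnAMakerDialog
    {
        public PowerBIQnADialog() : base(new QnAMakerService(new QnAMakerAttribute(
            ConfigurationManager.AppSettings["QnASubscriptionKey"],   // QnA Maker subscription key from web.config
            ConfigurationManager.AppSettings["QnAKnowledgebaseId"],   // knowledge base id from web.config
            "Sorry, I couldn't find an answer for that.",             // message returned when no match is found
            0.5)))                                                    // minimum match score threshold
        {
        }
    }
}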

After the dialog file is created, open the existing MessageController.cs file. In the “Post” method, change Dialogs.RootDialog() to Dialogs.PowerBIQnADialog().

After you have referenced the right dialog class in the MessageController.cs file, you can publish the bot to an Azure App Service. You can create the app service through the Azure Portal, or create it in Visual Studio when selecting to publish the project.

Step 3 – Register the created bot with Azure Bot Service

After you have created the bot and deployed it to the Azure App Service, you can create an Azure Bot Service and register the bot you have created with the bot service.

Log on to the Azure Portal and create a new Bot Channels Registration service. Specify the general information such as name, subscription, location, etc. The most important attribute to specify is the “Messaging endpoint” of your created and deployed bot, which is in the format {url of the app service where your bot is deployed}/api/messages.

After the bot service has been created, you can change the messaging endpoint on the Settings tab if required.

Also on the Settings tab, you can get the Microsoft App ID for the bot service and the password (you can click the “Manage” link to generate a new password; a previously created password is not visible if you have not noted it down somewhere). You now need to update the web.config file in your bot Visual Studio project to set the Microsoft App ID and the password in the MicrosoftAppId and MicrosoftAppPassword keys. After the change, you need to republish your bot to update the Azure App Service.

After your bot is updated in the Azure App Service, your bot service is ready to use. You can test it on the “Test in Web Chat” tab of your Bot Channels Registration service.

Step 4 – Connect the bot to channels

A bot channel is the connection between the Bot Framework and the communication apps which can talk to your bot. You can create channels on the “Channels” tab of your Bot Channels Registration service page. Quite a number of apps are supported by the Bot Framework, such as Skype, Teams, Facebook Messenger etc.

In this blog post, I created a Microsoft Teams channel to communicate with our Power BI QnA bot by just clicking the Microsoft Teams icon on the channel page, and then the channel is up and running.

On the Chat tab in your Microsoft Teams app, start a new chat with the Microsoft App ID of your bot service.

And now you can start to ask your bot Power BI related questions from the Microsoft Teams app.

Workaround of the Power BI Guest User License Assignment Issue

I have recently been working with an insurance client to help them design a solution to distribute a Power BI app to external guest users with Azure AD B2B. I designed this solution based on the Microsoft Power BI content external distribution guideline whitepaper: Distribute Power BI content to external guest users using Azure Active Directory B2B. However, I have met an issue with assigning external guest users the Power BI Pro license.

To allow external access to the Power BI app, guest users must have a Power BI Pro license. Power BI supports three approaches to licensing external users: using Power BI Premium, assigning Power BI Pro licenses to the guest users, or using the guest users’ existing Power BI Pro licenses if they have them. This client decided to assign Power BI Pro licenses to those guest users who do not already have one.

Based on the guideline whitepaper, we should be able to assign the Power BI Pro license to a guest user through the Office 365 admin portal.

However, in the Office 365 admin portal, the product licenses management option is only available for internal users and not for Azure AD B2B guest users. After some research online, there seems to be an issue currently with guest user license assignment through the Office 365 admin portal. It looks like the only option for now is to assign the guest user license through the API.

Here is the script for assigning the Power BI license:

Connect-MsolService
Set-MsolUser -UserPrincipalName "{User principal name}" -UsageLocation GB
Set-MsolUserLicense -UserPrincipalName "{User principal name}" -AddLicenses "{Tenant Name}:POWER_BI_PRO"

The {User principal name} is the guest account name of the invited external user. You can find it in the Username field of the guest user in the Office 365 admin portal.

The -UsageLocation parameter in the Set-MsolUser cmdlet specifies the location country of a user with a two-letter ISO code. You can look up the country codes through this link: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2.

The -AddLicenses parameter in the Set-MsolUserLicense cmdlet specifies the license to assign to the user, which consists of two parts: {Tenant Name} (the name of the tenant) and POWER_BI_PRO (the license code for Power BI Pro).

SSIS in Azure #3 – Schedule and Monitor SSIS Package Execution using ADF V2

*The source code created for this blog post can be found here.

In the previous blog posts in the SSIS in Azure series, we created an SSIS package to periodically ingest data from an Azure SQL database into Azure Data Lake Store and deployed the package in the Azure-SSIS Integration Runtime. Up to this point, we have achieved two goals in the SSIS in Azure series:

  • Using SSIS to move data between cloud storages
  • Hosting and running SSIS packages in the cloud

The last goal we need to achieve is to schedule the execution of the SSIS package in the cloud. Traditionally, we schedule SSIS package execution by creating a SQL Server Agent job. However, we can only do that through an on-premises SQL Server instance or by provisioning a SQL Server VM in Azure. Thanks to the support of the Stored Procedure activity in ADF, we can now schedule the SSIS package execution using a cloud-based ADF pipeline. This blog post walks through the steps to achieve that.

You can find the source code created for this blog post in my GitHub. The ningago.demo.adf.ssis.adfv2 project contains the JSON and PowerShell files to create the ADF pipeline.


 

The key to this solution is to call the sp_executesql stored procedure from the RunSSISPackagePipeline pipeline to execute the SQL script that triggers the SSIS package execution. We can define a Stored Procedure activity in the pipeline that calls sp_executesql and passes in the SQL script for triggering the SSIS package execution as a parameter.

We don’t have to author the SQL script manually from scratch; instead we can generate it using SSMS. First, we connect to the SSISDB catalog in our Azure-SSIS Integration Runtime (please refer to SSIS in Azure #2 for how to do that), select the SSIS project or package we want to execute, and click the Execute button to open the Execute Package dialog window.

In the Execute Package dialog window, we select the package to execute and also set a value (it can be any value) for the package parameter (the reason for setting any value for the parameter is to ensure that the parameter setting statement is generated in the SQL script).

We then click the “Script” button at the top left, which generates the SQL script for executing the selected SSIS package.

We can then copy the script into the Stored Procedure activity defined in the RunSSISPackagePipeline pipeline.

As you may have noticed, the SSIS package parameter, DateScliceToLoad, has been manually set by us with a random value. The next step we need to take is to pass the date of the pipeline run into the SQL script, so that the SSIS package only moves the data for the given day. Firstly, we need to define the “scheduledRunTime” parameter in the RunSSISPackagePipeline pipeline, which will receive the scheduled pipeline run time from the pipeline trigger we will create later.

In the SQL script we just generated to trigger the SSIS package execution, we replace the random date we set manually with the ADF expression that points to the scheduledRunTime parameter of the pipeline, as sketched below.

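For reference, here is a hedged sketch of the generated script after that substitution; the folder, project and package names are hypothetical, and it assumes the script is passed to sp_executesql by the Stored Procedure activity so that the ADF expression is resolved at run time.

-- Sketch of the SSMS-generated script with the package parameter bound to the
-- pipeline's scheduledRunTime parameter via an ADF expression.
DECLARE @execution_id bigint;

EXEC [SSISDB].[catalog].[create_execution]
    @folder_name = N'MyFolder',
    @project_name = N'MyProject',
    @package_name = N'MyPackage.dtsx',
    @use32bitruntime = False,
    @execution_id = @execution_id OUTPUT;

-- Bind the package parameter (object_type 30 = package parameter).
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
    @execution_id,
    @object_type = 30,
    @parameter_name = N'DateScliceToLoad',
    @parameter_value = N'@{pipeline().parameters.scheduledRunTime}';

EXEC [SSISDB].[catalog].[start_execution] @execution_id;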

Next, we need to create the trigger JSON file to schedule the execution of the pipeline. After we specify the pipeline to execute in the trigger file, we need to pass the scheduledTime of the trigger, which is a system variable, to the scheduledRunTime parameter defined in the pipeline.

{
  "properties": {
    "name": "RunSSISPackageScheduler",
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2018-01-07T00:00:00-22:00"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "type": "PipelineReference",
          "referenceName": "RunSSISPackagePipeline"
        },
        "parameters": {
          "scheduledRunTime": "@trigger().scheduledTime"
        }
      }
    ]
  }
}

After we have created the JSON files to define the linked service, the pipeline and the trigger, we need to deploy them into our Data Factory instance. As the V2 version of ADF does not support deployment through the UI yet, we need to create a deployment script using PowerShell.

$DataFactoryName = "ninjago3843adf"
$ResourceGroupName = "SSIS"

Set-AzureRmDataFactoryV2LinkedService -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -Name "SSISDBLinkedService" -File ".\SSISDBLinkedService.json"
Set-AzureRmDataFactoryV2Pipeline -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -Name "RunSSISPackagePipeline" -DefinitionFile ".\RunSSISPackagePipeline.json"

Stop-AzureRmDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name "RunSSISPackageScheduler"
Set-AzureRmDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name "RunSSISPackageScheduler" -DefinitionFile ".\RunSSISPackageScheduler.json"
Start-AzureRmDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name "RunSSISPackageScheduler"

After the pipeline is deployed and scheduled by the trigger, the pipeline will execute our SSIS package based on the schedule.
