
DAX – Dynamically Showing Title by Slicer in Power BI

When building a Power BI report page for analysing data by a specific dimension (e.g., customer, product, or category), we often need to dynamically show the currently selected item(s) in the page title.


One trick to show the title based on the current filter context is to use the Multi-Row Card visual (we could also use the single-row Card visual; however, the single-row Card visual doesn’t support custom text alignment yet). We can disable the left-side bar and change the style of the text to make it look more like a title.


Now we have a dynamic title based on the single selected item.


However, this approach doesn’t work when no item is selected or when multiple items are selected. When no item is selected, the Multi-Row Card visual displays all the items. As the size of the Multi-Row Card visual is set to be large enough for only a single item, only the first item is visible; the rest of the items are still rendered but are cut off.


The title is misleading when no item is selected. We expect to see something like “All Categories Selected” or “No filter applied” as the title.


We meet a similar problem when multiple items are selected: only the first selected item is visible in the title.


Instead, the title needs to make viewers aware that multiple items have been selected, and how many.


Now that we have gone through the requirements, we need a solution that can handle all three scenarios: single item selection, no selection, and multiple item selection. We can create a single measure to cover all three scenarios:

Product Category Title =
IF (
    COUNT ( 'SalesLT ProductCategory'[Product Category] ) = 1,
    VALUES ( 'SalesLT ProductCategory'[Product Category] ),
    IF (
        COUNT ( 'SalesLT ProductCategory'[Product Category] )
            = CALCULATE (
                COUNT ( 'SalesLT ProductCategory'[Product Category] ),
                ALL ( 'SalesLT ProductCategory'[Product Category] )
            ),
        "All Categories Selected",
        MIN ( 'SalesLT ProductCategory'[Product Category] ) & " and the other "
            & COUNT ( 'SalesLT ProductCategory'[Product Category] ) - 1 & " categories"
    )
)

Firstly, we check whether only a single item is selected; if so, we return the single selected value. If not, we check whether all items are selected; if so, we return text showing that all items are selected in the current filter context. Otherwise, multiple items have been selected, so we build a text string showing the name of the first item and the count of the remaining items.
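
As a side note, the same logic can also be written with variables and SELECTEDVALUE, which some readers may find easier to follow. The measure below is only a sketch of an equivalent formulation using the same table and column names as above:

Product Category Title (Alt) =
VAR SelectedCount =
    COUNT ( 'SalesLT ProductCategory'[Product Category] )
VAR TotalCount =
    CALCULATE (
        COUNT ( 'SalesLT ProductCategory'[Product Category] ),
        ALL ( 'SalesLT ProductCategory'[Product Category] )
    )
RETURN
    SWITCH (
        TRUE (),
        SelectedCount = 1, SELECTEDVALUE ( 'SalesLT ProductCategory'[Product Category] ),
        SelectedCount = TotalCount, "All Categories Selected",
        MIN ( 'SalesLT ProductCategory'[Product Category] ) & " and the other "
            & ( SelectedCount - 1 ) & " categories"
    )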

Power Query – Parameterised Files Loading from a Year-Month-Day Folder Hierarchy

In one of my previous blog posts, I described an approach to load text files from Azure Data Lake Store into Power BI, filtered by the date period that users specify through period start and period end parameters. That approach can only load text files located in the same folder. However, there is another common folder structure pattern that is often used to organise files in Azure Data Lake Store: the Year-Month-Day folder hierarchy (e.g. /Samples/Demo/2008/06/09/).

This raises a challenge when users want to load the files between two dates into Power BI. Firstly, the Power Query query we create needs to be able to load files from different folders. In addition, the query should conditionally load the files based on the date period specified by users, by filtering the Year-Month-Day folder hierarchy. This blog post describes how to create Power Query queries for this type of requirement with the following steps.

  • Step 1 – Define the StartDate parameter and EndDate parameter for users to specify the date period
  • Step 2 – Define a custom function to load the metadata of all the files stored in a folder. The custom function takes a folder path as its argument and returns a table value that stores the file metadata
  • Step 3 – Generate the sequence of dates between the specified StartDate and EndDate
  • Step 4 – Derive the folder path from each generated date
  • Step 5 – Invoke the custom function created at step 2 and expand the returned table field to get the list of files we aim to extract
  • Step 6 – Combine the files and convert to table format

Step 1 – Define parameters

Firstly, we define two Date-type parameters, StartDate and EndDate.
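
Behind the scenes, a Power BI parameter is itself a small M query. As a rough sketch, the generated query for the StartDate parameter would look something like the following (the default value here is just a placeholder):

#date(2008, 6, 1) meta
[
    IsParameterQuery = true,
    Type = "Date",
    IsParameterQueryRequired = true
]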


Step 2 – Define custom function

We then define a custom function that takes the path of a folder as its argument and returns a table value with the metadata of all the files stored in that folder.
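
The screenshots of the function definition are not reproduced here, but a minimal sketch of such a function (named LoadFilesByFolder, which is the name invoked later in the full query) could look like this, built on the DataLake.Contents function of the Azure Data Lake Store connector:

// LoadFilesByFolder: takes an ADLS folder path and returns a table with the metadata of the files in that folder
(FolderPath as text) as table =>
let
    Source = DataLake.Contents(FolderPath)
in
    Source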


Step 3 – Generate date list

We then need to create the query that extracts the files from the Year-Month-Day folder hierarchy, filtered by the StartDate and EndDate parameters. Firstly, we generate the sequence of dates between the StartDate and the EndDate using the built-in List.Dates function (see the full query listing at the end of this post).


Step 4 – Derive the folder path


We then convert the date list into a table and add a “DatePathPart” column that generates the Year/Month/Day part of the folder path, such as “2008/06/09”. We then add a “FolderPath” column that builds the full folder path by concatenating the root folder path with the Year/Month/Day part.

After this step, we should have a table with one row per date and the corresponding full folder path.


Step 5 – Invoke the custom function and expand returned table


We can then invoke the custom function defined at step 2, which adds a table-type column. Once we expand that column, we have the list of all the files we want to query, based on the specified StartDate and EndDate parameters.


This is the full piece of code for this query:

let
    // Generate the sequence of dates between StartDate and EndDate (inclusive)
    #"Filtered Dates" = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
    // Convert the date list into a single-column table
    #"Dates Table" = Table.FromList(#"Filtered Dates", Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
    // Build the Year/Month/Day part of the folder path, e.g. "2008/06/09"
    #"Extract Date Path" = Table.AddColumn(#"Dates Table", "DatePathPart", each
        Text.From(Date.Year([Date])) & "/"
        & Text.PadStart(Text.From(Date.Month([Date])), 2, "0") & "/"
        & Text.PadStart(Text.From(Date.Day([Date])), 2, "0")),
    // Add the root folder path of the Azure Data Lake Store
    #"Add Root Path" = Table.AddColumn(#"Extract Date Path", "RootPath",
        each Text.From("adl://ninjagodem0123.azuredatalakestore.net/Samples/Demo/")),
    // Concatenate the root path and the date part to get the full folder path
    #"Add Folder Path" = Table.AddColumn(#"Add Root Path", "FolderPath", each [RootPath] & [DatePathPart]),
    // Invoke the custom function to retrieve the file metadata for each folder
    #"Invoked Custom Function" = Table.AddColumn(#"Add Folder Path", "Files", each LoadFilesByFolder(Text.From([FolderPath]))),
    // Expand the returned table column to get one row per file
    #"Expanded Files" = Table.ExpandTableColumn(#"Invoked Custom Function", "Files",
        {"Content", "Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"},
        {"Files.Content", "Files.Name", "Files.Extension", "Files.Date accessed",
         "Files.Date modified", "Files.Date created", "Files.Attributes", "Files.Folder Path"})
in
    #"Expanded Files"

Step 6 – Combine the files

The last step is to combine the list of files we queried in steps 1–5, extract the data records from the binary csv format, and load them into a single table for downstream processing.
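
In the Power Query editor this is normally done through the “Combine Files” dialog. As a rough manual sketch, assuming comma-delimited csv files with a header row, the combination could also be written like this (the Source step refers to the #"Expanded Files" query above):

let
    Source = #"Expanded Files",
    // Parse each binary file as a csv document and promote the first row to column headers
    #"Parsed Csv" = Table.AddColumn(Source, "Data",
        each Table.PromoteHeaders(Csv.Document([Files.Content]))),
    // Combine the per-file tables into a single table for downstream processing
    #"Combined Data" = Table.Combine(#"Parsed Csv"[Data])
in
    #"Combined Data"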


Power Query – Extract Multiple Tags Stored in a Single Text Field

Problem

It is not rare to see multiple attributes stored in a single text field, especially in tagging-enabled applications where a variable number of tags may be associated with an article or post. Those tags are often stored in a single text field with a delimiter to separate them. When reporting, we often need to categorise the articles or posts by tag, e.g., counting the articles or posts for each tag.


To fulfil this reporting requirement, we need to reshape our dataset from one row per post, with all of its tags concatenated in a single delimited text field, to one row per post-and-tag combination.

Solution

It is actually very easy to conduct this kind of transformation using Power Query with only three lines of M code.
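
The original code screenshot is not reproduced here, so the following is only a sketch of what the three lines could look like as steps inside a let expression, assuming a previous step named Source with PostID, Title and Tags columns and comma-separated tags:

// line 1 – split the comma-delimited Tags field into as many tag columns as the longest tag list requires
#"Split Tags" = Table.SplitColumn(Source, "Tags", Splitter.SplitTextByDelimiter(","),
    List.Max(List.Transform(Source[Tags], each List.Count(Text.Split(_, ","))))),
// line 2 – unpivot every column except the known PostID and Title columns
#"Unpivoted Tags" = Table.UnpivotOtherColumns(#"Split Tags", {"PostID", "Title"}, "Attribute", "Tag"),
// line 3 – remove the generated column-name column, which is not needed for reporting
#"Removed Attribute" = Table.RemoveColumns(#"Unpivoted Tags", {"Attribute"})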


line 1 – split the tag field into an unfixed number of tag columns using the Splitter.SplitTextByDelimiter function.

line 2 – use the Table.UnpivotOtherColumns function to unpivot all the tag columns. As we don’t have a fixed number of tag columns, we need UnpivotOtherColumns and specify the known columns (“PostID” and “Title” in this example) as arguments.

line 3 – remove the column that stores the generated tag column names, which will not be used in reporting.

Power Query – Parameterised Files Loading from Azure Data Lake Store within a Given Date Range

Power BI now supports data load from Azure Data Lake Store. We can connect to a folder in the Azure Data Lake Store and load all files from that folder.


However, we often don’t want to, or aren’t able to, load all the files in the Azure Data Lake Store folder into Power BI due to the volume of the data. Instead, we want to be able to specify a date range and only load the files that fall into that date range.

The Azure Data Lake Store connector in Power BI doesn’t provide direct support for conditional data loading based on a given criterion. However, with some help from the M language, we can easily implement this feature by customising the query scripts.

Firstly, we create two Power BI parameters, StartDate (for the start date of the given date range) and EndDate (for the end date of the given date range).


When we connect Power BI to the Azure Data Lake Store folder, DataLake.Contents(“{ADLS folder path}”) returns the metadata of the files stored in that folder. We can then use the Table.SelectRows function to go through each file, extract the date string from its name and convert it to a date, and check whether that date falls into the given date range by comparing it to the StartDate and EndDate parameters.
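
The query screenshot is not shown here; the filtering step could look roughly like the sketch below. The folder path reuses the example path from the previous post, and the file-name pattern (e.g. “sales_20080609.csv”, with the date starting at the seventh character) is purely an assumption that you would adapt to your own naming convention:

let
    // Return the metadata of all the files stored in the ADLS folder
    Source = DataLake.Contents("adl://ninjagodem0123.azuredatalakestore.net/Samples/Demo/"),
    // Keep only the files whose embedded date falls between StartDate and EndDate
    // (assumes file names like "sales_20080609.csv")
    #"Filtered Files" = Table.SelectRows(Source, each
        let
            FileDate = #date(
                Number.FromText(Text.Range([Name], 6, 4)),
                Number.FromText(Text.Range([Name], 10, 2)),
                Number.FromText(Text.Range([Name], 12, 2)))
        in
            FileDate >= StartDate and FileDate <= EndDate)
in
    #"Filtered Files"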


A prerequisite for this solution to work is that the date info needs to be included in the file name. It is common practice (and also good practice for performance reasons) to partition the files stored in Azure Data Lake Store by date.

Now that we have loaded only the files we need into Power BI, we can combine those files into a single table for downstream visualisation. We can open the “Combine Files” dialog to combine the files.


After the files are combined and loaded into the Power BI dataset, we can build Power BI visualisations using the data in the dataset. In future, when we need to load files within other date ranges, we don’t have to edit the query again; instead, we just set the StartDate and EndDate parameters and the dataset will be refreshed with the data in the newly given date range.


Building Power BI Memory Usage Dashboard using DMV

As the VertiPaq engine used in Power BI is an in-memory analytical engine, the key to optimising your Power BI report performance is to reduce the memory usage of your data model. A smaller data model not only increases the data scan speed but also allows you to process a larger dataset with the same hardware capability.

To support the investigation of memory usage in a Power BI data model, I have created a memory usage Power BI dashboard. The pbix file of this dashboard can be downloaded here.


Kasper de Jonge created a PowerPivot version of the Memory Usage Report for tabular Analysis Services. If you are a PowerPivot user, you can find that version here.

To use this dashboard with your data model, you need to edit the two parameters (ServerName and Database) in the Power BI report.


You can find out how to get the ServerName and the Database from Chris Webb’s blog post that introduces how to analyse Power BI DMV queries in Power BI Desktop.

Once you have connected the dashboard to your Power BI report, you can use the filters to find the memory usage of the tables and columns in your data model, and you can also customise the dashboard as you wish using the DMVModelMemoryUsage dataset.
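
For reference, a dashboard like this typically reads its memory statistics from the Analysis Services DMVs behind Power BI Desktop. The M query below is only a sketch of that idea (the DMV chosen here is one example; the DMVModelMemoryUsage dataset in the downloadable pbix may be built differently):

let
    // Connect to the local Analysis Services instance behind Power BI Desktop
    // (ServerName is e.g. "localhost:54321"; see Chris Webb's post for finding the port and database name)
    Source = AnalysisServices.Database(ServerName, Database,
        [Query = "SELECT * FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS"])
in
    Source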

The dashboard helps you identify the tables and columns with a large memory footprint. You can review the columns with heavy memory usage and see what you can do with them. For example, in the sample data model used in the dashboard, the Fact Sale table is the largest memory user. When we select this table, the bar chart showing the memory usage of each column in Fact Sale reveals that the two largest memory users are the Sale Key and WWI Invoice ID columns. After reviewing the data model, we find that neither Sale Key nor WWI Invoice ID is used for relationships with other tables, and neither column will be used for building reports. Therefore, we can get rid of those keys to reduce memory usage.

Things Better to Do When Working with Power BI

This is the third part in the blog series summarising my experience with Power BI from real-world projects.

  1. Create separate QA/UAT app workspace from Dev app workspace

Separate the QA/UAT app workspaces from the Dev app workspace. Only app workspace admins have contribute permission on the QA/UAT app workspaces, so normal report authors are not able to modify the content in the QA/UAT app workspaces. Any change or bug fix requested from QA/UAT is fed back to the report authors and implemented in the Dev app workspace. The final production Power BI app needs to be published from the QA/UAT app workspace.

  2. Prefer the SSAS Tabular model over the SSAS Multidimensional model if a live connection is needed

For large-scale datasets, we may need to build the data model outside of Power BI and use Power BI for visualisation. Within the Microsoft stack we have two options: the SSAS Tabular model and the SSAS Multidimensional model. If there is no specific restriction that requires the Multidimensional model, prefer the Tabular model for the following reasons:

  • The SSAS Tabular model and Power BI use the same analytics engine and support the same query language (DAX). BI developers can reuse their Power BI development skills for SSAS Tabular model development, instead of learning new skills or seeking external support.
  • The in-memory processing model used by the SSAS Tabular model has been identified by Microsoft as the preferred direction, and continued investment in the Tabular model has made it a mature capability.
  3. Use DirectQuery mode with caution

The main reason to use DirectQuery mode is to reduce latency on data updates. This mode is recommended only when near-real-time reporting is required, or when the database is too large to fit in the memory of a single Analysis Services server (> 100 billion rows). While DirectQuery mode can reduce data update latency, it queries the same relational engine that is used to update the data, and the data is not cached or organised in a query-efficient structure. Therefore, DirectQuery mode will increase the load on the data sources and can reduce the performance of end-user report rendering, as reports in DirectQuery mode cannot take advantage of the Power BI in-memory processing engine.

  4. Source-control Power BI pbix files and release-control organisational Power BI app publishing

Keep the Power BI pbix files in a source control repository. The OneDrive storage associated with a Power BI app workspace group (auto-provisioned when an app workspace is created) is a good choice, as it is accessible to business users and does not require extra licenses. The release control policy, procedure, and management tooling (which can be built on the team site in the app workspace group) need to be planned.

  5. Use ExpressRoute if possible

You can use Azure ExpressRoute to create private connections between your on-premises data sources and the Power BI service through public peering (not Microsoft peering or private peering), instead of through a public internet connection. ExpressRoute not only provides a network with better performance and reliability, but also ensures better security and privacy for your data.

  6. The on-premises gateway needs to be close to the data sources

The on-premises gateway needs to be deployed as close as possible to on-premises data sources to avoid data transport/network performance issues.