Tag: Power BI

R Visual – Build Eurovision Voting Network Chart in Power BI

I have been watching Eurovision competitions for several years. I personally think the voting results from Eurovision competitions can be a very good source for the research of relationships between European countries. In this blog post, I will create a social network R visual using iGraph package and use the visual to analyse the voting network of Eurovison competitions.

1

Firstly, we need to prepare our raw Eurovision dataset into the following format that contains three columns, “From country” (where is the vote from), “To Country” (where is the vote to), and “Avg Point” (that computes the average points the “From country” has given to the “To Country” over the years. You can prepare the data either using DAX (creating the “AvgPoint” measure) or Power Query (group by “From country”+”To Country” and calculate average points).

a3

We add a R visual to the Power BI canvas and add the three columns to the R visual. If you prefer to use other R IDE (e.g., RStudio) to edit the R scripts, you can bind your IDE to Power BI.

We will use the igraph R package to render the voting network chart. Firstly, we need to load igraph library and then create a igraph data frame from the dataset specified on the Power BI R visual. We will use the plot function to render the network chart with the style attribute settings of vertex, edge etc.

# user igraph library
library(igraph)

# create a igraph data frame from the dataset specified on the Power BI R visual
df.g <- graph.data.frame(d = dataset, directed = TRUE)

# define colors
comps <- components(df.g)$membership
colbar <- rainbow(max(comps)+1)
V(df.g)$color <- colbar[comps+1]

# render the network chart and set the style attributes of vertex, edge etc.
plot(df.g, 
     vertex.label = V(df.g)$name,
     layout=layout_with_fr, 
     vertex.size=12,
     vertex.label.dist=0, 
     vertex.label.color= "darkblue",
     vertex.shape = "circle",
     vertex.label.cex = 1,
     vertex.label.font = 2,
     edge.arrow.size=0.5,
     edge.curved=T,
     margin =-0.05
 )

a4

After we authored and tested the R script in RStudio, we can now add the script to the R visual in Power BI that will be able to interact with other visuals on the same page.

Before we set any threshold on the average voting points, all voting paths between the countries will be draw on the network chart that makes the chart unreadable.

a2

However, when we set a higher average voting points as threshold which only shows the voting path over the threshold, we can find some relationship patterns.

1

For example, we can see the mutual high votes between neighbour countries like Spain <-> Andorra, Roumania <-> Moldova, and Greece<->Cyprus.

11.PNG

Please find the pbix file here.

 

Advertisements

R Visual – Building Facet Grid in Power BI

Since Power BI started to support R visual, it has become difficult to criticise Power BI’s visualisation capability because we can now take full advantage of R’s powerful visualisation packages such as ggplot2 to create Power BI reports. Unlike creating Power BI custom visual which is a rather time-consuming task, we can create eye-catching charts in just a few of lines with R visual.

Facet grid is a popular chart type but is not supported by Power BI yet. However, we can easily build a facet grid chart with the help of ggplot2 package.

1t1

Firstly, we need to get our data into the right format. In this example, we use the Eurovision competition dataset which contains the voting records between 1975 to 2016.

5

We need to calculate the rank of each country for each year based on the points they received from the rest of countries. We can use the DAX RANKX function to calculate the rank measure and get the results like:

2

Now we are ready to create our facet grid visual. We add a R visual to the Power BI canvas and add three columns, Year, ToCountry, Rank, to the visual.

7.PNG

3

On the R script editor, we first reference the ggplot2 library and then create a ggplot object placing Year on x-axis and Rank on y-axis. The key step is to add facet_wrap(~ToCountry) that generates the facet grid by voting destination country (ToCountry column).

4

Please download the pbix file here.

DAX – Ranking with Hierarchy Drill-Down

Problem

We can use the DAX RANKX function to return the rank of an item based on one of its numerical attribute. For example, we can create a measure using RANKX function to calculate the ranks of the product categories based on the order quantity within the product categories.

1t1

2

This measure will normally work fine until we drill the product hierarchy down to next level. As the rank measure is calculated at the parent-level, it will not work at the child-level.

5

Solution

To ensure the rank measure working at both parent-level and child-level, we need to calculate the rank accordingly depending on the what level the current filter context is at. We can use the DAX ISFILTERED function to evaluate current level.

In the example below, we created two measures, [Category Filtered] and [Product Filtered], using the ISFILTERED function. At the product category level, we can see only the Product Category column is filtered.

1t1

When we drill the table down to the product level, both product category and product columns are filtered.

2

We can now create a single measure to return the correct rank at both hierarchy level:

We first use the ISFILTERED function to evaluate which level is the current context at and then use the RANKX function to calculate the ranks accordingly. As the snapshot below shown, the rank will work fine at both levels.

34

 

DAX – Dynamically Showing Title by Slicer in Power BI

When building a Power BI report page for analysing data by a specific dimension (e.g., customer, product and category), we often need to dynamically show the title of currently selected item(s).

1t1

One trick to show the title based on the current filter context is to use Multi-Row Card visual (we can also use the single row Card visual. However, the single row Card visual doesn’t support custom text alignment yet). We can disable the left-side bar and change the style of the text to make it look more like a title.

3

Now we have the dynamical title based on the selected single item.

2

However, this approach doesn’t work when there is no item selected or when there are multiple items selected. When there is no item selected, the Multi-Row Card visual will display all the items. As the size of the Multi-Row Card visual is set to be large enough for only a single item, only the first item is visible. The rest of items are still showing but have been cut out.

5

The title will be misleading when there is no item selected. We expect to see something like “All Categories Selected” or “No filter applied” as title.

6

We meet similar problem when there are multiple items selected where only the first selected item will be visible in the title.

7

Instead, the title needs to be able to let viewers aware that multiple items have been selected and how many of them.

ml1.PNG

Now we have gone through the requirements and we need an solution to be able to handle all the three scenarios: single item selection, no selection and multiple item selection. We can create a single measure to cover all the three scenarios:

Product Category Title = 

  IF (COUNT('SalesLT ProductCategory'[Product Category])=1, 
    VALUES('SalesLT ProductCategory'[Product Category]),
    
    IF (COUNT('SalesLT ProductCategory'[Product Category])=
              CALCULATE(COUNT('SalesLT ProductCategory'[Product Category]), 
                           All('SalesLT ProductCategory'[Product Category])),
        "All Categories Selected",
        
        MIN('SalesLT ProductCategory'[Product Category]) & " and the other "
                & COUNT('SalesLT ProductCategory'[Product Category])-1 & " categories"
    )
   )

Firstly, we check whether only a single item is selected, if so, we return the single selected value, if not, we check whether all items are selected, if so, we return the text showing all items are selected in the current filter context, if not, that means multiple items have been selected, we build a text string to show the name of the first item and the count of the rest of items.

Building Power BI Memory Usage Dashboard using DMV

As the VertiPad engine used in Power BI is an in-memory data analytical engine, the key to optimise your Power BI report performance is to reduce the memory usage of your data model. A smaller data model not only increase the data scan speed but also allow you to processing a larger dataset with the same hardware capability.

To support the investigation of memory usage in a Power BI data model, I have created a memory usage Power BI dashboard. The pbix file of this dashboard can be downloaded here.

1

Kasper de Jonge created a PowerPivot version of Memory Usage Report for tabular Analysis Service. If you are a PowerPivot user, you can find the version here.

To use this dashboard in your data model, you need to edit the two parameters in the Power BI report:

2

You can find how to get the ServerName and the Database form Chris Webb’s blog post that introduces how to analyse Power BI DMV Queries in Power BI Desktop.

Once you have connected the dashboard to your Power BI report, you can use the filters to find the memory usage of the tables and the columns in your data model, and also you can customise the dashboard as you want using the DMVModelMemoryUsage dataset.

The dashboard is able to help you identify the tables and columns with large memory footage. You can review those columns with heavy memory usage and see what you can do with them. For example, in the snapshot of the dashboard shown above, the Fact Sale table is the largest memory user. When we select this table, on the bar chart showing the memory usage of each column in the Fact Sale, we can see the two largest memory users are the Sale Key column and the WWI Invoice ID. After we reviewed our data model, we can see find both Sale Key and WWI Invoice ID are not used for the relationships with other table, and both columns will not be used for building the reports. Therefore, we can get rid of those key to reduce the memory usage.

Things better to do When Working with Power BI

This is the third part in the blog series summarising my experience with Power BI from the real-world projects.

  1. Create separate QA/UAT app workspace from Dev app workspace

Separate QA/UAT app workspace from Dev app workspace. Only app workspace admins have contribute permission with QA/UAT app workspaces. Normal report authors will not be able to modify the content in QA/Test app workspaces. Any change/bug fixing requested from QA/UAT will be feedback to report authors and will be implemented in Dev app workspace. The final on-prod Power BI app need to be published from QA/UAT app workspace.

  1. Prefer SSAS Tabular Model over SSAS Multidimensional Model if need live connection

For large-scale dataset, we may need to build data model outside of Power BI and to use Power BI for visualisation. Within Microsoft stack we have two options: SSAS Tabular model and SSAS Multidimensional model. If there is no specify restriction that limits you to use the multidimensional model, prefer to use tabular model for the following reasons:

  • SSAS Tabular Model and Power BI use the same analytics engine and support the same query language (DAX). NFUM BI developers can reuse their Power BI development skills for SSAS Tabular Model development, instead of learning other skills or seek external supports.
  • In-memory processing model used by SSAS Tabular Model is identified as the desired model by Microsoft and continued investment in the tabular model has made it a mature capability
  1. Using DirectQuery mode with cautions

The main reason to use DirectQuery mode is to reduce latency on data updates. This mode is recommended to use only when near-real time reporting is required, or the database is too large to fit in the memory of a single Analysis Services server (> 100 Billion rows). While DirectQuery mode is capable to reduce data update latency, this mode requires directly query from the same relational engine used to update the data with the data is not cached and organised in query efficient structure. Therefore, DirectQuery mode will increase the load on data sources, and can reduce performance of end user report rendering as reports in DirectQuery mode cannot take advantage of Power BI in-memory process engine.

  1. Source control Power BI pbix file and release control organisational Power BI app publishing

Keep the Power BI pbix file in source control repository. The OneDrive storage associated with a Power BI app workspace group (will be auto provisioned when create an app workspace) is a good choice as it is accessible for business users and do not require extra licenses. The release control policy, procedure, and management tool (can be built on the team site in app workspace group) need to be planned.

  1. Use ExpressRoute if possible

You can use Azure ExpressRoute to create private connections between your on-premises data sources and Power BI services through public peering (not Microsoft peering or private peering), instead of through public internet connection. ExpressRoute not only provides you with a network with better performance and reliability, but also ensure the better security and privacy of your data.

  1. On-premises gateway needs to be close to the data sources

The on-premises gateway needs to be deployed as close as possible to on-premises data sources to avoid data transport/network performance issues.