Tag: R

R Visual – Create Gartner Magic Quadrant-Like Charts in Power BI using ggplot2

In this blog post, I am going to create a R visual that renders the Gartner magic quadrant-like charts in Power BI using the ggplot2 package.

2.PNG

A dummy dataset will be created, including three columns, the “Company” column holding the name of the companies which will be ranked in the quadrant chart, the “ExcutionScore” column and the “VisionScore” column corresponding to the “Ability to Execute” metric and the “Completeness of Vision” metric in the Gartner magic quadrant assessment. In the dummy dataset, the “ExcutionScore” and the “VisionScore” are scale from 0 to 100.

3

We drag a R visual onto Power BI editor canvas and add the three columns from the dummy dataset. We can bind the RStudio IDE to Power BI and use it to author and test the R scripts.

In the R script editor, we first reference the “ggplot2” library and the “grid” library. The “grid” library is used to draw custom annotations outside of the main ggplot2 panel.

library(ggplot2)
library(grid)

We then create a ggplot2 object using the dataset referenced in the R visual, assigning the “VisionScore” value to x-axis and assigning the “ExcutionScore” value to y-axis.

p <- ggplot(dataset, aes(VisionScore, ExcutionScore))
p <- p + scale_x_continuous(expand = c(0, 0), limits = c(0, 100)) 
p <- p + scale_y_continuous(expand = c(0, 0), limits = c(0, 100))

4

We now have our base panel and we can start our journey to build the Gartner Magic Quadrant-Like chart.

First of all, we set the x-axis label as “COMPLETEMENT OF VISION” and set the y-axis label as “ABILITY TO EXECUTE” and make them aligned to left-side. We then remove the axis ticks and text from the plot. We will also add a title to the top of the plot.

p <- p + labs(x="COMPLETEMENT OF VISION",y="ABILITY TO EXECUTE")
p <- p + theme(axis.title.x = element_text(hjust = 0, vjust=4, colour="darkgrey",size=10,face="bold"))
p <- p + theme(axis.title.y = element_text(hjust = 0, vjust=0, colour="darkgrey",size=10,face="bold"))

p <- p + theme(
          axis.ticks.x=element_blank(), 
          axis.text.x=element_blank(),
          axis.ticks.y=element_blank(),
          axis.text.y=element_blank()
        )

p <- p + ggtitle("Gartner Magic Quadrant - Created for Power BI using ggpolt2") 

Those steps will progress our chart to somewhere like:

5

We then add four rectangle type of annotations to fill the four quadrant areas using the Gartner magic quadrant scheme. We also need to create a border and split lines for the quadrant chart.

p <- p +
      annotate("rect", xmin = 50, xmax = 100, ymin = 50, ymax = 100, fill= "#F8F9F9")  + 
      annotate("rect", xmin = 0, xmax = 50, ymin = 0, ymax = 50 , fill= "#F8F9F9") + 
      annotate("rect", xmin = 50, xmax = 100, ymin = 0, ymax = 50, fill= "white") + 
      annotate("rect", xmin = 0, xmax = 50, ymin = 50, ymax = 100, fill= "white")

p <- p + theme(panel.border = element_rect(colour = "lightgrey", fill=NA, size=4))
p <- p + geom_hline(yintercept=50, color = "lightgrey", size=1.5)
p <- p + geom_vline(xintercept=50, color = "lightgrey", size=1.5)

6

We also need to add a label to each quadrant area:

p <- p + geom_label(aes(x = 25, y = 97, label = "CALLENGERS"), 
                    label.padding = unit(2, "mm"),  fill = "lightgrey", color="white")
p <- p + geom_label(aes(x = 75, y = 97, label = "LEADERS"), 
                    label.padding = unit(2, "mm"), fill = "lightgrey", color="white")
p <- p + geom_label(aes(x = 25, y = 3, label = "NICHE PLAYERS"), 
                    label.padding = unit(2, "mm"),  fill = "lightgrey", color="white")
p <- p + geom_label(aes(x = 75, y = 3, label = "VISIONARIES"), 
                    label.padding = unit(2, "mm"), fill = "lightgrey", color="white")

7

Up to this point, our chart starts to look like the Gartner magic quadrant. Next, we need to draw the company points to the chart with the position corresponding to their “Ability to Execute” value and “Completeness of Vision” value.

p <- p + geom_point(colour = "#2896BA", size = 5) 
p <- p  + geom_text(aes(label=Company),colour="#2896BA", hjust=-0.3, vjust=0.25, size=3.2)

8

Our quadrant chart is nearly done, just one part missing, the arrows next to the “Ability to Execute” and “Completeness of Vision” text labels.

10.PNG

As the arrows need to be located outside of the main panel, we need to create custom annotation (annotation_custom) with linesGrob to draw a straight line with an arrow at the far end of the line. To make the arrows to visible outside of the main panel, we need to turn off the clip attribute of the main panel.

p <- p + annotation_custom(
            grob = linesGrob(arrow=arrow(type="open", ends="last", length=unit(2,"mm")), 
                   gp=gpar(col="lightgrey", lwd=4)), 
            xmin = -2, xmax = -2, ymin = 25, ymax = 40
          )
p <- p + annotation_custom(
  grob = linesGrob(arrow=arrow(type="open", ends="last", length=unit(2,"mm")), 
                   gp=gpar(col="lightgrey", lwd=4)), 
  xmin = 28, xmax = 43, ymin = -3, ymax = -3
)

gt = ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name=="panel"] = "off"
grid.draw(gt)

We now have our completed quadrant chart.

9

You can find the complete source code here:

Please find the pbix file here.

Advertisements
R Visual – from Grid-Facet to Geo-Facet in Power BI

R Visual – from Grid-Facet to Geo-Facet in Power BI

In one of my previous blog post, I used the facet_wrap function in ggplot2 package to build a grid facet to display the rank history of each Eurovision competition country.

1t1

The grid facet looks pretty neat as all sub-panels are perfectly aligned, however, it fails to display the geospatial information of the countries that may reveal some useful insights. For example, in my last blog post , I built a voting network chart of Eurovision competition that has revealed the mutual high voting scores between some neighbour countries.

There is a R package, namely geofacet, which comes with a list of pre-built geospatial grids for a number of geographical areas, countries and states. One of the pre-built grids is for Europe area which is perfect for our Eurovision example.

It is very straightforward to use the geofacet package. After referenced the package in our R script, all we need to do is to replace the facet_wrap function in our ggplot2 code with the facet_geo function provided by the geofacet package. We need to specify the column by which the facet is divided and the name of the pre-built grid we will use. In this example, we use “eu_grid1” which is the grid for Europe area.

b3

Now we have done all the work to convert our standard grid facet to geospatial facet. You can download the pbix file here.

b2

Apart from the Europe area grid, you can find a list of other pre-built grids here. Considering where I am living at this moment, another pre-built grid I am particularly interested at is the London Borough grid. This is a geo-facet chart I have created to visualise the unemployment rate in the London boroughs.

b1 You can also create your own grid which is literally a data frame with four columns, name and code columns that map to the facet label column in the dataset, and the row and col columns that specify the grid locations.

This is a test grid I have created to demonstrate how to create custom grid:

customGrid <- data.frame(
  name = c("Enfield", "Haringey", "Islington", "Hackney", "Camden", "Hackeny", "Redbridge", "Brent", "Ealing"),
  code = c("Enfield", "Haringey", "Islington", "Hackney", "Camden", "Hackeny", "Redbridge", "Brent", "Ealing"),
  row = c(1, 2, 3, 3, 3, 3, 3, 4, 5),
  col = c(3, 3, 5, 4, 1, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

b4

R Visual – Build Eurovision Voting Network Chart in Power BI

I have been watching Eurovision competitions for several years. I personally think the voting results from Eurovision competitions can be a very good source for the research of relationships between European countries. In this blog post, I will create a social network R visual using iGraph package and use the visual to analyse the voting network of Eurovison competitions.

1

Firstly, we need to prepare our raw Eurovision dataset into the following format that contains three columns, “From country” (where is the vote from), “To Country” (where is the vote to), and “Avg Point” (that computes the average points the “From country” has given to the “To Country” over the years. You can prepare the data either using DAX (creating the “AvgPoint” measure) or Power Query (group by “From country”+”To Country” and calculate average points).

a3

We add a R visual to the Power BI canvas and add the three columns to the R visual. If you prefer to use other R IDE (e.g., RStudio) to edit the R scripts, you can bind your IDE to Power BI.

We will use the igraph R package to render the voting network chart. Firstly, we need to load igraph library and then create a igraph data frame from the dataset specified on the Power BI R visual. We will use the plot function to render the network chart with the style attribute settings of vertex, edge etc.

# user igraph library
library(igraph)

# create a igraph data frame from the dataset specified on the Power BI R visual
df.g <- graph.data.frame(d = dataset, directed = TRUE)

# define colors
comps <- components(df.g)$membership
colbar <- rainbow(max(comps)+1)
V(df.g)$color <- colbar[comps+1]

# render the network chart and set the style attributes of vertex, edge etc.
plot(df.g, 
     vertex.label = V(df.g)$name,
     layout=layout_with_fr, 
     vertex.size=12,
     vertex.label.dist=0, 
     vertex.label.color= "darkblue",
     vertex.shape = "circle",
     vertex.label.cex = 1,
     vertex.label.font = 2,
     edge.arrow.size=0.5,
     edge.curved=T,
     margin =-0.05
 )

a4

After we authored and tested the R script in RStudio, we can now add the script to the R visual in Power BI that will be able to interact with other visuals on the same page.

Before we set any threshold on the average voting points, all voting paths between the countries will be draw on the network chart that makes the chart unreadable.

a2

However, when we set a higher average voting points as threshold which only shows the voting path over the threshold, we can find some relationship patterns.

1

For example, we can see the mutual high votes between neighbour countries like Spain <-> Andorra, Roumania <-> Moldova, and Greece<->Cyprus.

11.PNG

Please find the pbix file here.

 

R Visual – Building Facet Grid in Power BI

Since Power BI started to support R visual, it has become difficult to criticise Power BI’s visualisation capability because we can now take full advantage of R’s powerful visualisation packages such as ggplot2 to create Power BI reports. Unlike creating Power BI custom visual which is a rather time-consuming task, we can create eye-catching charts in just a few of lines with R visual.

Facet grid is a popular chart type but is not supported by Power BI yet. However, we can easily build a facet grid chart with the help of ggplot2 package.

1t1

Firstly, we need to get our data into the right format. In this example, we use the Eurovision competition dataset which contains the voting records between 1975 to 2016.

5

We need to calculate the rank of each country for each year based on the points they received from the rest of countries. We can use the DAX RANKX function to calculate the rank measure and get the results like:

2

Now we are ready to create our facet grid visual. We add a R visual to the Power BI canvas and add three columns, Year, ToCountry, Rank, to the visual.

7.PNG

3

On the R script editor, we first reference the ggplot2 library and then create a ggplot object placing Year on x-axis and Rank on y-axis. The key step is to add facet_wrap(~ToCountry) that generates the facet grid by voting destination country (ToCountry column).

4

Please download the pbix file here.

R Visual – Building Component Cycle Timeline

One common approach to detect exceptions of a machine is to monitor the correlative status of components in the machine. For example, in normal condition, two or more components should be running at the same time, or some components should be running in sequential order. When the components are not running in the way as expected, that indicates potential issues with the machine which need attentions from engineers.

Thanks to IoT sensors, it is easy to capture the data of component status in a machine. To help engineers to easily pick up the potential issues, a machine components timeline chart will be very helpful. However, Power BI does not provide this kind of timeline chart, and it can be time consuming to build custom Javascript-based visuals. Fortunately, we have R visual in Power BI. With a little help from ggplot2 library, we can easily build a timeline chart (as the one shown below) in four lines code.

f1

Firstly, we need add a R visual onto Power BI canvas, and set the data fields required for the chart.

f2

The minimal fields required are the name/code of the component, start date and end date of the component running cycle, such as:

ComponentCode Start DateTime End DateTime
Pump 2017-01-01 09:12:35 2017-01-01 09:18:37
Motor 2017-01-01 09:12:35 2017-01-01 09:18:37

In the R script editor, we use the geom_segment to draw each component running cycles. The x axis is for the time of component running, and the y axis is for the component name/code.

f3

Apart from showing a Power BI timeline chart for engineers to detect the machine issue as this blog post introduced, we can also send alerts to engineers using Azure Functions.

Extracting Features from IoT Sensor Data using R

In my previous blog I introduced the common patterns to extract features from IoT sensor data using Python. Although R is not my primary machine learning language it is becoming ubiquitous in Microsoft’s data analytics ecosystem after they acquired Revolution Analytics, the major commercial distributor of R. Considering the increasing popularity of R on Microsoft data platforms, I will create the R version of code for IoT data feature extraction in this blog.

This blog post is also organised based on the three common patterns for extracting feature from IoT sensor data:

  • Window-based descriptive statistics
  • Seasonal pattern
  • Trend pattern

Also, the examples use the same IoT sample data that stores the hourly reading from sensor A.

a1

  1. Window-based descriptive statistics

We can use the rollapply function in the zoo library to calculate the descriptive statistics values in a rolling window. As there is no function for Skewness in the core R packages we have to use the e1071 library that contains the Skewness and Kurtosis function.

data <- data %>%
        mutate(SensorA_Mean_12h=rollapply(SensorA, width=12, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_12h=rollapply(SensorA, width=12, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_12h=rollapply(SensorA, width=12, FUN=skewness, by=1, fill=NA, align='right'),
               SensorA_Mean_24h=rollapply(SensorA, width=24, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_24h=rollapply(SensorA, width=24, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_24h=rollapply(SensorA, width=24, FUN=skewness, by=1, fill=NA, align='right'),
               SensorA_Mean_72h=rollapply(SensorA, width=72, FUN=mean, by=1, fill=NA, align='right'),
               SensorA_SD_72h=rollapply(SensorA, width=72, FUN=sd, by=1, fill=NA, align='right'),
               SensorA_Skew_72h=rollapply(SensorA, width=72, FUN=skewness, by=1, fill=NA, align='right')
               ) 
tail(data, 5)

The code above will generate the following features:

a1

  1. Seasonal pattern

A date + time is represented in R as an object of class POSIXct. Once we convert the DateTime column into POSIXct, we can easily extract the parts of the datatime.

data$Date <- as.POSIXct(data$Date, "%Y-%m-%dT%H:%M:%S", tz="UTC")

data$DayOfWeek <- as.numeric(format(data$Date, "%u"))
data$IsWeekend <- ifelse (data$DayOfWeek>5, 1, 0)
data$Hour <- as.numeric(format(data$Date, "%H"))
data$IsWorkingHour <- ifelse (data$Hour>=9 & data$Hour<=17, 1, 0)
data$Year <- as.numeric(format(data$Date, "%Y"))
data$Month <- as.numeric(format(data$Date, "%m"))
data$DayOfMonth <- as.numeric(format(data$Date, "%d"))
tail(data, 5)

a2

  1. Trend pattern

In Python, we can use shift function to extract the features for representing the trend pattern in a time-series dataset. In R, a similar function is slide provided by DataCombine library.

data <- slide(data, Var = "SensorA", slideBy = -1:-7, 
      NewVar=c('SensorA_lag_1h', 'SensorA_lag_2h', 'SensorA_lag_3h', 'SensorA_lag_4h',
               'SensorA_lag_5h', 'SensorA_lag_6h', 'SensorA_lag_7h')
               )                                 
tail(data, 5)

We can the output as:
a3