Tag: M

Power Query – Parameterised Files Loading from a Year-Month-Day folder Hierarchy

In one of my previous blog post, I described an approach to load text files from Azure Data Lake Store to Power BI filtered by the date period specified by users through setting the period start and period end parameters. That approach is capable to load text files located at the same folder. However, there is another common folder structure pattern that is often used to organise the files in Azure Data Lake Store, i.e. the Year-Month-Day folder hierarchy, such as:

1t1 It raises a challenge when users want to load files between two dates into Power BI. Firstly, the Power Query query we created needs to be able to load files from different folders. In addition, the query should be able to conditionally load the files based on the data period specified by users through filtering the Year-Month-Day folder hierarchy. This blog post describes how to create Power Query queries for this type of requirements with the following steps.

  • Step 1 – Define the StartDate parameter and EndDate parameter for users to specify the date period
  • Step 2 – Define a custom function to load the metadata of all the files stored in a folder. The custom function takes folder path as argument and returns a table value which stores the file metadata
  • Step 3 – Generate the sequence of dates between the specified StartDate and EndDate
  • Step 4 – Derive the folder path from the generated date
  • Step 5 – Invoke the custom function created at step 2 and expand the returned table field to get the list of files we aim to extract
  • Step 6 – Combine the files and convert to table format

Step 1 – Define parameters

Firstly, we define two Date type of parameters, StartDate and EndDate.

stream-analytics-window-functions-sliding-intro

Step 2 – Define custom function

We then define a custom function that takes the path of a folder as argument and return a table value with the metadata of all files stored at that folder.

1t1

2

Step 3 – Generate date list

We then need to create the query to extract the files from the Year-Month-Day folder hierarchy filtered by the StartDate and EndDate parameters. Firstly, we generate the sequence of dates between the StartDate and the EndDate using built-in List.Dates function:

4

Step 4 – Derive the folder path

5

We then convert the dates list into a table and add the “DatePathPart” column that generate the Year/Month/Day part of the folder path such as “2008/06/09”. We then add the “FolderPath” column that makes the full folder path through concatenating the root folder path and the Year/Month/Day part.

After this step, we should have a table like this:

7

Step 5 – Invoke the custom function and expand returned table

6

We can then invoke the custom function defined at step 2 that will add a table type column. Once we expand the table, we will have the list of all the files we want to query based on the specified StartDate and EndDate parameters.

ml1

This is the full piece of code for this query:

let
#"Filtered Dates" = List.Dates(StartDate, Duration.Days(EndDate - StartDate)+1, #duration(1, 0, 0, 0)),
#"Dates Table"= Table.FromList(#"Filtered Dates", Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
#"Extract Date Path" = Table.AddColumn(#"Dates Table", "DatePathPart",each Text.From(Date.Year([Date])) & "/"
& Text.PadStart(Text.From(Date.Month([Date])),2,"0") & "/"
& Text.PadStart(Text.From(Date.Day([Date])),2,"0")),
#"Add Root Path" = Table.AddColumn(#"Extract Date Path", "RootPath",
each Text.From("adl://ninjagodem0123.azuredatalakestore.net/Samples/Demo/")),
#"Add Folde Path" = Table.AddColumn(#"Add Root Path", "FolderPath", each [RootPath] & [DatePathPart]),
#"Invoked Custom Function" = Table.AddColumn(#"Add Folde Path", "Files", each LoadFilesByFolder(Text.From([FolderPath]))),
#"Expanded Files" = Table.ExpandTableColumn(#"Invoked Custom Function", "Files",
{"Content", "Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"},
{"Files.Content", "Files.Name", "Files.Extension", "Files.Date accessed",
"Files.Date modified", "Files.Date created", "Files.Attributes", "Files.Folder Path"})
in
#"Expanded Files"

Step 6 – Combine the files

The last step is to combine the list of files we queried at step 1 – 5, extract the data records from the binary csv format and load into a single table for downstream process.

1t1

2

Power Query – Parameterised Files Loading from Azure Data Lake Store within a Given Date Range

Power BI now supports data load from Azure Data Lake Store. We can connect to a folder in the Azure Data Lake Store and load all files from that folder.
1t1

3

However, we often don’t want to or aren’t able to load all the files in the Azure Data Lake Store folder into Power BI due to the volume of the data. Instead, we want to be able to specify a data range and only load the files fallen into that data range.

The Azure Data Lake Store connector in Power BI doesn’t provide direct support for conditional data loading based on a given criteria. However, with some help from the M language, we can easily implement this feature through customising the query scripts.

Firstly, we create two Power BI parameters, the StartDate (for the start data of the given data range) and EndDate (for the end data of the give data range).

stream-analytics-window-functions-sliding-intro

When we connect the Power BI to the Azure Data Lake Store folder, the DataLake.Contents(“{ADLS folder path}”) will return the metadata of the list of files stored in that folder. Then we can use Table.SelectRows function to go through each files, extract the date string from the name and convert it to date type, and then check whether the date falls into the give data range though comparing the date to the StartDate parameter and EndDate parameter.

5

A prerequisite for this solution to work is that the date info needs to be included in the file name. It is the common practise (and also good practise for performance reason) to partition the files stored in Azure Data Lake Store by date.

Now we have only loaded the files as we needed into the Power BI and we can combine those files into a single table for the downstream visualisation in Power BI. As the snapshot below showing, we can open the “Combine Files” dialog to combine the files.

6

7

After the files are combined and loaded into Power BI dataset, we can build Power BI visualisations using the data in the dataset. In future, when we need to load files within other data ranges, we don’t have to edit the query again, instead, we just need to set the StartDate and the EndDate parameters and the dataset will be automatically refreshed with data in the new given data range.

5

4