
Power Query – Parameterised Files Loading from a Year-Month-Day folder Hierarchy

In one of my previous blog posts, I described an approach to load text files from Azure Data Lake Store into Power BI, filtered by the date period that users specify through the period start and period end parameters. That approach can load text files located in the same folder. However, there is another common folder structure pattern often used to organise the files in Azure Data Lake Store, i.e. the Year-Month-Day folder hierarchy, such as:

    {root folder}/2008/06/09/
    {root folder}/2008/06/10/
    …

This raises a challenge when users want to load files between two dates into Power BI. Firstly, the Power Query query we create needs to be able to load files from different folders. In addition, the query should conditionally load the files based on the date period specified by users, through filtering the Year-Month-Day folder hierarchy. This blog post describes how to create Power Query queries for this type of requirement with the following steps.

  • Step 1 – Define the StartDate parameter and EndDate parameter for users to specify the date period
  • Step 2 – Define a custom function to load the metadata of all the files stored in a folder. The custom function takes a folder path as its argument and returns a table value that stores the file metadata
  • Step 3 – Generate the sequence of dates between the specified StartDate and EndDate
  • Step 4 – Derive the folder path from the generated date
  • Step 5 – Invoke the custom function created in step 2 and expand the returned table field to get the list of files we aim to extract
  • Step 6 – Combine the files and convert to table format

Step 1 – Define parameters

Firstly, we define two Date-type parameters, StartDate and EndDate.


Step 2 – Define custom function

We then define a custom function that takes the path of a folder as its argument and returns a table value with the metadata of all files stored in that folder.

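A minimal sketch of such a function, assuming the function name GetFilesInFolder and using the DataLake.Contents connector:

```
// GetFilesInFolder takes an ADLS folder path and returns a table of file metadata
let
    GetFilesInFolder = (FolderPath as text) as table =>
        let
            // DataLake.Contents returns one row per file, including Name and Content columns
            Files = DataLake.Contents(FolderPath)
        in
            Files
in
    GetFilesInFolder
```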

Step 3 – Generate date list

We then need to create the query that extracts the files from the Year-Month-Day folder hierarchy, filtered by the StartDate and EndDate parameters. Firstly, we generate the sequence of dates between the StartDate and the EndDate using the built-in List.Dates function:

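With illustrative step names, the date sequence can be generated like this:

```
// One entry per day from StartDate to EndDate, inclusive of both ends
DayCount = Duration.Days(EndDate - StartDate) + 1,
DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0))
```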

Step 4 – Derive the folder path


We then convert the date list into a table and add a “DatePathPart” column that generates the Year/Month/Day part of the folder path, such as “2008/06/09”. We then add a “FolderPath” column that builds the full folder path by concatenating the root folder path and the Year/Month/Day part.
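A sketch of these two steps, assuming DateList comes from the previous step and RootFolderPath holds the root folder path:

```
DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
// e.g. #date(2008, 6, 9) -> "2008/06/09"
AddDatePathPart = Table.AddColumn(DateTable, "DatePathPart", each Date.ToText([Date], "yyyy/MM/dd")),
AddFolderPath = Table.AddColumn(AddDatePathPart, "FolderPath", each RootFolderPath & [DatePathPart] & "/")
```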

After this step, we should have a table like this (values shown for illustration):

Date         DatePathPart   FolderPath
09/06/2008   2008/06/09     {root folder}/2008/06/09/
10/06/2008   2008/06/10     {root folder}/2008/06/10/

Step 5 – Invoke the custom function and expand returned table


We can then invoke the custom function defined in step 2, which adds a table-type column. Once we expand that column, we will have the list of all the files we want to query, based on the specified StartDate and EndDate parameters.

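Assuming the previous step produced a table AddFolderPath with a FolderPath column, and the custom function from step 2 is named GetFilesInFolder, these two steps could be sketched as:

```
// Add a table-type column holding the file metadata for each folder, then expand it
AddFiles = Table.AddColumn(AddFolderPath, "Files", each GetFilesInFolder([FolderPath])),
ExpandedFiles = Table.ExpandTableColumn(AddFiles, "Files", {"Name", "Content"})
```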

This is the full piece of code for this query:
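Putting steps 3 – 5 together, the query might look like the following sketch (the step names, RootFolderPath and GetFilesInFolder are illustrative):

```
let
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    AddDatePathPart = Table.AddColumn(DateTable, "DatePathPart", each Date.ToText([Date], "yyyy/MM/dd")),
    AddFolderPath = Table.AddColumn(AddDatePathPart, "FolderPath", each RootFolderPath & [DatePathPart] & "/"),
    AddFiles = Table.AddColumn(AddFolderPath, "Files", each GetFilesInFolder([FolderPath])),
    ExpandedFiles = Table.ExpandTableColumn(AddFiles, "Files", {"Name", "Content"})
in
    ExpandedFiles
```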

Step 6 – Combine the files

The last step is to combine the list of files we queried at steps 1 – 5, extract the data records from the binary csv format, and load them into a single table for downstream processing.
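Assuming the table from step 5 is named ExpandedFiles and has a Content column holding the binary csv data, the combine step can be sketched as:

```
// Parse each binary csv file into a table, then combine all tables into one
AddData = Table.AddColumn(ExpandedFiles, "Data", each Csv.Document([Content])),
CombinedTable = Table.Combine(AddData[Data])
```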



Power Query – Extract Multiple Tags Stored in a Single Text Field

Problem

It is not rare to see multiple attributes stored in a single text field, especially for tagging-enabled applications where an unfixed number of tags may be associated with an article or post. Those tags are often stored in a single text field with a delimiter to separate them. When reporting, we often need to categorise the articles or posts by their tags, e.g., counting the articles or posts under each tag.


To fulfil this reporting requirement, we need to reshape our dataset from something like:

PostID   Title    Tags
1        Post A   PowerQuery;M;PowerBI

to something like:

PostID   Title    Tag
1        Post A   PowerQuery
1        Post A   M
1        Post A   PowerBI

Solution

It is actually very easy to conduct this kind of transformation using Power Query with only three lines of M code.

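Based on the line-by-line description below, the three lines might be reconstructed as follows (the delimiter “;” and the step names are illustrative):

```
SplitTags = Table.SplitColumn(Source, "Tags", Splitter.SplitTextByDelimiter(";")),
UnpivotedTags = Table.UnpivotOtherColumns(SplitTags, {"PostID", "Title"}, "Attribute", "Tag"),
RemovedAttribute = Table.RemoveColumns(UnpivotedTags, {"Attribute"})
```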

line 1 – split the tag field into an unfixed number of tag columns using the Splitter.SplitTextByDelimiter function.

line 2 – use the Table.UnpivotOtherColumns function to unpivot all the tag columns. As we don’t have a fixed number of tag columns, we need to use UnpivotOtherColumns and specify the known columns (“PostID”, “Title” in this example) as arguments.

line 3 – remove the column that stores the generated tag column names, which is not used in reporting.

Supported Types in M Language

The M language supports a variety of primitive and complex value types, including the common value types (e.g., Number, Text and DateTime) found in most general-purpose programming languages, and the value types specifically designed for dataset handling (e.g., Table, Record and Binary).

Primitive Values

A primitive value is a single, atomic value of a primitive type. M supports the following primitive value types:

null, logical, number, text, date, time, datetime, datetimezone and duration

Table

A table value is arguably the most important object in a Power Query query. Many dataset transformations operate on table values. A table value is a 2-dimensional structure consisting of an ordered sequence of rows (Records) in the vertical direction and a list of columns (Lists) in the horizontal direction.


A table value is normally created by a data extraction operation that reads data from a data source such as a relational database or a text file. Alternatively, a table value can be created manually using #table.

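For example, a small table can be built manually like this (the values are illustrative):

```
let
    // #table takes a list of column names and a list of row-value lists
    MyTable = #table({"Name", "Age"}, {{"Mike", 39}, {"Anna", 25}})
in
    MyTable
```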

Once a table value is defined, a collection of Table functions can be called to operate on it. For example, we can filter the table by rows using the SelectRows function or by columns using the SelectColumns function. The full list of Table functions can be found here.

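A sketch, assuming a table MyTable with Name and Age columns:

```
// Keep only the rows where Age is at least 18
Adults = Table.SelectRows(MyTable, each [Age] >= 18),
// Keep only the Name column
NamesOnly = Table.SelectColumns(MyTable, {"Name"})
```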

Record

From the table value, we can access a specific row in the table.

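Assuming a table MyTable, a row can be accessed by its zero-based positional index:

```
// Positional index access returns the row as a record value
FirstRow = MyTable{0}
```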

The returned row is of Record type. A record is an ordered sequence of fields, where a field is a key-value pair with the column name as the key and the value of that column in the row as the value, such as [Name = "Mike", Age = 39]. A field value can be of any type, not only the primitive types but also the structured types.

List

From the table value, we can also access a specific column in the table.


If we check the type of the returned column, we find that a column within a table is a List value.


List values can be very useful at the intermediate steps of dataset transformations. For example, we can calculate the average value of a column and use it as benchmark for other calculations.
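A sketch, assuming a table MyTable with an Age column:

```
AgeList = MyTable[Age],               // the column as a list value
AverageAge = List.Average(AgeList),   // benchmark value
// Use the benchmark in a further calculation
AboveAverage = Table.SelectRows(MyTable, each [Age] > AverageAge)
```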

Apart from accessing list values from a table value, we can also create lists from scratch. One common use case is to generate a consecutive list of dates for building the Date dimension table.

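For example, a year of consecutive dates can be generated with List.Dates (the start date and count are illustrative):

```
// 365 dates starting from 1 January 2018, stepping one day at a time
DateList = List.Dates(#date(2018, 1, 1), 365, #duration(1, 0, 0, 0))
```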

Binary

A binary value represents a sequence of bytes and is often used to hold file-format data, such as csv files.


M language supports a collection of functions for transforming the binary value data to table value data, such as Csv.Document.
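A sketch, assuming an illustrative local file path:

```
// File.Contents returns a binary value; Csv.Document parses it into a table
CsvTable = Csv.Document(File.Contents("C:\Data\Sales.csv"))
```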

Function

M is a functional programming language that constructs a program through the evaluation of functions. A function value is a mapping between a set of input arguments and an output value, and can be defined as:

(arg1, arg2) =>
     let
         result = do something
     in
         result

Functions can be very useful when working with complex transformation where some logic needs to be called multiple times.
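For example (the names are illustrative):

```
let
    // Define a function value and invoke it
    AddNumbers = (x as number, y as number) as number => x + y,
    Result = AddNumbers(1, 2)   // evaluates to 3
in
    Result
```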

Power Query – Parameterised Files Loading from Azure Data Lake Store within a Given Date Range

Power BI now supports data load from Azure Data Lake Store. We can connect to a folder in the Azure Data Lake Store and load all files from that folder.


However, we often don’t want to, or aren’t able to, load all the files in the Azure Data Lake Store folder into Power BI due to the volume of the data. Instead, we want to be able to specify a date range and only load the files that fall into that date range.

The Azure Data Lake Store connector in Power BI doesn’t provide direct support for conditional data loading based on given criteria. However, with some help from the M language, we can easily implement this feature by customising the query scripts.

Firstly, we create two Power BI parameters, StartDate (for the start date of the given date range) and EndDate (for the end date of the given date range).


When we connect Power BI to the Azure Data Lake Store folder, DataLake.Contents(“{ADLS folder path}”) returns the metadata of the list of files stored in that folder. We can then use the Table.SelectRows function to go through each file, extract the date string from the file name and convert it to the date type, and then check whether the date falls into the given date range by comparing it to the StartDate and EndDate parameters.

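A sketch of the filtering query, assuming an illustrative ADLS path and file names like “sales_2018-06-09.csv” (the date-extraction logic depends on your actual naming convention):

```
let
    Source = DataLake.Contents("adl://myadls.azuredatalakestore.net/data"),
    FilteredFiles = Table.SelectRows(Source, each
        let
            // "sales_2018-06-09.csv" -> "2018-06-09"
            DateText = Text.Middle([Name], 6, 10),
            FileDate = Date.FromText(DateText)
        in
            FileDate >= StartDate and FileDate <= EndDate)
in
    FilteredFiles
```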

A prerequisite for this solution to work is that the date info needs to be included in the file name. It is common practice (and also good practice for performance reasons) to partition the files stored in Azure Data Lake Store by date.

Now we have loaded only the files we need into Power BI, and we can combine those files into a single table for the downstream visualisation in Power BI. We can open the “Combine Files” dialog to combine the files.



After the files are combined and loaded into the Power BI dataset, we can build Power BI visualisations using the data in the dataset. In future, when we need to load files within other date ranges, we don’t have to edit the query again; instead, we just need to set the StartDate and EndDate parameters and the dataset will be automatically refreshed with data in the newly given date range.


Inspecting SharePoint Metadata using Power Query


Power Query is a self-service ETL tool from Microsoft which is able to extract data from a broad range of data sources, transform the data, and finally load the data into PowerPivot. However, I have recently found that Power Query can also be a very handy SharePoint metadata viewer for inspecting the schema of SharePoint sites, lists, content types… All we need is to use the Power Query OData connection to the SharePoint REST API and use the relation drill-down feature to navigate through the SharePoint objects.

For example, we can connect to a SharePoint site (“https://linxiaodev.sharepoint.com/sites/POC/_api/web”) using the OData connection, and the Power Query will load all attributes of the site and also the links to drill down to the related Record or Table.
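In M, such a connection is a single OData.Feed call:

```
// Connect to the SharePoint REST API endpoint of the site via OData
Source = OData.Feed("https://linxiaodev.sharepoint.com/sites/POC/_api/web")
```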


For example, we can drill down to the Lists table, which loads the metadata of all the lists in the site. One tip here is that you can select which columns of attributes to show instead of showing a long list of all columns.



Then, you can further drill down to the Table or Record related to a list, e.g., the list of related Fields of the ‘Documents’ library list. In one of my previous blog posts, I needed to inspect the Hidden attribute of the LikedBy field, and the tool I was using was SharePoint Manager; now Power Query can be another option.
