Power Query – Parameterised Files Loading from Azure Data Lake Store within a Given Date Range

Power BI now supports data load from Azure Data Lake Store. We can connect to a folder in the Azure Data Lake Store and load all files from that folder.
1t1

3

However, we often don’t want to or aren’t able to load all the files in the Azure Data Lake Store folder into Power BI due to the volume of the data. Instead, we want to be able to specify a data range and only load the files fallen into that data range.

The Azure Data Lake Store connector in Power BI doesn’t provide direct support for conditional data loading based on a given criteria. However, with some help from the M language, we can easily implement this feature through customising the query scripts.

Firstly, we create two Power BI parameters, the StartDate (for the start data of the given data range) and EndDate (for the end data of the give data range).

stream-analytics-window-functions-sliding-intro

When we connect the Power BI to the Azure Data Lake Store folder, the DataLake.Contents(“{ADLS folder path}”) will return the metadata of the list of files stored in that folder. Then we can use Table.SelectRows function to go through each files, extract the date string from the name and convert it to date type, and then check whether the date falls into the give data range though comparing the date to the StartDate parameter and EndDate parameter.

5

A prerequisite for this solution to work is that the date info needs to be included in the file name. It is the common practise (and also good practise for performance reason) to partition the files stored in Azure Data Lake Store by date.

Now we have only loaded the files as we needed into the Power BI and we can combine those files into a single table for the downstream visualisation in Power BI. As the snapshot below showing, we can open the “Combine Files” dialog to combine the files.

6

7

After the files are combined and loaded into Power BI dataset, we can build Power BI visualisations using the data in the dataset. In future, when we need to load files within other data ranges, we don’t have to edit the query again, instead, we just need to set the StartDate and the EndDate parameters and the dataset will be automatically refreshed with data in the new given data range.

5

4

 

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s