Power Query – Extract Multiple Tags Stored in a Single Text Field

Problem

It is not rare to see that multiple attributes are stored in a single text field especially for tagging enabled applications where an unfixed number of tags may associated with an article or post. Those tags are often stored in a single text filed with a delimiter to separate them. When reporting, we often need to category the articles or posts by the tags, e.g., counting the articles or posts by each tag.

1t1

To fulfil this reporting requirement, We need to reshape our dataset from something like:

1t1

to something like:

2

Solution

It is actually very easy to conduct this kind of transformation using Power Query with only three lines of M code.

3

line 1 – split the tag field into unfixed number of tag columns using Splitter.SplitTextByDelimiter function.

line 2 – use Table.UnpivotOtherColumns function to unpivot all the tag columns. As we don’t have a fixed number of tag columns, we need to use UnpivotOtherColumns function and specify the known columns (“PostID”, “Title” in this example) as arguments.

line 3 – remove the column to store the generated tag column names which will not be used in reporting

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s