Split Column by Digit to Non-digit

Power BI MCSA Certification Training Tips – Part 10: Split Columns by Digit to Non-digit

To obtain your MCSA: BI Reporting you need to pass two exams: 70-778 (Analyzing and Visualizing Data with Power BI) and 70-779 (Analyzing and Visualizing Data with Excel). However, there is a huge overlap between the two exams resulting from the common BI features found in both Power BI and Excel: namely data modelling using the DAX language; and data connection and transformation using the M language.

We, at G Com Solutions, recommend to the clients to whom we supply Power BI Training that they prepare for both exams simultaneously and then sit both exams on the same day. We also run a 5 day Power BI MCSA Certification Prep course in London which includes both exams 70-778 and 70-779.

In this series of blog posts, we will be examining some of the key features of Power BI, Power Pivot and Power Query which you will need to master in order to pass the two exams.

This topic, which applies equally to exam 70-778 and 70-779, relates to the splitting of columns which contain more than a single piece of information which need to be split into their own columns.

The Split Columns command in the Query Editor is very similar to Microsoft Excel’s Text to Columns feature. It allows you to split a column using a delimiter or by a set number of characters.

There are now a number of different options as to how the split will take place. The classic methods (which mirror the options available inExcel’s Text to Columns command) are by Delimiter and by Number of Characters. However, in addition, we now have a number of additional options including the very useful Split Column by Digit to Non-digit and Split Column by Non-digit to Digit. These options are useful where you have multiple items of data in the same column with no delimiters and where the number of characters in each of the nested data items is inconsistent.

Consider the following example, where the first thing we can do is to promote the header “Customer Ref”: Transform > Use First Row as Headers.

If we examine the Customer Code, we can see it consists of three parts: first we have a customer ID; then we have a two-letter country code; and, finally, a four-character business sector code.

The problem here is that the number of characters in the first component varies: it can be five, six or seven digits. This means that we cannot use the Split by Number of Characters option. However, we can use the command Split Column by Digit to Non-Digit (available in the Home Tab of the Ribbon as well as in the context menu which appears when you right-click on the column heading).

We can then use the reverse command (Split Column by Non-digit to Digit), on the second column, to split the two remaining nested data elements: the country code and the business sector code.

The Split Column by Digit to Non-digit and Split Column by Non-digit to Digit commands produce steps called Split Column by Character Transition.

The final step is to rename the columns: “Customer Code”, “Country Code” and “Sector Code”.

Having extracted Country Code and Sector code into separate columns, we could then go on to use them to create relationships with our Countries and Sectors tables as shown below.