Unpivot Nested Headings With Power Query

If you are a data analyst, you undoubtedly know and love Excel, and you undoubtedly know and love pivot tables; they must be a core part of your analysis toolbox. No doubt at some time you will have faced the need to pivot data that arrives on your desktop in cross-tab format, data that needs to be flattened before you can do any meaningful analysis on it. There have been various ways to achieve this in the past, but with the introduction of Power BI, Power Query has a very useful facility to unpivot such a table. There are many articles scattered around the web, such as this one from Ken Puls and another by Mike Alexander, that show how to use Power Query to unpivot data, so I will not discuss that basic technique any further here.

All of these articles show how to unpivot a ‘normal’ cross-tab report, such as the sales table shown in the image below.

Standard Table

But there is a problem should the horizontal groupings have two levels, such as Country within Region, as shown in the next image.

Multiple Headings Table

This is somewhat trickier. If you load this table into Power Query and unpivot it, you get the following results.

Multiple table unpivoted

This is not too useful. What is actually wanted is a table that would look like this.

Final Results

Even though the table that Power Query produces can be tidied with some judicious filtering and column renaming, there is not much else that can be done with it.

So how can we unpivot such a table? It can be done, but we need to build our query in a number of steps, using the power of Power Query’s expression language.

I first came across this problem when Ingeborg Hawighorst, a fellow Excel MVP, mentioned that she was looking to do a presentation at ComCamp in Auckland in March, and had downloaded some stats on birth rates in New Zealand to use in this presentation. Unfortunately, this data has two levels of headings, similar to the table above; Ingeborg needed to create a dataset that could be used for further analysis with Power Pivot, Power View, and/or Power Map.

In the solution presented here, I will use this same dataset. The data is grouped by age bands within regions of New Zealand, and is listed by year. I will limit myself to unpivoting the data and loading it into the data model – if you want to see it presented in Power View and Power Map, you'll need to go to ComCamp.

The data is shown in the image below.

NZ Birth Rate stats

The birth rate data is stored in an Excel file, so the first step is to import the table into Power Query using the From File>From Excel button. The M expressions generated are:

Source = Excel.Workbook(File.Contents("C:\Users\Bob\Documents\My Blogs\Unpivot Nested Headings\NZ Birth Rates.xlsx")),
LoadData = Source{[Name="NZ Birth Rates"]}[Data],

Name identifies the table in the workbook to be loaded, and the Data field extracts that table; you will need to adjust the file path accordingly.

The second step is to transpose the table, moving the regions and age bands to row headings and the years to column headings. The M for this is:

TransposeTable = Table.Transpose(LoadData),

A bit of data cleansing is done next: promote the first row to column headers and rename some columns:

PromoteHeaders = Table.PromoteHeaders(TransposeTable),
RenameDimensions = Table.RenameColumns(PromoteHeaders, {{"Column1", "Region"}, {" ", "Age Band"}}),

The next step is where it really gets clever. We need to duplicate the region name for each of the age bands so that we can use it in Power Pivot. This is achieved using the Table.FillDown function:

FillDownRegion = Table.FillDown(RenameDimensions, {"Region"}),

After we have done this, the data is starting to take shape, as can be seen in the following image.

Multiple headings unpivoted

We now have data that we can unpivot. We want to unpivot all of the columns except Region and Age Band. Again, M has a function for this:

UnpivotValues = Table.UnpivotOtherColumns(FillDownRegion, {"Region", "Age Band"}, "Attribute", "Value"),

The data is looking good at this point but, as you can see in the following image, it has a few rows that we don't want: total rows, which pivot tables will calculate for us anyway.

Unpivoted data

All that is left is some more data cleansing: filter out the unwanted rows and rename the year column:

FilterOutCountry = Table.SelectRows(UnpivotValues, each ([Region]<>"Total New Zealand")),
FilterOutAllAges = Table.SelectRows(FilterOutCountry, each ([Age Band]<>"Total all ages")),
RenameYearColumn = Table.RenameColumns(FilterOutAllAges, {{"Attribute", "Year"}})
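
As an aside, the two filter steps could equally be combined into a single Table.SelectRows call using M's and operator; a minimal sketch (the step name FilterTotals is my own, not generated by Power Query):

FilterTotals = Table.SelectRows(UnpivotValues, each [Region] <> "Total New Zealand" and [Age Band] <> "Total all ages"),

Keeping them as separate steps, as above, just makes each cleansing action easier to follow.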

We can now load this transformed data into Excel and use it within PowerPivot.

I added a couple of Excel tables to my data model to provide a richer pivot. The first was a Region table, which holds the island that each region is within (for a slicer), as well as its area and population; the second was an Age Band table so that the bands can be sorted in the correct order. The two tables are shown below.

Region table

Age band table

That about wraps it up. As I mentioned above, the presentation can be greatly enhanced, as Ingeborg will be showing, but this post is about the Power Query, so we will stop here. ComCamp should be a great event, so if you are in Auckland in late March, try to get there. I'm going to New Zealand later in the year, but about 8-9 months too late.

I should also give substantial credit to Rory Archibald, a UK Excel MVP, who took my first, very klunky approach and refined it into a more elegant, more focussed solution that properly exploited the power of M. I have developed that further into the solution presented; you are seeing the results of our collaboration.

The full M listing for this query is:

let
    Source = Excel.Workbook(File.Contents("C:\Users\Bob\Documents\My Blogs\Unpivot Nested Headings\NZ Birth Rates.xlsx")),
    LoadData = Source{[Name="NZ Birth Rates"]}[Data],
    TransposeTable = Table.Transpose(LoadData),
    PromoteHeaders = Table.PromoteHeaders(TransposeTable),
    RenameDimensions = Table.RenameColumns(PromoteHeaders, {{"Column1", "Region"}, {" ", "Age Band"}}),
    FillDownRegion = Table.FillDown(RenameDimensions, {"Region"}),
    UnpivotValues = Table.UnpivotOtherColumns(FillDownRegion, {"Region", "Age Band"}, "Attribute", "Value"),
    FilterOutCountry = Table.SelectRows(UnpivotValues, each ([Region]<>"Total New Zealand")),
    FilterOutAllAges = Table.SelectRows(FilterOutCountry, each ([Age Band]<>"Total all ages")),
    RenameYearColumn = Table.RenameColumns(FilterOutAllAges, {{"Attribute", "Year"}})
in
    RenameYearColumn

You can download the stats data file here, and the full query and presentation workbook here.

Posted in Power BI, Power Query | 9 Comments

A CUBEMEMBERPROPERTY Equivalent With PowerPivot

I was recently asked how to use the CUBEMEMBERPROPERTY function with a Power Pivot model, to retrieve the Customer Name with a formula given a unique code such as the Customer Code. I can't say that I use this function much, but I thought I could recall using it at some point. However, a little testing made me wonder if I ever had, as I could not get it to work. A quick Google suggested that other people had the same problem: member properties cannot be defined in a Power Pivot model, so maybe it simply cannot work.

I sat down and tried to figure out another way to emulate this, and that is what this post describes.

I am using the AdventureWorksDWR2008 database to demonstrate this technique and, in keeping with the question originally asked, I will get a value from the Customer table, such as email address or gender, given a known customer key. I have also added a customer name column to that table with a simple formula of
=IF([MiddleName]="",[FirstName]&" "&[LastName],[FirstName]&" "&[MiddleName]&" "&[LastName])

You are probably aware that you can easily get a CUBEMEMBER with a formula such as
=CUBEMEMBER("PowerPivot Data","[ProductCategory].[Product Category Name].[Bikes]")
This is a bit restrictive, as you have to know the item to retrieve; it can be done more dynamically using CUBESET and CUBERANKEDMEMBER, but the point of showing this is that CUBEMEMBER will do what we want, if only we could put some logic in the expression that determines the member using another value held in a cell on the worksheet.

And we can. All (sic!) we have to do is include a bit more MDX, using the EXISTS function. The syntax of EXISTS, as given on Technet, is
Exists( Set_Expression1 , Set_Expression2 [, MeasureGroupName] )
and is described as …

Returns the set of tuples of the first set specified that exist with one or more tuples of the second set specified. If the optional <Measure Group Name> is provided, the function returns tuples that exist with one or more tuples from the second set and that have associated rows in the fact table of the specified measure group.

The MeasureGroupName would be useful if we were looking only for customers with sales, but we don’t care whether the customer has sales or not, so we can omit that optional parameter.

The first set expression will define the set we want returned, the second will define the variable condition that we are applying.

Assuming that we have our customer key in D17, we can use EXISTS within a CUBEMEMBER function to get the customer name,
=CUBEMEMBER("PowerPivot Data","EXISTS([Customer].[Customer].Children,[Customer].[CustomerKey].["&D17&"])")
or the customer's email address,
=CUBEMEMBER("PowerPivot Data","EXISTS([Customer].[EmailAddress].Children,[Customer].[CustomerKey].["&D17&"])")

To make it a little tidier, in case we have a wrong customer key, we can wrap it in IFERROR:
=IFERROR(CUBEMEMBER("PowerPivot Data","EXISTS([Customer].[Customer].Children,[Customer].[CustomerKey].["&D17&"])"),"")

That is about it. The only thing to be aware of is that using .Children could result in multiple tuples being returned. In our example it was safe to assume that we would get only one result, as we were retrieving from the Customer table by its key, but if the function could return multiple tuples, we would have to use CUBESET and CUBERANKEDMEMBER to pick the values out of the set.
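
As a hedged sketch of that multi-tuple case (the [City] attribute and the cell references here are purely illustrative, not part of the original question): put the set in one cell, say D18, then pull out the ranked members one at a time.

=CUBESET("PowerPivot Data","EXISTS([Customer].[Customer].Children,[Customer].[City].["&D17&"])","Matching customers")
=CUBERANKEDMEMBER("PowerPivot Data",$D$18,1)
=CUBERANKEDMEMBER("PowerPivot Data",$D$18,2)

CUBESETCOUNT($D$18) will tell you how many members to expect.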

Posted in CUBE Functions | 9 Comments

MDX CUBEd

I have posted a few times about Excel's CUBE functions, which are very useful for creating reports in a format and layout that pivot tables often just cannot achieve. They were useful when SSAS provided the only cubes to work with, but with the introduction of PowerPivot and its in-memory cubes, their use has grown for me.

As well as their basic capability, the CUBE functions are extensible by customising the MDX expressions within them. In a recent post I showed how to use the MDX EXISTS function to get each product sub-category within each product category in the AdventureWorks database. This worked, and whilst it was primarily intended to show that a bit of MDX can go a long way, the solution was a bit klunky.

Last week I was giving a PowerPivot course with Chris Webb, and in a session on CUBE functions I gave this MDX example, using separate product, product sub-category, and product category tables. In that session we discussed the possibility of adding a product hierarchy and using it to get the breakdown of categories. Trying it there and then, we got close (we could see all of the product sub-categories within a given category in IntelliSense), but we couldn't get it quite right. I put this down to my lack of ability in MDX, so I asked Chris to help me; Chris is a top MDX guy.

With Chris’s help and guidance, good progress has been made. I think this is a useful way to get descendants/children of an item at a certain level, so I will share it here. To begin with, a few details of my model:

  • I am using Excel 2013 here
  • my product table was renamed from DimProduct to Product within the model
  • the product hierarchy, named ProductHierarchy, consisted of a product category name, ProductCategory, product sub-category name, ProductSubcategory, and product, Product

To get the set of category names from this hierarchy, I used
[E5]:=CUBESET("ThisWorkbookDataModel","[Product].[ProductHierarchy].[ProductCategory]","Category set")
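
The formulas below assume that the members of this category set are listed in cells E7 down; a sketch of one way to do that with CUBERANKEDMEMBER (entered in E7 and filled down; the cell references are illustrative):

=CUBERANKEDMEMBER("ThisWorkbookDataModel",$E$5,ROWS($E$7:$E7))

ROWS($E$7:$E7) generates the ranks 1, 2, 3, … as the formula is filled down.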

To get the set of sub-categories within the Accessories category, I used
[F5]:=CUBESET("ThisWorkbookDataModel","[Product].[ProductHierarchy].[All].[Accessories].Children","'Accessory' sub-category set")
Assuming that the category set is listed in cells E7 down, we can instead use a cell reference to E8 (which returns the Accessories member), and the formula reduces to
=CUBESET("ThisWorkbookDataModel","[Product].[ProductHierarchy].[All].["&E8&"].Children","'Accessory' sub-category set")

And finally, to get the set of products within the Bike Racks sub-category, I used
=CUBESET("ThisWorkbookDataModel","[Product].[ProductHierarchy].[Accessories].[Bike Racks].Children","'Bike rack' product set")
Note that we have to include all parent levels of the hierarchy, both Accessories and Bike Racks.
Again, assuming that the category set is listed in cells E7 down (so that E8 returns the Accessories member), and that the sub-category set for Accessories is listed in F7 down (so that F7 returns the Bike Racks member), the formula reduces to
=CUBESET("ThisWorkbookDataModel","[Product].[ProductHierarchy].["&E8&"].["&F7&"].Children","'Bike rack' product set")

There we have it, a relatively simple way to drill down to any level within a hierarchy.
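
The same shape extends to deeper hierarchies: each extra level appends one more ["&cell&"] member segment. For instance, in a hypothetical hierarchy with a fourth level, with the parent members held in E8, F7 and G7, a sketch would be

=CUBESET("ThisWorkbookDataModel","[Product].[ProductHierarchy].["&E8&"].["&F7&"].["&G7&"].Children","Fourth-level set")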

Please note that if you copy and paste the formulas above, some of the double-quotes may have been converted to typographic quotes, so they may need editing back to straight quotes.

Posted in CUBE Functions | Leave a comment

PASS Business Analytics Conference – an Excel view

Chris Webb recently posted his thoughts on the first PASS Business Analytics Conference. Chris is a SQL Server MVP, but one of the first that I recall acknowledging that Excel is the business's tool of choice, even for BI/BA. I come at BI and BA more from the other direction, but as I also attended this conference, I thought it would be worthwhile presenting my views on it.

I have to say how surprised I was at the number of paying customers that the conference managed to draw in. Some colleagues and I have held many UK Excel conferences in the past, but we have found it very difficult to provide an event that doesn't lose money. In the UK at least, people who use Excel seem very unwilling to spend their hard-earned money to attend such an event, even one that will provide good insights that they can deploy in their daily work. Similarly, corporations seem reluctant to support their people by sending them to such events; there seems to be a view that everybody knows Excel, and that it is easy to use. When we held free events, they were very well attended, but we can only fund such events from our own pockets a limited number of times, and we have difficulty convincing sponsors to support us (even Microsoft no longer offers us the room facilities that we enjoyed on two previous occasions). I can think of a few reasons that the PASS conference managed to draw in so many paying customers:

  • it was held in the USA, and Americans seem far more willing to fund this sort of learning activity than we are in Europe
  • it was big; there was enough content to satisfy most needs, unlike our conferences, which have offered few choices
  • it was organised by PASS, which gave it a credibility that individuals and corporations are more prepared to buy into.

It was good to see so many people at the event, and even more interesting that they weren't the usual mix of SQL Server people that you typically see at PASS meetings; there were a lot of business people who use data and wanted the insights that such a conference can bring.

The Microsoft keynote was fun if you like comedy double acts, but as it largely centred around Power View and the public preview of project codename “GeoFlow” for Excel 2013 (sic!), two products that have been out in one form or another for some time, it was hardly groundbreaking. I thought they could have given more (some) exposure to Data Explorer, but I guess Data Explorer is too dull for Microsoft marketing: it just works well, with no bling or visual distraction to obscure the data. There was no mention that I caught of any Microsoft developments in mobile BI, but I will leave others to rail on that topic.

I was largely disappointed by the content. I acknowledge that I wasn't there primarily to learn, more to be part of the event and to gain some insights that I could take forward to our own conferences, but it would have been good to have got more out of it. Too much of it was far too basic in my opinion, and I also felt that the organisers did not give enough emphasis to Excel and to some of the Excel experts out there.

I have been saying for many years that Excel is the single most important tool in the BI/BA world, and as such I was surprised and somewhat disappointed to see so few of the big names in the Excel world. Rob Collie was there (I know he is a SQL Server MVP, but he is an Excel evangelist of the first order), but where were the Excel guys who munch data in their sleep, who have taken Excel charting and visualisation to new heights, who know how to write VBA? Inviting some of those big names, such as Jon Peltier, Mike Alexander, Debra Dalgleish, or Jorge Camoes, to name but a few, would have added substantial value in my view; these people are top-notch.

I was glad that I went; it was very well organised, and I thoroughly enjoyed Professor Levitt's keynote. I caught my first presentation by Marco Russo, someone who has similar views to mine on data modelling in self-service BI. I did not know of Faisal Mohamood, a member of Microsoft's Data Explorer team, before this conference, but I will be watching out for his postings in future. Whilst I did not personally get much out of his session, it was good to see the eyes of so many of the audience light up when they saw what Data Explorer can do for them.

It was good to meet so many people with similar interests, but it would take a far better programme for me to attend again, or maybe a UK-based event. I should add that I did not have to pay the conference fee, but my travel and accommodation came out of my own pocket, and I would need more justification to do this a second time.

Posted in PASS | Leave a comment