UK Bank Holidays in Power Query – Easter

In my previous post I showed how to calculate all of the UK Bank Holidays for a nominated year, all holidays except the Easter holidays. In this post we will wrap it up by showing how to calculate those holidays.

The holidays covered in the previous post were relatively straightforward, a matter of determining the day of the week for the holiday and adding or subtracting a number of days so as to avoid weekends or to force the date to a Monday.

Easter is a lot more complicated. I won’t go into the details of how to calculate it here; instead I will refer you to this article in the Independent, and, for more detail than you will ever need, this Wikipedia article.

One relationship makes things easier: Easter Friday is always the Friday before Easter Monday, so we only need to calculate one of the two, the other being easily derived by adding or subtracting 3 days. I decided to base my Power Query calculation on a formula that I have used in Excel for many years to calculate Easter Monday. This formula is

=ROUND(("4/"&target.year)/7+MOD(19*MOD(target.year,19)-7,30)*.14,0)*7-6+1

This works because Excel dates are stored as the number of days from 1st Jan 1900; today’s date (26th Feb 2016) has a day number of 42426. When you see a date, you are seeing that number formatted in your desired date format. The formula above determines a starting date of 1st April for the nominated year, and does some arithmetic on that underlying day number to get to the day number for Easter Monday. Unfortunately, as far as I am aware, Power Query does not give us direct access to such a day number, so we have to roll our own.
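As an illustration of that day number (outside Excel and Power Query; this helper is mine, not part of the post's code), a couple of lines of Python reproduce it, including the extra day Excel adds for its phantom 29th Feb 1900:

```python
from datetime import date

def excel_serial(d):
    # day 1 is 1st Jan 1900; dates after 28th Feb 1900 get one extra day
    # because Excel wrongly treats 1900 as a leap year
    serial = (d - date(1900, 1, 1)).days + 1
    if d > date(1900, 2, 28):
        serial += 1
    return serial

print(excel_serial(date(2016, 2, 26)))  # 42426, matching Excel's day number
```

Note that the true count of days (with day 1 as 1st Jan 1900) would be 42425; the difference is Excel's leap-year bug, which becomes relevant later.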

As an aside, I make no claims to this formula; it is something I picked up some time ago, and I use it because it works.
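Since we are about to emulate this formula in M, it is worth convincing ourselves that it does work. Purely as a sanity check (Python is not part of the workbook, and the helper names are mine), the same arithmetic, including Excel's phantom 29th Feb 1900, can be run like this:

```python
import math
from datetime import date, timedelta

EXCEL_EPOCH = date(1900, 1, 1)

def excel_serial(d):
    # Excel day number; the +2 covers day 1 being 1st Jan 1900 plus the
    # phantom 29th Feb 1900 (valid for dates after Feb 1900)
    return (d - EXCEL_EPOCH).days + 2

def easter_monday(year):
    # emulates =ROUND(("4/"&year)/7+MOD(19*MOD(year,19)-7,30)*.14,0)*7-6+1
    apr1 = excel_serial(date(year, 4, 1))
    # Excel's ROUND rounds halves away from zero, hence floor(x + 0.5)
    serial = math.floor(apr1 / 7 + (19 * (year % 19) - 7) % 30 * 0.14 + 0.5) * 7 - 6 + 1
    return EXCEL_EPOCH + timedelta(days=serial - 2)

print(easter_monday(2015))  # 2015-04-06
print(easter_monday(2016))  # 2016-03-28
```

Easter Monday 2015 was indeed 6th April, and 2016's was 28th March, so the formula holds up.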

To facilitate this calculation, I have created three functions: one to calculate the number of days in a given year, and another to calculate the day number of a date within its year. The final function is a simple function that calls DaysInYear for every year required (1900 to year-1 of the target date in our case), and also calls DayNumberOfYear for the year of the target date.

DaysInYear Function

I have commented the function, so I hope that it is easy to follow. The approach is to generate a list of dates for the 1st day of each month in the nominated year. This list is then passed to the Power Query Date.DaysInMonth function to generate a list of the number of days in each month of that year; this list is then summed.

 (year) => 

    let

        //setup a list of month numbers
        list.Months = List.Numbers(1,12),
        //create a list of first day of each month
        list.FirstOfMonths = List.Transform(  
                                 list.Months,
                                 each Text.From(year)&"-"&Text.From(_)&"-1"),
        //transform to dates
        list.FirstAsDates = List.Transform( 
                                list.FirstOfMonths,
                                each DateTime.Date(DateTime.FromText(_))),
        //create list of the days in each month 
        list.DaysInMonths = List.Transform(
                                list.FirstAsDates,
                                each Date.DaysInMonth(_))
    in     
        //sum the list of days in months                         
        List.Sum(list.DaysInMonths)
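If you want to convince yourself that the month-summing approach is sound, here is the same idea in Python (a cross-check only, not part of the query):

```python
from calendar import monthrange

def days_in_year(year):
    # sum the number of days in each of the twelve months, as the M
    # function does with Date.DaysInMonth
    return sum(monthrange(year, month)[1] for month in range(1, 13))

print(days_in_year(2015))  # 365
print(days_in_year(2016))  # 366, a leap year
```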

DayNumberOfYear Function

This function is similar to the DaysInYear function in that it generates a list of days in each month for the year of the nominated date, but only for the months before the month of the nominated date. These days are summed and added to the day of the month of the nominated date.

(date) =>

    let

        //transform parameter to true date
        target.date = DateTime.Date(DateTime.FromText(date)),
        //get year and month of target date
        year = Date.Year(target.date),
        month = Date.Month(target.date),
        //build a list of months prior to month of target date
        list.Months = List.Numbers(1, month-1),
        //generate a list for the first of each month
        list.FirstOfMonths = List.Transform(
                                 list.Months,
                                 each Text.From(year)&"-"&Text.From(_)&"-1"),
        list.FirstDates = List.Transform(
                              list.FirstOfMonths, 
                              each  DateTime.Date(DateTime.FromText(_))),
        list.DaysInMonths = List.Transform(
                                list.FirstDates,
                                each Date.DaysInMonth(_))
    in
        //sum the monthly day values in the list and add in the day of this month
        if List.Sum(list.DaysInMonths) = null then Date.Day(target.date) else List.Sum(list.DaysInMonths) + Date.Day(target.date)
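The same cross-check can be applied here; Python's tm_yday gives the day number of the year directly (again, illustration only, not part of the query):

```python
from calendar import monthrange
from datetime import date

def day_number_of_year(d):
    # days in the months before d's month, plus the day within d's month
    return sum(monthrange(d.year, month)[1] for month in range(1, d.month)) + d.day

target = date(2016, 2, 26)
print(day_number_of_year(target))   # 57
print(target.timetuple().tm_yday)   # 57, the built-in equivalent
```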

DaysToDate Function

Here we build a list of every year from 1900 up to the year before the year of the target date. Each year in this list is passed to our DaysInYear function to generate a list of days in each year. This is summed and added to the day number within the year of the target date.

If you notice, when these values are summed I add a test to see whether our list of years includes 1900, adding an extra 1 if it does. This is because Excel erroneously treats 1900 as a leap year, and so adds 366 days for that year; Power Query does not treat 1900 as a leap year, so it calculates 365 days. Although Power Query is handling this correctly, we are emulating the Excel formula, so we add 1 if the list includes 1900. I know that, as we are emulating Excel in this situation, our list of years will always include 1900, but I have built it this way so that it is a more generic function.

(target as date) =>

    let


        //setup the list of years to pass to DaysInYear function: 1900 to target year - 1
        list.Years = List.Numbers(1900, Date.Year(target)-1900),
        //generate a list of days in each year
        list.DaysInYears = List.Transform(  
                               list.Years,
                                each DaysInYear(Text.From(_))),
        //calculate day number in year of target date
        days.YTD = Number.From(DayNumberOfYear(Date.ToText(DateTime.Date(target)))),
        //sum them both
        total.Days = List.Sum(list.DaysInYears) + days.YTD + (if List.Contains(list.Years, 1900) then 1 else 0)
    in
        total.Days

Easter Holidays

With these 3 functions, we can now emulate the Excel Easter Monday formula shown above.

Easter.Monday = Date.From(Number.Round(DaysToDate(#date(target.year,4,1))/7+Number.Mod(19*Number.Mod(target.year, 19)-7, 30)*.14, 0)*7-6+1),

It looks complex, but each part aligns nicely to the Excel formula. The main addition is that whereas the Excel date, “4/”&target.year, can be directly manipulated arithmetically, in Power Query we have to calculate the number of days ourselves, DaysToDate(#date(target.year,4,1)).

The part that emulates the Excel formula, Number.Round(DaysToDate(#date(target.year,4,1))/7+Number.Mod(19*Number.Mod(target.year, 19)-7, 30)*.14, 0)*7-6+1, calculates the number of days from 01/01/1900, which is passed to the Date.From function to get a date in the format we are familiar with.
As previously mentioned, if we know Easter Monday, Easter Friday is simply derived by subtracting 3 days:

Easter.Friday = Date.AddDays(Easter.Monday, -3),

Final Holiday List Generator

The full routine that produces a list of the holidays is shown below. The target year is hard-coded here; it would be simple to parameterise it in some way.

let  
  
    target.year = 2015,

    Jan1 = #date(target.year,1,1),
    Jan1DoW = Date.DayOfWeek(Jan1, Day.Monday),
    New.Years.Day = if Jan1DoW>4 then Date.AddDays(Jan1,7-Jan1DoW) else Jan1,
    
    Easter.Monday = Date.From(Number.Round(DaysToDate(#date(target.year,4,1))/7+ Number.Mod(19*Number.Mod(target.year, 19)-7, 30)*.14, 0)*7-6+1),
    Easter.Friday = Date.AddDays(Easter.Monday, -3),
    
    May1 = #date(target.year,5,1),
    May1DoW = Date.DayOfWeek(May1,Day.Monday),
    May.Day = Date.AddDays(May1,Number.Mod(7-May1DoW,7)),

    May31 = #date(target.year,5,31),
    May31DoW = Date.DayOfWeek(May31,Day.Monday),
    Spring.Bank.Holiday = Date.AddDays(May31,-May31DoW),

    Aug31 = #date(target.year,8,31),
    Aug31DoW = Date.DayOfWeek(Aug31, Day.Monday),
    Summer.Bank.Holiday = Date.AddDays(Aug31, -Aug31DoW),

    Dec25 = #date(target.year,12,25),
    Dec25DoW = Date.DayOfWeek(Dec25, Day.Monday),
    Christmas.Day = if Dec25DoW>4 then Date.AddDays(Dec25,7-Dec25DoW) else Dec25,
    Boxing.Day = Date.AddDays(Christmas.Day, if Dec25DoW=4 then 3 else 1), 

    Holiday.List = Table.FromList(
                       {[A="New Year's Day", B=New.Years.Day],
                        [A="Easter Friday", B=Easter.Friday], 
                        [A="Easter Monday", B=Easter.Monday], 
                        [A="May Day", B=May.Day],
                        [A="Spring Bank Holiday", B=Spring.Bank.Holiday],
                        [A="Summer Bank Holiday", B=Summer.Bank.Holiday],
                        [A="Christmas Day", B=Christmas.Day],
                        [A="Boxing Day", B=Boxing.Day]},
                       Record.FieldValues, {"Holiday", Text.From(target.year)})
in
    Holiday.List

In Summary

There we have it, a list of all UK Bank Holidays for a given year. This also shows how we can leverage legacy Excel formulas within Power Query.

The two functions DaysInYear and DayNumberOfYear are very similar in concept and in processing steps, so they could be combined into a single function, or DayNumberOfYear could call a modified version of DaysInYear, all in the cause of removing duplicate code.

Building a list of days in every year since 1900 does seem overkill, but although I scoured the Power Query specs, the Formula Language Specification and the Formula Library Specification, I cannot see any way of getting the number of days in a period within Power Query, hence my functions. In Power Query, dates have a date type; they are not just numbers with a special date display format.

One interesting thing that I found is in the Power Query Date.From function. As I show in the Easter Monday calculation, give Date.From a number and it will return a date. Passing the number 42426, the value that Excel stores for today’s date of 26th Feb 2016, returns the date 26/02/2016. As Power Query correctly treats 1900 as not being a leap year, I was expecting 25/02/2016, and if you count the days directly in Power Query, 26th Feb 2016 is indeed day number 42425. It seems that Power Query has picked up some of Excel’s legacy errors.

And finally, for those who cannot get enough Excel (isn’t that everyone?), Debra Dalgleish has a comprehensive summary of algorithms, Excel formulae, and UDFs that calculate Easter on her Contextures site.

 


UK Bank Holidays in Power Query

I recently built an application that needed to be able to work out the UK Bank Holidays for any given year. This is relatively easy in Excel, the only two that are a little taxing are Easter Friday and Easter Monday. As I was using Power Query to gather the data and tidy it up, I thought it would be good to see if I could emulate these formulas in Power Query.

As I said in the previous paragraph, most are straightforward; it is just the Easter holidays that take a little more thought, so I will show how to calculate all of the others in this post, and follow up with a post that covers the Easter holidays later this week.

New Year’s Day

We’ll start with New Year’s Day. This holiday falls on the 1st Jan, unless that is a Saturday or a Sunday, in which case it falls on the following Monday. The M code for this is


    target.year = 2016,
    Jan1 = #date(target.year,1,1),
    Jan1DoW = Date.DayOfWeek(Jan1, Day.Monday),
    New.Years.Day = if Jan1DoW>4 then 
                       Date.AddDays(Jan1,7-Jan1DoW) else Jan1,

We start by setting a variable for the year that I need to calculate the holiday dates for, in this case 2016.

This is followed by a variable for the 1st Jan date, using that year variable.

The last line simply checks whether the day of the week for Jan1 is a Saturday or a Sunday, and adds 2 or 1 days accordingly. We get the day of the week for Jan1 using Monday as the first day of the week; Power Query works in base 0, so Monday is 0 and Sunday is 6. If that day number is greater than 4, we calculate the number of days to add by subtracting it from 7 (7 - 5 for Saturday gives 2, 7 - 6 for Sunday gives 1); otherwise we just use the Jan1 date.
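The same weekend test is easy to verify outside Power Query; Python's Monday-based weekday() matches Date.DayOfWeek(_, Day.Monday). A sketch, for illustration only:

```python
from datetime import date, timedelta

def new_years_day(year):
    jan1 = date(year, 1, 1)
    dow = jan1.weekday()  # Monday = 0 ... Sunday = 6
    # Saturday (5) and Sunday (6) roll forward to the next Monday
    return jan1 + timedelta(days=7 - dow) if dow > 4 else jan1

print(new_years_day(2016))  # 2016-01-01, a Friday, so no shift
print(new_years_day(2017))  # 2017-01-02, as 1st Jan 2017 fell on a Sunday
```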

Easter

To be covered next time.

May Day

May Day falls on the first Monday of May, so no weekend adjustment is needed; it always falls on a Monday.

The code to calculate this is

 
    May1 = #date(target.year,5,1),
    May1DoW = Date.DayOfWeek(May1,Day.Monday),
    May.Day = Date.AddDays(May1,Number.Mod(7-May1DoW,7)),

As before, we create and load variables for the date and the day of the week. Instead of adding 1 or 2 days if the date falls on a Saturday or a Sunday, we take the mod 7 of (7 minus the day of the week), and add that number of days. If May 1st is a Monday, we get Mod(7, 7) which is 0; if it is a Tuesday, we get Mod(6, 7) which is 6, taking us forward to the following Monday; a Sunday gives Mod(1, 7) which is 1; and so on. Adding this value to the May1 date gives us the first Monday of the month.

If you have other uses for the 1st Monday (or Tuesday, etc.) of the month, this would make a useful function in Power Query.
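Such a generalised "first weekday of the month" helper can be sketched as follows (in Python here for illustration; the function name is mine):

```python
from datetime import date, timedelta

def first_weekday_of_month(year, month, weekday=0):
    # weekday 0 = Monday; Number.Mod(7 - DoW, 7) in the M code is the
    # same as (weekday - DoW) mod 7 with weekday = 0
    first = date(year, month, 1)
    return first + timedelta(days=(weekday - first.weekday()) % 7)

print(first_weekday_of_month(2016, 5))  # 2016-05-02, May Day 2016
print(first_weekday_of_month(2015, 5))  # 2015-05-04, May Day 2015
```

Passing a different weekday argument gives the first Tuesday, Wednesday, and so on.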

Spring & Summer Bank Holiday

Both the Spring and Summer Bank Holidays are the last Monday in their respective months, May and August.

For this, we simply subtract the day of the week of the 31st from the 31st date.

    May31 = #date(target.year,5,31),
    May31DoW = Date.DayOfWeek(May31,Day.Monday),
    Spring.Bank.Holiday = Date.AddDays(May31,-May31DoW),

    Aug31 = #date(target.year,8,31),
    Aug31DoW = Date.DayOfWeek(Aug31,Day.Monday),
    Summer.Bank.Holiday = Date.AddDays(Aug31, -Aug31DoW),
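Checked in Python (illustration only): stepping back from the month's last day by its Monday-based weekday always lands on the last Monday.

```python
from calendar import monthrange
from datetime import date, timedelta

def last_monday(year, month):
    # start from the last day of the month and step back by its
    # Monday-based weekday (0 for Monday, so a Monday stays put)
    last = date(year, month, monthrange(year, month)[1])
    return last - timedelta(days=last.weekday())

print(last_monday(2016, 5))  # 2016-05-30, Spring Bank Holiday 2016
print(last_monday(2016, 8))  # 2016-08-29, Summer Bank Holiday 2016
```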

Christmas & Boxing Day

Christmas Day is calculated in the same way as New Year’s Day: if it falls on a Saturday or a Sunday, it shifts to the following Monday. We could calculate Boxing Day in the same manner, but as it is always the next working day after Christmas Day, we can simply test whether Christmas Day is a Friday and add 3 days to get the following Monday as the Boxing Day date; otherwise we add 1 day to get the next day.

    Dec25 = #date(target.year,12,25),
    Dec25DoW = Date.DayOfWeek(Dec25,Day.Monday),
    Christmas.Day = if Dec25DoW>4 then Date.AddDays(Dec25,7-Dec25DoW) else Dec25,
    Boxing.Day = Date.AddDays(Christmas.Day, if Dec25DoW=4 then 3 else 1),
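A quick Python rendering of the pair of rules (a cross-check, not part of the query):

```python
from datetime import date, timedelta

def christmas_and_boxing(year):
    dec25 = date(year, 12, 25)
    dow = dec25.weekday()  # Monday = 0
    # Christmas shifts off a weekend to the following Monday
    christmas = dec25 + timedelta(days=7 - dow) if dow > 4 else dec25
    # Boxing Day is the next working day: +3 over a Christmas-on-Friday
    # weekend, otherwise the day after the (possibly shifted) Christmas
    boxing = christmas + timedelta(days=3 if dow == 4 else 1)
    return christmas, boxing

print(christmas_and_boxing(2015))  # Fri 25th and substitute Mon 28th Dec 2015
print(christmas_and_boxing(2016))  # Christmas on a Sunday: Mon 26th and Tue 27th Dec
```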

In Summary

The full code, minus the Easter holidays, now looks like this


let 
 
 target.year = 2016,

 Jan1 = #date(target.year,1,1),
 Jan1DoW = Date.DayOfWeek(Jan1, Day.Monday),
 New.Years.Day = if Jan1DoW>4 then Date.AddDays(Jan1,7-Jan1DoW) else Jan1,
 
 //Easter.Monday = to be added,
 //Easter.Friday = to be added,
 
 May1 = #date(target.year,5,1),
 May1DoW = Date.DayOfWeek(May1,Day.Monday),
 May.Day = Date.AddDays(May1,Number.Mod(7-May1DoW,7)),

 May31 = #date(target.year,5,31),
 May31DoW = Date.DayOfWeek(May31,Day.Monday),
 Spring.Bank.Holiday = Date.AddDays(May31,-May31DoW),

 Aug31 = #date(target.year,8,31),
 Aug31DoW = Date.DayOfWeek(Aug31, Day.Monday),
 Summer.Bank.Holiday = Date.AddDays(Aug31, -Aug31DoW),

 Dec25 = #date(target.year,12,25),
 Dec25DoW = Date.DayOfWeek(Dec25, Day.Monday),
 Christmas.Day = if Dec25DoW>4 then Date.AddDays(Dec25,7-Dec25DoW) else Dec25,
 Boxing.Day = Date.AddDays(Christmas.Day, if Dec25DoW=4 then 3 else 1), 

 Holiday.List = Table.FromList(
                    {[A="New Year's Day", B=New.Years.Day],
                     [A="Easter Friday", B="TBA"], 
                     [A="Easter Monday", B="TBA"], 
                     [A="May Day", B=May.Day],
                     [A="Spring Bank Holiday", B=Spring.Bank.Holiday],
                     [A="Summer Bank Holiday", B=Summer.Bank.Holiday],
                     [A="Christmas Day", B=Christmas.Day],
                     [A="Boxing Day", B=Boxing.Day]},
                    Record.FieldValues, {"Holiday", Text.From(target.year)})
in
 Holiday.List

The list at the end simply creates a table to show the calculated bank holidays in a simple form.

All that is left now is to add the Easter holidays, which I will cover in my next post.


London Data Culture Day 2015

At the recent London Data Culture Day, organised by Jen Stirrup, I gave a presentation on Excel Cube Functions, querying a SQL Server database uploaded into Power Pivot to provide the cube, focusing on using MDX to enhance the capability of the function.

Jen has already posted the slides on SlideShare,  but for anyone that is interested, you can grab my demo files from my OneDrive.


One Slicer To Control Them All

In Excel 2007, Microsoft added structured tables. Although far from being my favourite Excel feature, they are certainly a step in the right direction, and Microsoft are using them more and more in other functions.

Excel 2010 saw the addition of slicers with pivot tables, a feature that provides a much more intuitive and visual filtering of pivot tables.

Excel 2013 gave us the ability to add a slicer to a structured table (see the step-wise progress here? :D), definitely a welcome improvement. One of the great features of slicers with pivot tables is that you can link the slicer to multiple pivots, giving synchronised filtering. Unfortunately, in their wisdom, Microsoft did not see fit to provide similar functionality for slicers attached to structured tables.

The question as to whether you can control multiple tables from one slicer comes up regularly, and the answer is invariably no. Recently this question was asked on LinkedIn, so I built such a facility: One Slicer To Control Them All.

Slicers cannot natively control multiple tables, so my approach is to create a pivot table from one of the tables, designated the master table in this context, and add slicer(s) for this pivot. Selecting from the slicer will ripple down to the pivot table, and changes to this table can be trapped using the PivotTableUpdate event. By trapping the changes in this event, we can then filter all tables on the same criteria.

That is basically all that is required, but to cater for any changes in the master table, changes that wouldn’t automatically be reflected in the slicer selections, I trap those changes using the SelectionChange event and refresh the pivot (which then triggers the PivotTableUpdate event, thereby filtering the new data accordingly). I am only trapping the master table; it would not be difficult to trap a change to any of the tables.

One other thing it caters for is the situation where one or more of the tables has headers suppressed (why you would do this is beyond me, but the option is there in the Table Style Options group of the Table context ribbon). In that case, testing the HeaderRowRange, which is done to find which column to filter, returns nothing, so nothing gets filtered. This was pointed out to me by Zack Barresse, an Excel MVP who wrote the book on structured tables, and he also provided me with a function to resolve it. This function, HeaderRowLabels in the code below, returns an array of column headings regardless of the header setting; by checking the slicer caption against this array, I can get the column either way. I also added a check in my code to test whether headers are shown, as filtering restores the headings when suppressed, so I need to reset them to the user’s choice.

Outlining the steps in a little more detail:

Setup

First, I create a pivot table from the table that is designated as the master table, and I display all of the items to that pivot (you can hide it far off to the right, or on a hidden sheet, if you don’t want it to be shown).

I then add slicer(s) to the pivot table for my filter fields. The VBA traps the Worksheet_PivotTableUpdate event, and iterates all tables to filter by the slicer criteria.

The names of the master table and the controlling pivot table are setup as module constants to facilitate maintenance.

It is possible that the user has table headers turned off, in which case .HeaderRowRange returns nothing, the function HeaderRowLabels gets an array of header labels regardless of this setting.

This code assumes that all of the tables are on the same sheet, if they are spread across multiple sheets the code will need to be amended to cater for this.

Managing Table Additions/Changes

As users will be adding to the tables, and as the pivot does not automatically update, it needs to be refreshed, which is managed from code.

The Worksheet_SelectionChange code checks whether the selected cell is within the master table, the table that the pivot is generated from; if so, it refreshes the pivot table, a refresh that does ripple through to the slicers.

This could easily be changed to check all tables.

Limitations

This technique has clear limitations; it is almost entirely predicated on the assumption that the tables share the same structure. The slicers do not get updated if any table other than the master table is updated.

Within those constraints it seems to work well, and some of the constraints can be mitigated with code changes.

Code

Option Explicit

Const TABLE_MASTER As String = "tblMaster"
Const PIVOT_NAME As String = "pvtControl"

Private Sub Worksheet_PivotTableUpdate(ByVal Target As PivotTable)
Dim Slicer As SlicerItem
Dim Table As ListObject
Dim Headers As Boolean
Dim values As Variant
Dim nextrow As Long
Dim idx As Long
Dim i As Long
    For i = 1 To Target.Slicers.Count
    
        nextrow = 1
        ReDim values(1 To nextrow)
        
        For Each Slicer In Target.Slicers(i).SlicerCache.VisibleSlicerItems
    
            ReDim Preserve values(1 To nextrow)
            values(nextrow) = Slicer.Name
            nextrow = nextrow + 1
        Next Slicer
        
        For Each Table In Me.ListObjects
        
            Headers = Table.ShowHeaders
            idx = Application.Match(Target.Slicers(i).Caption, HeaderRowLabels(Table), 0)
            If nextrow > Target.Slicers(i).SlicerCache.SlicerItems.Count Then
            
                Table.Range.AutoFilter Field:=idx
            Else
            
                Table.Range.AutoFilter Field:=idx, Criteria1:=values, Operator:=xlFilterValues
            End If

            Table.ShowHeaders = Headers
        Next Table
    Next i
End Sub

Private Sub Worksheet_SelectionChange(ByVal Target As Range)
Dim tableCell As Boolean

    On Error Resume Next
    tableCell = Target.ListObject.Name = TABLE_MASTER
    On Error GoTo 0
    If tableCell Then Me.PivotTables(PIVOT_NAME).PivotCache.Refresh
End Sub


Function HeaderRowLabels(ByVal Table As ListObject) As Variant
Dim TableColumn As ListColumn
Dim TableHeaders() As Variant
    
    If Table.ShowHeaders Then
    
        ReDim TableHeaders(1 To Table.ListColumns.Count)
        If UBound(TableHeaders) = 1 Then
        
            TableHeaders(1) = Table.HeaderRowRange.Value
        Else
        
            TableHeaders = Table.HeaderRowRange.Value
        End If
        
        HeaderRowLabels = TableHeaders
    Else
    
        ReDim TableHeaders(1 To Table.ListColumns.Count)
        For Each TableColumn In Table.ListColumns
        
            TableHeaders(TableColumn.Index) = TableColumn.Name
        Next TableColumn
        
        HeaderRowLabels = TableHeaders()
    End If
End Function

PASSBAC 2015 Revisited

As promised at my session on Excel’s CUBE functions at PASSBAC 2015 in Santa Clara on 22nd April, here are the files that I used at that session.

It was my intention to have just two Excel workbooks: a starting book with the data, from which I would build the reports iteratively in the session, and a final workbook showing a fully completed report, on the assumption that we would run out of time to show everything.

Due to my experience with the presentation gremlins (Excel continually crashed on me at an earlier dry-run), I built other workbooks showing the various stages of the presentation; should Excel have continued to misbehave, I could then pick up close to that point. I only had one incident at my session, but I think the workbooks are useful to show the different steps demonstrated. All of them are contained within the attached zip file; a small summary of each will align them to my presentation.

  • Reporting with Excel CUBE Functions – 0 Start.xlsx
    This file contains a flat table of all of the data as it would typically be found in an Excel workbook.

    • the flat table was split into a partially normalised form of tables, such as States, Cities, Stores, etc.
      The purpose of this was to load those tables into Power Pivot to create the cube worked with in the session (I used Excel tables rather than a database as a precaution against those presentation gremlins)
  • Reporting with Excel CUBE Functions – 1 Datamodel.xlsx
    The previous start file tables loaded into Power Pivot, with table relationships established
  • Reporting with Excel CUBE Functions – 2 Pivot.xlsx
    A simple pivot table of years against values types (Target or Actual) against states

    • used to check our results back to, and
    • to show how the CUBE function report differed from a standard pivot
  • Reporting with Excel CUBE Functions – 3 Static Members.xlsx
    The beginnings of the report emerge, but with hardcoded values

    • CUBEMEMBER formulas for a few named states in the rows, the datamodel years in the columns,
    • CUBEVALUE formulas to show the sales value for the corresponding state and year
    • remember that the sales value at this point is showing an aggregate of the Target and Actual sales values for each state and year as the value type is not yet included
  • Reporting with Excel CUBE Functions – 4 Dynamic Erros.xlsx
    In this workbook, the report starts to become dynamic

    • it uses CUBESET and CUBERANKEDMEMBER, together with the ROW()/COLUMN() functions as indexes, to build the row and column headers dynamically
    • It also sets a measure cell with a parameterised measure type (cells A4 and A3)
    • Where the report rows/columns are greater than the data requires, it shows cell errors
  • Reporting with Excel CUBE Functions – 5 Dynamic No Errors.xlsx
    As for the previous workbook, with the cell errors handled
  • Reporting with Excel CUBE Functions – 6 Sales Actual v Target.xlsx
    This workbook differentiates between actual and target values, identifying which year should show which

    • Two extra measures are added, Sum of Actual Sales and Sum of Target Sales, both created in Power Pivot, one using an autosummed calculated column (actual), the other a straight DAX measure to show both approaches (although both could and would normally be created as straight DAX measures)
  • Reporting with Excel CUBE Functions – 7 Region Slicer.xlsx
    A region slicer is added for this workbook:

    • the CUBEVALUE formulas are amended to include the slicer as a member expression.
    • At this point, the states list does not respond to the slicer
  • Reporting with Excel CUBE Functions – 9 Final.xlsx
    The final workbook with the fully completed report. This workbook has the following finishes:

    • all of the intermediate calculations are moved to a separate worksheet named scratch
    • a regional hierarchy created in the Power Pivot model on the States table
    • the index in the CUBERANKEDMEMBER formulas are also adapted to manage any row deletions or insertions (for example, instead of ROW(A1) for the state headings, it uses ROW()-ROW($A$3)) and similar for the year headings)
    • sparklines are included in the table, something you cannot do in a pivot
    • a simple chart of the table is added
    • a value dropdown, sales, cost, or profit, as setup in 4 above,
    • utilises some extra MDX to show only the states that are covered by the region(s) selected in the slicer (this is all managed on the scratch worksheet)

The zip file also contains a PDF of the presentation slides. The final file is something I briefly mentioned in the session: a workbook which uses Power Query to split the initial flat table into all of the dimension tables used to load the data model. This is not really relevant to the session, but it is included for anyone who might find it of interest.

That sums up the files, all of which can be found in the zip file here.

Finally, I showed a slide with a few links for further reading. This is that list, with a few extra details.

  • A CUBEMEMBERPROPERTY Equivalent With Power Pivot
    A blog demonstrating a CUBEMEMBERPROPERTY equivalent With PowerPivot
  • MDX CUBEd
    A blog post on using MDX in CUBE functions
  • Introduction to MDX for PowerPivot Users
    A 5-part series by Chris Webb on using MDX in Power Pivot, useful for understanding how to use MDX within CUBE functions

    1. Part 1
      The first part lays down the foundations
    2. Part 2 – Basic Sets
      The second part covers Excel’s named sets
    3. Part 3 – The Members() and Crossjoin() functions
      The third part covers cube members and the Crossjoin function
    4. Part 4 – Filters
      The fourth part, showing how to restrict the data returned using the various filtering functions
    5. Part 5 – MDX Queries
      The last in the series, running MDX queries against the Power Pivot model
  • Jan Karel Pieterse’s NameManager
    A free Excel addin that is far superior to Excel’s built-in NameManager
  • Excel Formula Parser
    Another free addin that takes a formula in the current cell and displays it in a nice formatted but read-only structure
  • FastExcel
    The ultimate Excel performance tool. It is not free, but as well as many facilities for measuring the performance of your spreadsheet, it has an enhanced version of the NameManager tool mentioned above, and now has a formula parser that will write-back to the Excel cell.

We ran out of time to go into detail on adding MDX, but I was asked a question afterwards about creating a list of the top 10 items by some criterion (usually value) using the CUBE functions, which is resolvable with MDX. I will blog this topic at the end of the week.


Patterns in the Sand

I have just received my copy of Marco Russo and Alberto Ferrari’s book, DAX Patterns (paid for copy!).

I am not sure how much value I will get out of the book; I think Marco and Alberto work in a domain that is light-years more advanced than mine, but I do know I will get a ton of enjoyment from seeing their analysis and dissection of these problems.

If I get a chance, I will try to post my thoughts on the usefulness of the book to mere mortals like us.


Bob The (Proto) Builder

I have been building a prototype application over the past week, using Power Query to take the principal data management sheet and create a form of normalised data that I could load into a SQL Server database, on which I would then build my application code.

I have done a similar thing a few times now, and it occurred to me that I was repeating what I had done each time before: identify the repeated lookup (dimensional) data, put that into separate tables with their own indexes, modify the original source to hold the indexes of these lookup tables, and finally change the headings to conform with the database column names. There are other steps particular to each prototype, based upon the actual data encountered and how much cleansing it might need, but all of them start as described above.

As a developer, I look for every opportunity to automate my work, and this seemed a perfect candidate, so I set about building a tool that could manage this task. This post describes that tool, a Power Query Prototype Data Builder. The Power Query that I deploy in this tool is not advanced or clever in any way, it is more the overall technique and package that I thought would be worth sharing.

In essence, the tool consists of a number of steps to get to the end goal, namely:

  • create a clean dataset from the source work
  • create a clean dataset of just the data columns to be loaded into dimension tables
  • create all of the dimension tables
  • create the output values table from the cleansed data, change all lookup values to their equivalent indexes, rename all headings.

This is the sort of thing I used to do in Excel, with some complex(ish) list formulas, and a lot of VLOOKUPs. All of these steps can easily be achieved with Power Query, and a lot more elegantly.

As well as building very similar queries in each prototype, in all of my previous efforts many of the queries were repetitive. This is something to be avoided if we are building a generic tool. Another goal is to avoid hard-coding wherever possible. To address both of these issues, I built three Power Query functions:-

  • fnGetParameter – this looks up variable data in a table on an Excel worksheet, and passes the result to our query.
  • fnBuildDimension – this function strips the targeted column from the dimension dataset described above, removes duplicate values, and adds an index column. The input is the name of the column to be dimensioned, the output is a dimension table ready to load into our datamodel.
  • fnDimensionID – this function adds a column with the index number for a targeted column in the cleansed source data, and then removes the original column. This would be called several times, once for each of the lookup values in the cleansed source data.

Let’s look at the code. I will take as my example a simple sales table (I think I got this data from Bill Jelen originally, I have just extended it over time). I use this data in a few of my talks, it is a flat Excel table that I want to break down. The table looks like this

Original Table

The first query we will look at is SourceMasterData. It is quite simple, it opens the source workbook and prepares the data, cleansing as needed. Any application-specific cleansing needed I would do in a separate query or in the final query, CreateMasterTable.

let
   Source = Excel.Workbook(File.Contents(fnGetParameter("Filepath")&fnGetParameter("Filename"))),
   Source.Sheet= Source{[Item=fnGetParameter("Sheet"), Kind="Sheet"]}[Data],
   Promote.ToHeader = Table.PromoteHeaders(Source.Sheet),
   List.Headings = Table.ToRows(Excel.CurrentWorkbook(){[Name="Headings"]}[Content]),
   Renamed.Headings = Table.RenameColumns(Promote.ToHeader,List.Headings)
in
   Renamed.Headings

As you can see, this query references the fnGetParameter function three times, to get the file path, the file name, and the name of the worksheet to be sourced and cleansed. The name of the parameter is passed as an input, and the function returns its associated value. This function is based upon a previous post of mine, Excel is for Parameters, but extended to a fully parameterised table as described by Ken Puls in his post Building a Parameter Table for Power Query.

The code for fnGetParameter is

(ParameterName as text) =>

let
   Parameter.Source = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
   Parameter.Data = Table.SelectRows(Parameter.Source, each ([Parameter] = ParameterName)),
   Parameter.Value = 
   if Table.IsEmpty(Parameter.Data) then
      null
   else 
      Record.Field(Parameter.Data{0},"Value")
in
   Parameter.Value

In our example, the parameter table would look like this

Parameter Table

I won’t detail the function as it is all covered in Ken’s Post.

There are two other steps in SourceMasterData that need explanation, List.Headings and Renamed.Headings. The purpose of these is to change all of the headings in the source file to more meaningful values, values that align with the database columns. Here again I am using an Excel table to hold the original and replacement values. This table is pulled into the query and converted to a list of lists using Table.ToRows. This list is then used in Table.RenameColumns to change the headings.

In my example, this table Headings looks like this

Prototype Headings
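To make the shape of that list concrete, here is a minimal sketch with hypothetical heading pairs (the actual values come from the Headings table in the workbook):

```
let
    // hypothetical stand-in for the Headings table: old heading, new heading
    Headings = #table({"Old", "New"}, {{"Year", "FiscalYear"}, {"Store Type", "Type"}}),
    // Table.ToRows yields { {"Year","FiscalYear"}, {"Store Type","Type"} },
    // which is exactly the list-of-pairs shape Table.RenameColumns expects
    Pairs = Table.ToRows(Headings)
in
    Pairs
```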

The next query I will cover is SourceDimensionData, which takes the cleansed source data and removes all but the dimension columns. It’s another very simple query that creates a source for the building of each dimension.

let
   Source = SourceMasterData,
   Remove.NonTargetColumns = Table.SelectColumns(Source,{"Region", "Division", "State", "City", "Store", "Type"})
in
   Remove.NonTargetColumns

The function that builds a dimension is fnBuildDimension,

(Dimension) =>

let
   Source = SourceDimensionData,
   Remove.NonTargetColumns = Table.SelectColumns(Source,{Dimension}),
   Remove.Duplicates = Table.Distinct(Remove.NonTargetColumns),
   Sort.Rows = Table.Sort(Remove.Duplicates,{{Dimension, Order.Ascending}}),
   Add.IndexColumn = Table.AddIndexColumn(Sort.Rows, Dimension&"ID", 1, 1),
   Move.IndexColumn = Table.ReorderColumns(Add.IndexColumn,{Dimension&"ID", Dimension}) 
in
   Move.IndexColumn

No hard-coding here, it will work for any nominated column. It is used in a number of separate queries, which I call Dimxxxx, to build the dimension tables. The code to build one of the dimensions would look like

let
    Source = fnBuildDimension("Region")
in
    Source

It would be just as simple, and as relevant, to add that call to the final query that outputs the master table; I just prefer to abstract it. If the dimension has any particular attributes to be built, they can be added within its own query, so it is simple to extend – another reason to put it in its own query.
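For example, extending one of the Dimxxxx queries with an extra attribute is just a matter of adding steps after the function call; a hypothetical sketch (the RegionCode column is invented for illustration):

```
let
    Source = fnBuildDimension("Region"),
    // hypothetical extra attribute, built within the dimension's own query
    Add.RegionCode = Table.AddColumn(Source, "RegionCode", each Text.Upper(Text.Start([Region], 3)))
in
    Add.RegionCode
```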

The final function is fnDimensionID, which looks up a row value in the nominated source query, inserts a column of index values, then deletes the original lookup column.

(DataSource, Dimension, JoinColumn) =>

let
   Source = DataSource,
   Get.Dimension= Table.NestedJoin(Source,{JoinColumn},Dimension,{JoinColumn},JoinColumn&"ID"),
   Expand.JoinColumn = Table.ExpandTableColumn(Get.Dimension, JoinColumn&"ID", {JoinColumn&"ID"}, {JoinColumn&"ID"}),
   Delete.JoinColumn = Table.RemoveColumns(Expand.JoinColumn,{JoinColumn})
in
   Delete.JoinColumn

This function is used in the final query, CreateMasterTable, once for each lookup value, to add all of the index values; the index columns are then moved to the front.

let
   Source = SourceMasterData,
// replace all lookup values with indexes here
   Index.Region = fnDimensionID(Source, DimRegion, "Region"),
   Index.Division = fnDimensionID(Index.Region, DimDivision, "Division"),
   Index.State = fnDimensionID(Index.Division, DimState, "State"),
   Index.City = fnDimensionID(Index.State, DimCity, "City"),
   Index.Store = fnDimensionID(Index.City, DimStore, "Store"),
   Index.Type = fnDimensionID(Index.Store, DimType, "Type"),
// add a master index if required
   Index.Master = Table.AddIndexColumn(Index.Type, "SalesID", 1, 1),
// reorder columns to place indexes at front
   Move.IndexColumns = Table.ReorderColumns(Index.Master,{"SalesID", "RegionID", "DivisionID", "StateID", "CityID", "StoreID", "FiscalYear", "TypeID", "Costs", "Sales", "Profit"})
in
   Move.IndexColumns

The functions should be usable without any modification.

The two Source queries might need to be extended to add any of your application specific processing, as might the CreateMasterTable query. The queries that create the dimension tables could include further processing for that dimension.

That is it. The code is finished. All that is needed now is to load into our datamodel and use it.

The only thing I can think of to automate it further would be to somehow parameterise the dimension attributes, table name and source column, and build them all in one query. Another day maybe.
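As a rough sketch of that idea, assuming the dimension names were held in an Excel table called DimensionList (a hypothetical name), List.Transform could build all of the dimension tables in one pass; the catch is that each result would still need to be exposed as its own query to load into the datamodel:

```
let
    // hypothetical table listing the columns to be dimensioned
    DimNames = Excel.CurrentWorkbook(){[Name="DimensionList"]}[Content][Dimension],
    // one dimension table per name, built with the existing function
    Dimensions = List.Transform(DimNames, each fnBuildDimension(_))
in
    Dimensions
```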

The data workbook can be downloaded from here, and the builder workbook here.

Posted in Power Query | Leave a comment

Just One Slice Please

We all use pivot tables, don’t we? And with the introduction of slicers, we have a much more friendly and nicer way of filtering the results than the old pivot filters.

One drawback of slicers is that it is not possible to restrict slicer selection to a single item. This can be useful in a number of situations, but it is not a facility provided out of the box.

Unfortunately, slicers suffer from a Microsoft practice that is getting more prevalent as Excel’s BI features are extended: there is little or no exposure of slicers to the VBA object model, which would allow us to control them.

In this post, I will present one way of providing this control, using some simple techniques and a bit of VBA. I will demonstrate this with a simple table of sales by sales-person over year. I added a slicer for the year, as shown in the following image.

Basic Analysis

As you are no doubt aware, we can select 1, 2 or 3 years in the slicer, and the connected pivot table will reflect that selection. In order to restrict the slicer to a single selection, I added another pivot that only needs the Year field, and some event code that traps any change to that second pivot and restricts it to the first of any selected pivot items.

The detailed steps are as follows.

First I added another pivot table based upon the sales table, which I called ptYearControl. In this pivot, I just added the Year field to the rows area.

Control Pivot

This pivot needs to be connected to the slicer that controls the first pivot. This is done by selecting the slicer, and then in Slicer Tools>Options>Slicer>Report Connections check the box by the second pivot name, ptYearControl in this case.

Set Slicer Control

We now have both pivots connected to our slicer, and any selection in the slicer will be reflected back in both pivot tables. But we can still select multiple items from the slicer, so how do we restrict this?

Here we need some good old VBA (the language that won’t die, no matter how hard Microsoft tries). Unlike slicers, pivot tables are exposed to the object model, so we have a PivotTableUpdate event, and we will use this. The code for this event resides in the code module for the worksheet that the ptYearControl pivot is on. The code is

Private Sub Worksheet_PivotTableUpdate(ByVal Target As PivotTable)
Const CONTROL_PIVOT As String = "ptYearControl"
Const CONTROL_FIELD As String = "Year"
Dim pi As PivotItem
Dim itemFound As Boolean

    On Error GoTo wp_exit

    ' suspend events so our changes don't re-trigger this handler
    Application.EnableEvents = False

    If Target.Name = CONTROL_PIVOT Then

        With Target.PivotFields(CONTROL_FIELD)

            ' keep the first visible item, hide all the rest
            For Each pi In .PivotItems

                If pi.Visible Then

                    If itemFound Then

                        pi.Visible = False
                    Else

                        itemFound = True
                    End If
                End If
            Next pi
        End With
    End If

wp_exit:
    Application.EnableEvents = True
End Sub

How does this work? It simply loops through all pivot items in the targeted pivot table, and hides each visible item, except the first. I use a couple of constants for the pivot table name and name of the pivot field, to make it simple to amend. Events are disabled so that the changes that we make to the pivot table in the code do not cause a cascade of events (changes triggering the event, which makes changes that trigger the event again, and so on).

That is all there is to it. The second pivot table is only there to help us control the slicer.

Single Slice

It would be nice if we could make it not visible; I guess the original designers never envisaged a pivot table that was not meant to be seen, but it can be moved off-stage as far as necessary to get it out of the way. Another option is to put the control pivot table, and the event code, on a separate sheet, which is probably the proper way to do it (and this worksheet can be hidden).
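If you do move the control pivot to its own sheet, that sheet can be hidden with a single line of VBA; a minimal sketch, assuming the sheet is named Control (the name is hypothetical):

```vba
' xlSheetVeryHidden keeps the sheet out of the Format>Hide & Unhide dialog;
' it can only be made visible again from VBA or the VBE.
Worksheets("Control").Visible = xlSheetVeryHidden
```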

I thought this would work just as well with a PowerPivot pivot table, but when I tried it, it first did not recognise my field name. No great surprise there; the PowerPivot datamodel is a cube, and cube fields require a qualified reference, in this case I had to use a value of [tblSales].[Year].[Year]. After that, I got the error ‘unable to set the Visible property of the PivotItem class’ when trying to make subsequent pivot items not visible. This is not hard to correct, and the result is a more elegant solution; as this is a cube, we can directly set the visibility of all of the pivot items using the VisibleItemsList property, setting this to a single-item array containing the first visible item found. The code in this case is as follows:

Private Sub Worksheet_PivotTableUpdate(ByVal Target As PivotTable)
Const CONTROL_PIVOT As String = "ptYearControl"
Const CONTROL_FIELD As String = "[tblSales].[Year].[Year]"
Dim pi As PivotItem
Dim itemFound As Boolean

    On Error GoTo wp_exit

    Application.EnableEvents = False

    If Target.Name = CONTROL_PIVOT Then

        With Target.PivotFields(CONTROL_FIELD)

            For Each pi In .PivotItems

                If pi.Visible Then

                    .VisibleItemsList = Array(pi.Name)
                    Exit For
                End If
            Next pi
        End With
    End If

wp_exit:
    Application.EnableEvents = True
End Sub

I have tried this quite a lot with Excel pivot tables, not so much with PowerPivot pivot tables, but it seems to work fine, and gives us the ability to restrict a slicer to a single selection. If you use this method, do report back on how it works for you, and whether you come across any issues, problems, or oddities. And if you have any other way of achieving this level of control, do share it here.

Posted in Pivots, Power Pivot | 12 Comments

Excel is for Parameters

In my last two posts, I showed how to build a datamodel in Excel from an Excel workbook that had two levels of headings, and then how to improve the code using functions.

On the second I received an excellent comment on making the age bands more dynamic, driven from another Excel table. This allows easy changing of the bands, and also makes it simpler to add further bands; it is usually easier to do anything in Excel than in code.

This comment also prompted me to address a further small issue I had with the code, namely that the workbook path was hardcoded into the M code. We all know that hardcoding is anathema to ‘real’ developers, and it was a bit of a nuisance that my colleagues and I had to keep changing the M code, as the path was different on different machines (time for a rant: it is a real pain with Power Query, when you have multiple queries in a workbook, that you have to exit back to Excel from the current query to then open another one; Microsoft should really make this a more efficient process).

To parameterise this path is simple. I added a two row table called FilePath with the following details:

FilePath

I created a simple function, fnFileName, to build the path and name

// fnFileName
let FileName = () =>

let
    Source = Excel.CurrentWorkbook(){[Name="FilePath"]}[Content],
    InputPath = Source{0}[FilePath],
    InputFile = Source{1}[FilePath],
    Result = InputPath & InputFile
in
    Result
in
    FileName

Another simple change to my original main procedure invokes this function. I changed

 
    Source = Excel.Workbook(File.Contents("C:\Users\Bob\Documents\My Blogs\Unpivot Nested Headings\NZ Birth Rates.xlsx")),

to this 

     Source = Excel.Workbook(File.Contents(fnFileName())),

I think this is a much more acceptable approach, anyone using the workbook doesn’t have to go into Power Query to use their file, they just change an Excel table value.

A further enhancement we can introduce, if the source data workbook is in the same folder as the query workbook, uses an old Excel trick to get the filename, the CELL function. The formula
=CELL("filename",A1)
returns the full file path name of the workbook. In my case it would be
C:\Users\Bob\Documents\My Projects\_5 Development\ComCamp\[BirthsByRegionWithMap v2.xlsx]Source
We only want the path, which is the part to the left of the [ character. This is easily achieved using LEFT and FIND, namely
=LEFT(CELL("filename",A1),FIND("[",CELL("filename",A1))-1)

Filepath formula

Posted in Power Query | 6 Comments

A Case For Functions

One of the things I don’t like about Power Query is that the list of steps gets quite long when a transformation is more than just a simple replace or added column, and the query becomes less easily maintained. I am a firm believer in the principle that a procedure should not take more than one screen. Longer procedures get broken down into sub-procedures, called from the main procedure.

Chris Webb posted another excellent Power Query blog this morning, Conditional Logic in Power Query, where Chris shows how to implement a pseudo-Case statement in Power Query. 

My procedure with nested IFs was getting far too long, and unfortunately, although Chris’ pseudo-Case statement was more elegant, it did nothing to alleviate this. So I took it one step further, to get the best of both worlds: I built the case logic into a function that could be called from the main procedure, reducing the complexity of that procedure and increasing its maintainability.

In this example, the data I will be using is a simple list of names and ages, and I will create an added column of age band. The data is shown below

Source

The idea is to create bands of under 20, 20-39, 40-59, and 60 or over.

First, I loaded the table into Power Query, using the Excel Data>From Table button on the Power Query ribbon tab.

My original code would have looked like this, using nested IF statements

let
   Source = Excel.CurrentWorkbook(){[Name="tblAgeBand"]}[Content],
   InsertedAgeBand = Table.AddColumn(Source, "Age Band", each
     if [Age] < 20
         then "under 20"
     else if [Age] < 40
         then "20-39"
     else if [Age] < 60
         then "40-59"
     else "60 or over")
in
   InsertedAgeBand

Although this works, with the results as shown in the following image, I think you will agree that the code is not very elegant, even with this very simple set of statements. It can, and does, get far worse with a more complex example.

CfF - If results

To improve this, I created a function to calculate and return the age band, and then invoked that from the main procedure.

The function, which I call fnAgeBand would look as follows

// fnAgeBand
let AgeBand = (age) =>

let
    AgeBands = 
        {
          {(x)=>x<20, "under 20"},
          {(x)=>x<40, "20-39"},
          {(x)=>x<60, "40-59"},
          {(x)=>true, "60 or over"}   // catch-all, so no age falls through unbanded
        },

    Result = List.First(List.Select(AgeBands, each _{0}(age))){1}
in
    Result
in
    AgeBand

Don’t forget to uncheck ‘Load to worksheet’ in the Load Settings.

All that is needed now is to invoke the function from the main procedure. The code is shown below

let
    Source = Excel.CurrentWorkbook(){[Name="tblAgeBand"]}[Content],
    InsertedAgeBand = Table.AddColumn(Source, "Age Band", each fnAgeBand([Age]))
in
    InsertedAgeBand

You can download the workbook here.

Before anyone adds this, the band could of course be determined in Excel with a formula, such as

=LOOKUP(tblAgeBand[@Age],{0,20,40,60},{"under 20","20-39","40-59","60 or over"})

Posted in Power Query | 3 Comments