6 Keys to Power BI Success
A good business intelligence design tool makes learning a data set easy, engages a user’s curiosity, and empowers them to ask questions. For analysts, curiosity and learning go hand in hand: a more sophisticated question demands a more sophisticated analysis and answer, often leading to new learning and an even more sophisticated question.
As a millennial, I’m part of the generation that’s grown up alongside many data analysis, query language, and visualization tools. In high school I learned spreadsheets and my first coding language (HTML), in library school I learned about databases and information organization, then as an analyst I learned about pivot tables, T-SQL, and visualizing data (before Tableau started using year-based version numbers). My learning journey was gradual, matching the pace at which new functionality was released in these tools. So I didn’t expect that Power BI would require me to learn a whole new approach to how I work with data. In Power BI, I found that my SQL knowledge wasn’t useful (instead I had to learn two new languages), that I shouldn’t get too creative with visuals (avoid customizations), that I had to lean on Pat heavily to get the data structure just so, and that I had to limit my expectations of exploring the data.
After one year of working with Power BI, here’s what I recommend you know before starting your first Power BI project.
6 Keys to Being Successful with Power BI
- Map out your dashboards and visualizations before you start. Stick to Power BI's core visuals.
Sketching out your dashboard is good practice anyway, but unlike some BI tools that encourage data exploration and creative or custom chart building, Power BI doesn’t make this easy. This is especially true if you’ve got more than one Fact table or multiple levels of aggregation, or if you’re working with survey data and want to build the holy grail of Likert visualizations: stacked diverging bar charts. These require multiple calculated measures in any tool, but Power BI makes basic design elements unnecessarily complex: sorting the bars and segments, showing and wrapping long text fields, masking low n’s, and organizing sentiment labels in the same order as your bar segments. You can do sketches by hand or use a framework such as Nudge.BI to accelerate the process.
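To make the “multiple calculated measures” point concrete, here is a minimal sketch of the signed measure that typically drives a diverging stacked bar, assuming a long-format Response fact table and a Sentiment dimension. The table, column, and label names are illustrative, not from any particular survey model.

```
-- Signed count: negative sentiments plot left of zero, positive to the right.
-- Table and column names are hypothetical placeholders.
Likert Signed N =
VAR n = COUNTROWS ( Response )
RETURN
    IF (
        SELECTEDVALUE ( Sentiment[Label] )
            IN { "Dissatisfied", "Very Dissatisfied" },
        -n,
        n
    )
```

You would still need companion measures for labels, percentages, and masking, which is exactly why this chart is worth sketching before you build.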
- A star schema data source is non-optional.
Power BI performs best with denormalized data organized into a minimal number of Fact tables (the star’s centre) and Dimension tables (the star’s points or spokes). Only one relationship between any two tables can be active, which can mean you need bridging tables or compound keys. Use clear and obvious foreign keys, and set up your tables with database ids as primary keys (not program names or student ids). This makes any row iteration, in-measure filtering, or cross-table calculation much easier to write, and easier to read when you go back to the workbook in the next cycle. If you’re working with a properly designed star schema, you’re probably already doing this. While it is possible to set up inactive relationships between tables and reference them in DAX calculations, doing so makes calculated measures much more complicated. The simpler your star schema, the easier it will be to create measures and apply filters and interactions accurately.
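For instance, every measure that depends on an inactive relationship has to activate it explicitly. A hedged sketch, assuming a Date dimension with an active relationship to an Enrolment fact on enrolment date and an inactive one on drop date (all names hypothetical):

```
-- USERELATIONSHIP switches to the inactive Date relationship
-- for this one calculation only.
Enrolments by Drop Date =
CALCULATE (
    COUNTROWS ( Enrolment ),
    USERELATIONSHIP ( 'Date'[DateKey], Enrolment[DropDateKey] )
)
```

Multiply that by every measure on the page and the case for a simple schema makes itself.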
We've helped many institutions get their data "just-so" for reporting and dashboarding.
The more complex your schema, the more specific filter functions you’re going to have to include in any aggregation or row iteration calculation you write in DAX.
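As a small illustration of what that extra filtering looks like in practice (names hypothetical), a measure that must pin a dimension value while still respecting the user’s slicers might read:

```
-- KEEPFILTERS intersects with the user's page filters
-- instead of overwriting them.
Fall Applicants =
CALCULATE (
    DISTINCTCOUNT ( Application[StudentKey] ),
    KEEPFILTERS ( Term[Season] = "Fall" )
)
```

And this is why…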
- Give yourself or your team plenty of time to develop the first iterations of a dashboard.
They’re likely learning two new languages: DAX for visualizations, and M for PowerQuery (if they have to perform data transformations not accounted for in the data source), all while navigating a new tool with a ton of hidden menus. While DAX initially feels like writing Excel functions, if you’re working with a large schema and multiple fact tables, you’ll quickly find you need new language and logic to create accurate calculations. You can do a lot of things with DAX and Power BI, but it is going to take a lot of time if you want to hack or customize any standard visuals, have those visuals show tooltips with non-default but correct values (e.g., n of total N on percentage charts), or apply masking that works differently for different security groups.
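As a rough sketch of what the tooltip and masking measures involve (assuming a Response fact table; the threshold of 5 and all names are placeholder assumptions):

```
-- n within the current filter context
Respondent N = COUNTROWS ( Response )

-- total N across everything the user has selected on the page
Total N =
CALCULATE ( COUNTROWS ( Response ), ALLSELECTED ( Response ) )

-- tooltip text such as "42 of 310"
N of Total N =
FORMAT ( [Respondent N], "#,0" ) & " of " & FORMAT ( [Total N], "#,0" )

-- blank out percentages below a minimum cell size
Masked % =
IF ( [Respondent N] < 5, BLANK (), DIVIDE ( [Respondent N], [Total N] ) )
```

None of this is hard individually, but each chart tends to need its own variations, and that time adds up.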
You might be tempted to sign yourself or your team up for a Power BI course, but if your institution is migrating from one BI tool to another, the ROI of standard Power BI courses is limited. Most courses are built on sales or financial data, and we all know higher ed data is significantly different, much more complex, and never as clean as these ready-made datasets. The best way to learn Power BI is in context with your own data, and this requires time.
- Groups should be set up on the data source side.
The number of groups allowed on a dimension field in Power BI is limited, and the pop-up window for creating them is small, requires a lot of scrolling, and cuts off text, so it’s hard to see everything you have going on. This isn’t uncommon among BI tools, but the annoyance is more pronounced here than elsewhere.
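When a group really can’t live in the data source, one workaround worth considering is a calculated column instead of the grouping pop-up, since the logic stays visible and reviewable. A sketch with hypothetical names:

```
-- Grouping logic expressed as a calculated column rather than
-- the built-in grouping UI; names are illustrative.
Age Band =
SWITCH (
    TRUE (),
    Student[Age] < 20, "Under 20",
    Student[Age] < 25, "20 to 24",
    "25 and over"
)
```

Even so, pushing this into the source table keeps the model lighter and the definition in one governed place.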
- Sort tables need to be built for any dimension or group that requires a non-alphanumeric order on your visualization.
Sorting in Power BI requires a table with no duplicates on the id or the field you intend to sort by. Even if your visualization shows unique labels or dimensions, Power BI always looks at the underlying data for sorting unless you’re sorting by the axis values (ascending/descending). Therefore, each dimension needs its own sort table (especially Term, where the title or code often doesn’t sort sensibly alphanumerically, and Likert values for survey data, for the same reason). A sort table includes the dimension you want to sort and the sort order value (usually numeric). For example, Power BI will automatically sort terms in this order: Fall, Spring, Summer, Winter. But in our academic calendar, the sort order is actually Fall, Winter, Spring, Summer, so we need a sorting table that defines this. Similarly, Likert values will be automatically sorted as Dissatisfied, Neutral, Satisfied, Very Dissatisfied, Very Satisfied, instead of Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied. For surveys, this means every distinct Likert scale in the survey requires its own sort table.
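A term sort table can be as small as this; ideally it lives in the data source, but a DAX calculated table (shown here purely as an illustration) works for a quick model:

```
-- Minimal sort table for Term; wire it up with "Sort by Column"
-- on the Term field afterwards.
Term Sort =
DATATABLE (
    "Term", STRING,
    "TermSortOrder", INTEGER,
    { { "Fall", 1 }, { "Winter", 2 }, { "Spring", 3 }, { "Summer", 4 } }
)
```

After relating it to your Term dimension, select the Term field and set Sort by Column to TermSortOrder.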
- Get your governance, naming conventions, and change documentation processes for the dataset clear and organized.
Because of the complex nature of DAX calculations, stepping into an existing Power BI project isn’t easy. Documenting the data-source-to-DAX-to-design workflow and the rules governing each field or metric used, and organizing table names and fields with a standard naming convention, makes working in and onboarding to Power BI much easier (arguably true for all BI tools, but especially so for Power BI). If your analytics team includes a data source developer, working closely with them at the outset of a new analysis project to write this documentation will help clarify data source structures, purpose, and your visualization design approach. Understanding the underlying data structure is key to effectively manipulating it in PowerQuery or DAX.
- (Bonus!) Working with survey data from multiple years is not intuitive, because multiple years of results are not typically stored as a star schema.
If you don’t have a survey database designed with long instead of wide data, with crosswalk tables to control question and label changes, and with tables that identify cohorts and respondents for each survey year, build it now, well before your end users ask for the latest results. Build it early and test it with subsets of historical survey data, test for performance and calculated measure accuracy, then bring in the historical dataset in full, test again, then bring in the new data. (While you can append data tables in PowerQuery, this will bog down your refresh times; it’s better done on the database side before importing the dataset.)
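To show what a crosswalk table is doing, here is a toy version as a DAX calculated table; in a real build it would live in the survey database. Every value in it is invented for illustration: the point is that different question codes across years resolve to one canonical key, so a single measure can span years.

```
-- Toy crosswalk: 2022's Q14 and 2023's Q12 are the same canonical question.
Question Crosswalk =
DATATABLE (
    "SurveyYear", INTEGER,
    "SourceQuestionCode", STRING,
    "QuestionKey", INTEGER,
    { { 2022, "Q14", 7 }, { 2023, "Q12", 7 } }
)
```

With the long-format response table joined through QuestionKey, year-over-year comparisons stop depending on question codes staying stable.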