Introduction
We all tend to avoid super slow websites and apps. Back in the old days of the internet, somewhere in the mid-2000s, websites that took too long to load were destined to fail. Watching the spinning wheel run over and over again is boring, and it drives the user's attention to the next shiny, blinking thing on their screen.
In the Analytics world, this is no less true. Nobody likes to break their analytical train of thought because they need to wait for a visual or a page to render in their report. Slowness generates frustration, which directly harms the user experience and (why not) the quality of the analysis they want to make. At the very least, this can be one reason, among many, why you still have users extracting data to analyze it in their Excel spreadsheets.
The purpose of this article is to be a pool of performance optimization techniques that can be applied to a Power BI semantic model and report. We're not going to discuss each one in depth – each could be a separate article in itself. Here you'll find tips to remind you what to check when things are not going well in your report.
I hope you enjoy this set of techniques, and if you find one that is missing, don't forget to post it in the comments! In the end, the idea is to have a "living article".
Breaking down the issues
As we know, in the data world everything depends: it depends on your data, your model, your infrastructure, your business rules, your users' data literacy, etc. Thus, it's impossible to carve in stone what you can do to improve the performance of your data models. The best advice we can give is: test, test, and test again whether a technique works better for your scenario.
We can split performance optimization into three different parts:
- Refresh Performance (Data Transformation & Load)
- Model Performance (Tables, Relationships & DAX)
- Report Performance (Visual)
Refresh Performance
- Make sure query folding is happening in all your transformations.
- Try not to use native queries. Instead, build views in your database. If that's not possible, be sure to do all the needed transformations in the native query itself and avoid any extra Power Query steps.
- Be careful with tables loading and running twice (Duplicate or Reference in Power Query).
- Check how long the data source takes to respond to a simple query.
- Try to condense simple steps into a single one (change column types, rename columns, replace values, etc. only once each).
- Push as much of the transformation workload as you can to the source and avoid doing it in memory – Roche's Maxim.
- Remove unused columns from the tables (especially if you're doing complex transformations in Power Query).
- Use incremental refresh for large data models (but be aware that you cannot download them from the service later if needed).
- Always use Import mode. Only use DirectQuery if you have a VERY GOOD REASON.
- If DirectQuery is used, keep dimensions in Dual mode (this makes slicers load data faster).
- If importing data from flat files, the CSV format will give you the fastest load.
- If using a gateway, try to physically install it close to the data source.
- Disable queries and tables that are not needed in your data model.
- Avoid calculated columns and calculated tables.
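To illustrate the query-folding and step-condensing tips above, here is a minimal Power Query (M) sketch. The server, database, table, and column names (`myserver`, `SalesDB`, `dbo.Sales`, `Amount`, etc.) are hypothetical placeholders:

```m
let
    // Source and navigation steps fold back to the database
    Source = Sql.Database("myserver", "SalesDB"),
    dbo_Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // One condensed step: keep only the needed columns (folds to a SELECT list)
    KeepColumns = Table.SelectColumns(dbo_Sales, {"OrderDate", "ProductKey", "Amount"}),

    // One condensed step: set all column types at once (instead of one step per column)
    SetTypes = Table.TransformColumnTypes(
        KeepColumns,
        {{"OrderDate", type date}, {"ProductKey", Int64.Type}, {"Amount", Currency.Type}}
    )
in
    SetTypes
```

In the Power Query editor you can right-click a step and check whether "View Native Query" is enabled: if it is greyed out, folding has been broken at or before that step.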
Model Performance
Tables & Relationships
- Eliminate implicit (auto) date tables and use a central calendar table (built in SQL Server or Power Query, by preference).
- Eliminate bi-directional relationships from the model. They are only acceptable in two scenarios: from a dimension to a bridge table, or at the leaf level of a star/snowflake schema.
- Eliminate any many-to-many relationships. Don't use them naively. Only use them when you are PRETTY sure of what you're doing, and always in a single direction.
- Pay attention to the cardinality of the columns. Try to avoid high cardinality (many distinct values) in the fact table.
- On high-cardinality columns, try to use data types that compress better.
- Pay attention to column types. For numerical columns representing amounts ($), try to use Fixed Decimal Number instead of plain Decimal.
- Eliminate unnecessary columns from the model (a lighter model in memory leaves more memory for calculations).
- Don't use a single timestamp column: split it into one column for date and another for time (lower cardinality means better compression).
- Merge dimensions as much as you can and eliminate tables. For flag tables, try to build a junk dimension.
- Use a Star Schema or push your data model to this type of schema as much as you can.
- Try to avoid really long Snowflake dimensions.
- Try to reduce as much as you can the number of columns on the fact table.
- If memory space is needed, disable the "IsAvailableInMDX" property to eliminate the attribute-hierarchy storage. This makes sense when the hierarchy is almost the same size as the dictionary. Pay attention: this is an experimental, unsupported tweak (make a backup first).
- Use the VertiPaq Analyzer embedded on DAX Studio to verify all these conditions.
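For the central calendar table mentioned above, here is a minimal DAX sketch. Building it at the source or in Power Query is still preferable (per Roche's Maxim); the column names are just illustrative:

```dax
Calendar =
ADDCOLUMNS (
    CALENDARAUTO (),               -- scans all date columns in the model for the range
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month", FORMAT ( [Date], "mmm" )
)
```

Remember to mark it as a date table and relate it to your fact table, so the implicit (auto) date tables can be disabled.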
DAX
- Use and abuse the Performance Analyzer: try to identify which query generates the most overhead in your model. If "Other" is the highest number, you probably have too many visuals on your page and rendering is taking too long. Check the "Report Performance" section below.
- Use DAX Studio to understand the DAX query, the query plan, and the Server Timings (SE and FE processing), and define a benchmark for the query you are analyzing.
- Check the balance between Formula Engine (FE) and Storage Engine (SE). Try to push as much execution as you can to the SE (multi-thread) instead of heavily depending on FE (single-thread).
- Check for queries that scan the fact table too many times.
- Always apply and remove filters over columns, not entire tables (avoid scanning/hitting the expanded table).
- Check for measures referenced many times within another measure. Use variables when possible.
- Be careful with context transition, especially when referencing other measures within iterators.
- When writing IF statements, remember to decide between EAGER and STRICT evaluation.
- Remove "CallbackDataID"s shown in the query plan.
- Trick: when you need an IF statement within an iterator, use DIVIDE(measure, condition) to avoid CallbackDataIDs.
- Be careful with the FILTER function: remember it's an iterator that will scan your tables. Try to move filters into CALCULATE arguments.
- Tip: when possible, try to break your filter conditions down into two (or more) CALCULATEs and sum the results.
- As a rough expectation: a fact table of 1.5 million rows should answer queries in 10–20 ms.
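To illustrate the variables tip and the DIVIDE trick above, here is a hedged DAX sketch. The `Sales` table and the `[Margin %]` measure are hypothetical:

```dax
-- Variables: evaluate a referenced measure once instead of three times
Margin Label :=
VAR MarginPct = [Margin %]
RETURN
    SWITCH (
        TRUE (),
        MarginPct < 0, "Loss",
        MarginPct < 0.1, "Low",
        "Healthy"
    )

-- DIVIDE trick inside an iterator: the condition converts to 1 (TRUE) or
-- 0 (FALSE), so DIVIDE returns the amount when the condition holds and
-- BLANK otherwise, avoiding an IF that could show up as a CallbackDataID
High Value Sales :=
SUMX (
    Sales,
    DIVIDE ( Sales[Amount], Sales[Amount] > 1000 )
)
```

After a change like this, re-check the Server Timings in DAX Studio to confirm the CallbackDataID actually disappeared for your scenario.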
Report Performance
- The number of visuals and elements on a Power BI page directly affects the performance of a report. We can check how long the "Other" category takes to process in the Performance Analyzer pane. Remember: Power BI queues the visuals' queries and rendering, and each visual fires a query against the engine.
- One good way of decreasing the number of visuals is to combine measures into a single visual, like a matrix or a multi-row card. This way, only one DAX query is sent to the engine, enabling "DAX Fusion" to kick in.
- Check the size of the images loaded as backgrounds in the report. They can also affect the page refresh time.
- Avoid long pages with lots of visuals. Leverage bookmarks, page navigation buttons, tooltips, and drill-through.
Conclusion
Besides checking these techniques, you can also go to the Microsoft documentation, which has a GREAT article about performance optimization!
Also, Tabular Editor has a great built-in tool (BPA – Best Practices Analyzer) which you can run against your data model to automatically check not only common best practices but also any particular rules you can programmatically set. Worth checking it out.
Well, well, well, that's all for today, folks! This is my go-to list of things to do when I'm looking to optimize the performance of a Power BI report.
This list is not final, and my idea is to keep adding useful tips to it! By the way, if you have any that I missed, put them in the comments below!
Cheers!