Performance Testing Power BI Direct Lake Models Revisited: Ensuring Worst-Case Performance

Two years ago I wrote a detailed post on how to do performance testing for Direct Lake semantic models. In that post I talked about how important it is to run worst-case scenario tests to see how your model performs when there is no model data present in memory, and how it was possible to clear all the data held in memory by doing a full refresh of the semantic model. Recently, however, a long-awaited performance improvement for Direct Lake has been released which means a full semantic model refresh may no longer page all data out of memory – which is great, but which also makes running performance tests a bit more complicated.

First of all, what is this new improvement? It’s called Incremental Framing and you can read about it in the docs here. Basically, instead of clearing all data out of memory when you do a full refresh of a Direct Lake model, the model now checks each Delta table it uses to see whether the data in it has actually changed. If it hasn’t changed then there’s no need to clear any data from that table out of memory. Since there’s a performance overhead to loading data into memory when a query runs, this means that you’re less likely to encounter that overhead, and queries (especially for models where the data in some tables changes frequently) will be faster overall. I strongly recommend you read the entire docs page carefully though, not only because it contains a lot of other useful information, but also because you might be loading data into your lakehouses in a way that prevents this optimisation from working.

Let me show you an example of this by revisiting a demo from a session I’ve done at several user groups and conferences on Power BI model memory usage (there are several recordings of it available, such as this one). Using a Direct Lake semantic model consisting of a single large table with 20 columns containing random numbers, if I use DAX Studio’s Model Metrics feature when there is no data held in memory and with the Direct Lake Behaviour setting in DAX Studio’s Options dialog set to ResidentOnly (to stop Model Metrics from loading data from all columns into memory when it runs):

Then when I run Model Metrics the size of each column in the semantic model is negligible and the Temperature and Last Accessed values for all model columns are blank:

Then, if I run a query that asks for data from just one column (in this case the column called “1”) from this table like this:

EVALUATE ROW("Test", DISTINCTCOUNT('SourceData'[1]))

…and then rerun Model Metrics, the size in memory for that column changes, because of course it has been loaded into memory in order to run the query:

Zooming in on the Model Metrics table columns from the previous screenshot that show the size in memory:

And here are the Temperature and Last Accessed columns from the same screenshot which are no longer blank:

Since the query had to bring the column into memory before it could run, the DAX query took around 5.3 seconds. Running the same query after that, even after using the Clear Cache button in DAX Studio, took only about 0.8 seconds because the data needed for the query was already resident in memory.
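
Incidentally, if you want to capture these timings from a Fabric notebook rather than from DAX Studio, here’s a minimal sketch using Semantic Link’s evaluate_dax function (the workspace and model names are placeholders, and timings measured this way include notebook and connection overhead, so they won’t match DAX Studio’s Server Timings exactly):

import time
import sempy.fabric as fabric
WorkspaceName = "Insert Workspace Name Here"
SemanticModelName = "Insert Semantic Model Name Here"
# the same test query as above
DaxQuery = """EVALUATE ROW("Test", DISTINCTCOUNT('SourceData'[1]))"""
# the first run pays the cost of paging the column into memory;
# running the cell again should be much faster because the column is then resident
start = time.time()
fabric.evaluate_dax(dataset=SemanticModelName, dax_string=DaxQuery, workspace=WorkspaceName)
print(f"Query took {time.time() - start:.1f} seconds")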

OK, so far nothing has changed in terms of behaviour. However, if you do a full refresh from the Power BI UI without making any changes to the underlying Delta tables:

…and then rerun Model Metrics, nothing changes and the data is still in memory! As a result the DAX query above still only takes about 0.8 seconds.
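
As an aside, if you’d rather script this step than click through the UI, you can run the same full refresh from a notebook with Semantic Link Labs (installed as shown below). Here’s a minimal sketch using the same placeholder workspace and model names as the code later in this post:

import sempy_labs as labs
WorkspaceName = "Insert Workspace Name Here"
SemanticModelName = "Insert Semantic Model Name Here"
# a full refresh on its own: with Incremental Framing, if the underlying Delta tables
# haven't changed, this no longer pages the model's data out of memory
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="full")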

So how do you get that worst-case performance again? As mentioned in the docs here, you now need to do a refresh of type clearValues followed by a full refresh. You can’t do a refresh of type clearValues in the Power BI UI though, so the easiest way to do this is to use a Fabric notebook and Semantic Link Labs. Here’s how. First install Semantic Link Labs:

%pip install semantic-link-labs

Then use the following code in a notebook cell to do a refresh of type clearValues followed by a full refresh:

import sempy_labs as labs
WorkspaceName = "Insert Workspace Name Here"
SemanticModelName = "Insert Semantic Model Name Here"
# run a refresh of type clearValues first
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="clearValues")
# then a refresh of type full
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="full")

After doing this on my model, Model Metrics shows that the column called “1” that was previously in memory is no longer resident:

…and the query above once again takes around 5 seconds to run.
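
If you want to check column residency without opening DAX Studio at all, you can also query the same column-level storage information that Model Metrics uses from a notebook. Here’s a rough sketch that assumes the INFO.STORAGETABLECOLUMNS DAX function is available against your model; the dictionary size, temperature and last accessed columns in the result show which columns are currently resident:

import sempy.fabric as fabric
WorkspaceName = "Insert Workspace Name Here"
SemanticModelName = "Insert Semantic Model Name Here"
# return column-level storage information, similar to what DAX Studio's Model Metrics shows
residency = fabric.evaluate_dax(
    dataset=SemanticModelName,
    dax_string="EVALUATE INFO.STORAGETABLECOLUMNS()",
    workspace=WorkspaceName
)
# in a Fabric notebook, display() shows the resulting dataframe
display(residency)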

So, as you can see, if you’re doing performance testing of a Direct Lake model then, in addition to testing performance on a cold cache and a warm cache, you now need to do a refresh of type clearValues followed by a full refresh of your model before each test to ensure that no data is resident in memory and that you get worst-case performance readings.
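
If you’re running a suite of test queries, you could automate the whole sequence from a notebook. Here’s a rough sketch of what that might look like, combining the two refreshes above with Semantic Link’s evaluate_dax function (again, the workspace, model and query names are placeholders, and timings captured this way include notebook overhead, so treat them as indicative rather than a replacement for DAX Studio):

import time
import sempy.fabric as fabric
import sempy_labs as labs
WorkspaceName = "Insert Workspace Name Here"
SemanticModelName = "Insert Semantic Model Name Here"
# the DAX queries you want to test
TestQueries = {
    "Distinct count on column 1": """EVALUATE ROW("Test", DISTINCTCOUNT('SourceData'[1]))"""
}
for QueryName, DaxQuery in TestQueries.items():
    # page all data out of memory: a refresh of type clearValues followed by a full refresh
    labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="clearValues")
    labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="full")
    # worst-case run: no data is resident, so the query pays the cost of loading it into memory
    start = time.time()
    fabric.evaluate_dax(dataset=SemanticModelName, dax_string=DaxQuery, workspace=WorkspaceName)
    worst_case_seconds = time.time() - start
    # warm run: the data the query needs is now resident in memory
    start = time.time()
    fabric.evaluate_dax(dataset=SemanticModelName, dax_string=DaxQuery, workspace=WorkspaceName)
    warm_seconds = time.time() - start
    print(f"{QueryName}: worst case {worst_case_seconds:.1f}s, warm {warm_seconds:.1f}s")

This way every test run starts from the same worst-case state before the warm-cache comparison is made.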
