Which Columns Are Loaded Into Memory When In Power BI Direct Lake Mode?

As you probably know, in Power BI Direct Lake mode, column data is only loaded into memory when it is needed by a query. I gave a few examples of this – and how to monitor it using DMVs – in this blog post from last year. But which columns are loaded into memory in which circumstances? I was thinking about this recently and realised I didn't know for sure, so I decided to do some tests. Some of the results were obvious; some were a surprise.

Test semantic model and methodology

For my tests I loaded the following tables of data into a Fabric Lakehouse:

…and created a Direct Lake custom semantic model that looked like this:

Before every test I refreshed the model to make sure there was no data in memory (be aware that in the future it may be that refreshing a model does not purge all column data from memory). Then, for each test, I ran a single DAX query from DAX Studio and afterwards ran the following query to see whether the dictionaries for each column were resident in memory (again, see my previous post for background):

EVALUATE
SELECTCOLUMNS(
    INFO.STORAGETABLECOLUMNS(),
    "TABLE_ID", [TABLE_ID],
    "COLUMN_ID", [COLUMN_ID],
    "DICTIONARY_ISRESIDENT", [DICTIONARY_ISRESIDENT]
)

Note that I’m using one of the new DAX Info functions, INFO.STORAGETABLECOLUMNS(), instead of the older DMV syntax I was using last year; I think the DAX Info functions are a lot more convenient to use.
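For reference, the equivalent query using the older DMV syntax (run as a DMV query rather than a DAX query, for example from the DMV query option in DAX Studio) would look something like this:

SELECT
    TABLE_ID,
    COLUMN_ID,
    DICTIONARY_ISRESIDENT
FROM
    $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS

The DAX Info functions are essentially wrappers over these DMVs, which is why the column names returned are the same.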

[I also looked at whether column segments were paged into memory using the INFO.STORAGETABLECOLUMNSEGMENTS() function, but found that its results were consistent with those of INFO.STORAGETABLECOLUMNS(), so I haven't included them in this post; I assumed that if a column's dictionary was resident in memory, so were the associated column segments. It's also worth mentioning that the latest versions of DAX Studio have some great new functionality in the Model Metrics for telling you which columns are resident in memory in a Direct Lake model]
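[If you do want to check segments yourself, a query along the following lines should work – note that I'm hedging here, because the exact set of columns returned by INFO.STORAGETABLECOLUMNSEGMENTS() may vary between versions; TABLE_ID, COLUMN_ID, SEGMENT_NUMBER and ISRESIDENT are the ones I relied on:

EVALUATE
SELECTCOLUMNS(
    INFO.STORAGETABLECOLUMNSEGMENTS(),
    "TABLE_ID", [TABLE_ID],
    "COLUMN_ID", [COLUMN_ID],
    "SEGMENT_NUMBER", [SEGMENT_NUMBER],
    "ISRESIDENT", [ISRESIDENT]
)

In the rest of this post I'll stick to INFO.STORAGETABLECOLUMNS()]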

Here's what the INFO.STORAGETABLECOLUMNS() query above returned immediately after a refresh, before any other queries had been run:

The DICTIONARY_ISRESIDENT column tells you whether a column's dictionary is resident in memory. Each table in a Direct Lake model (as in an Import mode model) has a hidden column called RowNumber, and for a Direct Lake model this column is always resident in memory. As you can see, no other columns are resident in memory at this point.

Counting rows in a table

For my first test I created a measure that counted the rows in the Sales table:

Order Count = COUNTROWS('Sales')

I then ran a query that returned just the result of this measure:

EVALUATE
CALCULATETABLE(
    ROW(
        "Order Count", [Order Count]
    )
)

I then ran my query to see what had been loaded into memory, and…

…nothing had changed! No new column data had been loaded into memory at all. I assume this is because Power BI can resolve this query using the RowNumber column from the Sales table.

I then added the Order_Status column from the Sales table to the query:

EVALUATE
SUMMARIZECOLUMNS(
    Sales[Order_Status],
    "Order Count", [Order Count]
)
ORDER BY 
    Sales[Order_Status] ASC

After this query, the Order_Status column was resident in memory as you might expect, but no other column was:

Relationships

Instead of using the Order_Status column, I then looked at the impact of using a column from a different table. I created a query that showed Order Count by Customer Name:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count", [Order Count]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this, the Customer_Name column was in memory, along with the two CustomerID columns used in the relationship between the Customer and Sales tables:

Any time your query references columns in different tables, the columns used in the relationships between those tables must also be resident in memory.

Measures

I then defined another measure:

Sales Amount = SUM(Sales[Sales_Amount])

And ran the following query:

EVALUATE
SUMMARIZECOLUMNS(
    Sales[Order_Status],
    "Sales Amount", [Sales Amount]
)
ORDER BY 
    Sales[Order_Status] ASC

After this, the Order_Status and Sales_Amount columns were resident in memory:

No surprises here: as you would expect, if a column is referenced by a measure then it needs to be resident in memory.

Measures that use relationships also work as expected. I created the following measure:

Order Count Citrus = 
CALCULATE([Order Count], 'Product'[Category]="Citrus")

…and the following query:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Citrus", [Order Count Citrus]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this query, all the columns used in the measure and all the relationships needed by the query and the measure were resident in memory:

I was curious to know what effect efficient versus inefficient DAX in measures might have on which columns were loaded into memory, though. I created the following two measures to see the impact of the antipattern of filtering on an entire table in the second parameter of CALCULATE():

Order Count Status X V1 = 
CALCULATE([Order Count], 'Sales'[Order_Status]="X")

Order Count Status X V2 = 
CALCULATE([Order Count], FILTER('Sales', 'Sales'[Order_Status]="X"))

My first query used the first of these measures, the efficient version:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Status X V1", [Order Count Status X V1]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this query, the Customer_Name column, the two CustomerID columns used in the relationship, and the Order_Status column were resident in memory:

But what about the second, inefficient version of the measure?

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Status X V2", [Order Count Status X V2]
)
ORDER BY 
    Customer[Customer_Name] ASC

The same columns were resident in memory:

I don't think this means the inefficient version of the measure is any less of a problem in Direct Lake mode than it is in Import mode; it just shows that both versions of the measure need the same columns to be resident in memory in order to run.

Conclusion

Knowing which columns your query needs to have resident in memory is important in Direct Lake mode for two reasons. First, it helps you understand query performance, because loading column data into memory takes time (Teo Lachev published some test results in a blog post a few days ago that show this clearly). Second, it helps you understand model memory usage and where you stand relative to the memory limits of the capacity SKU you're using. As these results show, it's better to test which columns your Power BI reports need to have resident in memory than to rely on guesswork.

It is also possible to use the DISCOVER_CALC_DEPENDENCIES DMV (or the INFO.CALCDEPENDENCY function) to see which tables, columns, measures and relationships are referenced by a DAX query as I described here; I’ve been playing around with this for Direct Lake too, and will report my findings in a future blog post.
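As a taster, calling INFO.CALCDEPENDENCY for a query looks something like this – note that this is my understanding of the function's parameters (an object type of "QUERY" followed by the query text), so treat it as a sketch rather than a definitive reference:

EVALUATE
INFO.CALCDEPENDENCY(
    "QUERY",
    "EVALUATE {[Order Count]}"
)

The interesting question for Direct Lake is how closely the dependency list matches the set of columns that actually end up resident in memory.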
