Monitoring The DAX Queries Generated When The Power BI Copilot Index Is Built

In my last post I talked about the text index that Power BI Copilot builds to help it answer data questions. You might be wondering if you can monitor the index build process and the bad news is that – at least at the time of writing – you can’t do so directly. However, you can monitor it indirectly, because the index build process runs DAX queries to get text values from the semantic model and you can see those DAX queries being run using Workspace Monitoring, Log Analytics or Profiler. While this is described quite well in the docs here, let’s see what these DAX queries actually look like.

I published the semantic model from my last post (which has four text columns: CustomerId and CustomerName from the Customers table, and CustomerId and TransactionId from the Orders table, none of which are hidden or excluded from indexing using the Simplify The Data Schema feature) to a workspace where Workspace Monitoring was enabled:

I then ran the following KQL query to look at the DAX queries run in the workspace I published to:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
| project Timestamp, EventText, DurationMs, CpuTimeMs

I saw that the index build generated three DAX queries with the same pattern:

Here’s the first of these queries:

EVALUATE
SELECTCOLUMNS (
    FILTER (
        VALUES ( 'Customers'[CustomerName] ),
        LEN ( 'Customers'[CustomerName] ) <= 100
    ),
    "valueColumn", 'Customers'[CustomerName]
)

In line with what is described in the documentation, this query gets all the distinct text values from the CustomerName column on the Customers table that are no more than 100 characters long. I assume there are only three DAX queries, even though there are four text columns in the model, because one of the columns, TransactionId, has 5 million distinct values in it and Copilot cannot currently index more than 5 million text values in a single model.
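
Since all three queries follow the same pattern, one rough way to pick them out of the QueryEnd events is to filter on the query text. This is just a sketch: it assumes that the “valueColumn” alias only ever appears in these index-build queries, which may not hold for your models:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
// assumption: the "valueColumn" alias is a reliable marker for the index build queries
| where EventText contains '"valueColumn"'
| project Timestamp, EventText, DurationMs, CpuTimeMs
| order by Timestamp asc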

[Interestingly, I saw that one of these queries failed with a memory error which I have asked to be investigated]

Each query took between 300ms and 600ms and there was a small gap between queries, so you can see how a large model with a lot of text columns could generate a lot of queries that, taken together, take a noticeable amount of time to run. Measuring the time between the start of the first of these queries and the end of the last one will give you a rough idea of how long it takes to build the index, although there is some extra work that happens after the last query has finished which can sometimes take a while too.
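
Here’s a sketch of a KQL query that estimates this, reusing the “valueColumn” filter from above and assuming that the Timestamp column on a QueryEnd event records the time the query finished (so the query start can be reconstructed by subtracting DurationMs):

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
| where EventText contains '"valueColumn"'
// assumption: Timestamp is logged when the query completes
| extend QueryStart = Timestamp - (DurationMs * 1ms)
| summarize IndexBuildStart = min(QueryStart), IndexBuildEnd = max(Timestamp)
// note: this ignores any extra work done after the last query has finished
| extend ApproxIndexBuildDuration = IndexBuildEnd - IndexBuildStart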

As described here, for Import mode models with Q&A enabled, indexing takes place every time the model is published, as well as every time it is refreshed so long as either Copilot or Q&A has been used in the last 14 days. For DirectQuery or Direct Lake models indexing also takes place every time the model is published, but because it’s impossible to say whether the underlying data has changed when a refresh takes place, indexing instead happens every 24 hours so long as either Copilot or Q&A has been used in the last 14 days. As a result, if you’re an admin, you might see a lot of DAX queries similar to the one above being run.
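
If you want a feel for how much of this activity is happening, a query along these lines might help. Again it’s only a sketch: it reuses the “valueColumn” filter from above, and it assumes that the ItemName column in SemanticModelLogs identifies the semantic model, which you should verify against your own data:

SemanticModelLogs
| where Timestamp > ago(7d)
| where OperationName == "QueryEnd"
| where EventText contains '"valueColumn"'
// assumption: ItemName holds the semantic model name
| summarize IndexQueryCount = count(), TotalDurationMs = sum(DurationMs)
    by ItemName, bin(Timestamp, 1d)
| order by Timestamp asc, ItemName asc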

[Thanks to Aaron Meyers for much of the information in this post]
