Understanding The “Copilot Analyzed Only Part Of The Model Due To Its Size” Warning In Power BI Copilot

If you’re using Power BI Copilot you may have seen the following warning message:

Copilot analyzed only part of the model due to its size, which may affect these results

What does it mean? What causes it? What can you do to avoid it? Let’s find out.

In order to answer your questions better Power BI Copilot creates an index of the values in all the text columns in your semantic model that it thinks you’re likely to ask questions about. This indexing process is described in the docs here in great detail and I recommend you read the docs before carrying on (incidentally, another common warning message, “Copilot is currently syncing with the data model” that you might have seen is displayed while that indexing is taking place). There is a limit on the number of text values that can be indexed and when Copilot hits that limit you see the “Copilot analyzed only part of the model due to its size” warning.

At the time of writing (and this may change in the future) there are two limits on index size in place which are documented here:

  1. Copilot can only index 1000 text columns – but if you have 1000 columns in your model I would argue you’re already doing something wrong, because a model that large is going to be very difficult to use.
  2. Copilot can only index up to 5 million distinct text values across all text columns – and this is the limit you’re most likely to hit if you’re not careful.

Only text values that are less than 100 characters are indexed; longer text values are ignored and do not cause the warning to appear.

For an example of how you can run into the 5 million distinct text values limit, consider the following simple semantic model that I have designed so that the warning message appears:

Here’s what the contents of the Customers table looks like (the CustomerName column contains random text values):

Here’s what the contents of the Orders table looks like:

The Customers dimension table has 1 million rows in it; the Orders fact table has 5 million rows in it. If you ask Copilot a question about this semantic model like:

Show the Sales Value for the Customer Name WAMUQT

…it may work well but you will see the “Copilot analyzed only part of the model” message:

In other cases you might find that Copilot falls back to generating a DAX query or asks for clarification because, if it can’t find a text value you’ve used in your prompt in its index, it’s less sure about how it can answer the question:

In these cases you may find Copilot’s responses are less reliable, even if you still get correct results in many cases.

There are a couple of flaws in the design of this semantic model that either individually or combined cause that warning message to appear. Remember that it only appears when Copilot thinks it needs to index more than 5 million text values, so what is causing it to go over that limit?

First of all I deliberately made all the key columns (CustomerId on Customers and Orders, TransactionId on Orders) text columns. In this case they can easily be set to be of type Whole Number and the problem will be solved because the only remaining text column, CustomerName, contains 906925 distinct values which is well under the limit. It’s a best practice to use integer keys in Power BI whenever you can anyway. What if you can’t change the data type of your key columns though?

Well the second thing to consider is whether you need the TransactionId column on the fact table at all – primary keys on a fact table are rarely useful and can be very expensive in terms of model size, and in this case there are 5 million distinct values in that column so removing it completely will have a number of other benefits apart from reducing the overall number of text values that need to be indexed.

Thirdly, though, why are these key columns even visible? The CustomerId columns are almost certainly not useful to end users and TransactionId might not be either. Setting all these columns’ Is Hidden property to true means that Copilot does not try to index them, which means that only the CustomerName column is indexed and the warning message goes away. Again, hiding key columns is a best practice regardless of whether you’re using Copilot.

Finally, if you don’t want to hide a text column, you have another option: you can use the Simplify The Data Schema page in the Prep Data For AI dialog to prevent Copilot considering it in its responses. As I said, hidden columns are never considered anyway, but deselecting a column in this page has the same effect:

Here’s Copilot successfully answering the question above without the warning after deselecting the key columns in the Simplify The Data Schema page:

For most Power BI semantic models you should be able to use one or more of these techniques to make sure only the text columns that need to be indexed are indexed, stop the warning message from appearing, and get more reliable results from Copilot.

Share this Post

Comments (0)

Leave a comment