First Look At Fabric Graph: Analysing Power BI Import Mode Refresh Job Graphs

The new Fabric Graph database is rolling out now and should be available to everyone within the next few weeks if you can’t see it already. The key to learning a new data-related technology is, I think, to have some sample data that you’re interested in analysing. But if you’re a Power BI person, why would a graph database be useful or interesting? Actually I can think of two scenarios: analysing dependencies between DAX calculations and the tables and columns they reference using the data returned by the INFO.CALCDEPENDENCY function (see here for more details on what this function does); and the subject of this blog post, namely analysing Import mode refresh job graphs.
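
As a quick illustration of that first scenario: a minimal DAX query along the lines of the one below, run from DAX query view against your model, should return the dependency data you could then load into a graph. This is just the simplest possible call to the function with no arguments:

EVALUATE
INFO.CALCDEPENDENCY()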

I’m sure even some of the most experienced Power BI developers reading this are now wondering what an Import mode refresh job graph is, so let me remind you of the series of three posts I wrote early in 2024 on extracting the job graph events from a refresh using Semantic Link Labs and visualising them, understanding the concepts of blocking and waiting in an Import mode refresh, and, crucially for this post, how to save job graph information to a table in OneLake. Here’s a quick explanation of what a refresh job graph is, though, using the model from the second post. Let’s say you have an Import mode semantic model consisting of the following three tables:

X and Y are tables that each contain a single numeric column. XYUnion is a calculated table which unions the tables X and Y and has the following DAX definition:

XYUnion = UNION(X,Y)

If you refresh this semantic model the following happens:

  1. The Power BI engine creates a job to refresh the semantic model
  2. This in turn kicks off two jobs to refresh the tables X and Y
  3. Refreshing table X kicks off jobs to refresh the partitions in table X and the attribute hierarchies in table X
  4. Refreshing table Y kicks off jobs to refresh the partitions in table Y and the attribute hierarchies in table Y
  5. Once both tables X and Y have been refreshed, the calculated table XYUnion can be refreshed, which in turn kicks off jobs to refresh the attribute hierarchies in table XYUnion

So you can see that refreshing an Import mode model results in the creation of refresh jobs for individual objects which have a complex chain of dependencies between them. If you want to tune an Import mode model refresh then understanding this chain of dependencies can be really useful. Running a Profiler trace while a refresh is happening and capturing the Job Graph trace events gives you all the data needed to do this.

I refreshed the model shown above and saved the Job Graph data for it to two tables in OneLake using the code in this post. The first table was called RefreshJobs and contained one row for each job created during the refresh:

The second table contained all the dependencies between the jobs and was called Links:

I then created a new Graph model in my workspace, clicked Get Data, selected the lakehouse where the two tables above were stored and then selected those two tables:

…and clicked Load. Then in the model editor I clicked Add Node and created a node called RefreshJobs from the RefreshJobs table:

And then I clicked Add Edge and created an edge called DependsOn from the Links table:

Next I clicked Save to load all the data into the graph model (this is very similar to refreshing a semantic model).

This resulted in a simple graph model which represented the recursive relationship between the jobs in a refresh:

I was then able to use the query builder to create a diagram showing all the dependencies between the jobs:

It’s not as pretty as the tools for viewing DGML files I showed in this post last year but it has the advantage of allowing you to filter on the properties of nodes and edges. If you know GQL (and I wrote my first GQL query all of two hours ago…) you can write queries to do much more advanced types of analysis. Here’s a GQL query I managed to write which returns all the jobs that depend on the job with JobId 1, and all the jobs that depend on those jobs (the {1,2} quantifier means the DependsOn edge is followed one or two hops):

MATCH 
(source_Job:Job)-[DependsOn:DependsOn]->{1,2}(target_Job:Job)
WHERE target_Job.jobId=1
RETURN 
target_Job.jobId,
target_Job.description,
source_Job.jobId,
source_Job.description
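
Swapping the filter to the other end of the pattern gives you the reverse analysis. For example, here’s a variation on the query above (just a sketch, using the same Job node and DependsOn edge) that should return all the jobs that the job with JobId 1 itself depends on, up to two hops away:

MATCH 
(source_Job:Job)-[DependsOn:DependsOn]->{1,2}(target_Job:Job)
WHERE source_Job.jobId=1
RETURN 
source_Job.jobId,
source_Job.description,
target_Job.jobId,
target_Job.description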

This is really cool stuff and in particular I would love to learn a bit more GQL to understand how the dependencies between objects and the amount of parallelism possible during a refresh affect refresh performance. If I get the time to do so I’ll write more blog posts!
