Guest DanyHoter Posted August 16, 2023 Posted August 16, 2023 Cross-database and cross-cluster relationships in PBI over ADX tables Summary You may have more than one ADX database and probably more than one ADX cluster. In some cases, you want to join tables or functions from more than one database/cluster. In this article you’ll see how to make sure that such joins are folded and sent to the ADX backend instead of executing at the level of the Power Query mashup engine. Everything mentioned here is applicable to Azure Data Explorer, Synapse Data Explorer, and Fabric RTA. Cross-Database We’ll use the nyc_taxi table in the samples database in the help cluster. | started by creating a query to this table in Power BI desktop, I added a small dimension table to the ContosoSales database called rate_codes. It can be used to add a name to the rate_code in the nyc_taxi table. I created a query for this table in the same PBI model. It is important to use the IsDimension=true setting in the dimension query. Next step is creating a relationship between the two tables based on the rate_code. I created the relationship as 1:m with assume referential integrity. There is a small number of rides with rate codes that do not appear in the dimension table. The number of such rides is very small, and there is a performance advantage in using inner joins. In the attached example crossdatabase.pbix the table takes more than 50% longer to refresh with leftouter join. Out of the box database() notation If you look at the queries generated by PBI using .show queries, you can see that the dimension table which is coming from a different database is mentioned as database('ContosoSales').["Rate_Codes"] This is an automatic behavior by the connector and it allows the join to be constructed in a normal way. Cross-cluster relationships A more complicated case is when the two tables in a relationship are in two different clusters. By default, PBI will not attempt to join the two tables in ADX although it is perfectly possible. Any time that the cluster parameter is not the same in two queries, they are considered as two different sources and no joins will be generated between them. In the attached file crosscluster.pbix, there are two queries with a relationship. When the transactions are filtered by countries, PBI will create a list with all customer keys from the selected countries and create a where clause “where CustomerKey in(11,22,33…)” The list of customers can be very long and affect performance. You can see this behavior in the attached file crosscluster.pbix. First, you need to copy the customers table to your own cluster using: .set Customers <| cluster(“help”).database(‘ContosoSales’).Customers When opening the pbix, it will not work until you change the values of two parameters: After you click OK PBI will connect to each of the two clusters and will not use a join between them. How to trigger a cross cluster join from PBI One way would be to write the query with lookup in Kusto web explorer and export to PBI. SalesFact | lookup kind=inner cluster('Your Cluster').database( 'Your Database').Customers on CustomerKey In the UI it will be Both the fact and the dimension are in the context of the same cluster, but the third parameter includes a fully qualified table including a cluster and a database. Another and simpler way used in the attached file crossfilter1.pbix is to create a query that uses the same cluster and database as fir the fact table, and in the table name dialog use the notation with cluster and database. Again, I used parameters that you can edit and enter your cluster and database. The M query will look like this in the advanced editor let Source = AzureDataExplorer.Contents("Azure Data Explorer","ContosoSales","cluster('" & DimCluster & "').database('" & DimDatabase & "').Customers", [isDimension=true]) in Source Continue reading... Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.