Microsoft Fabric Metadata Driven Pipelines with Mirrored Databases

  • Thread starter Thread starter jehayes
  • Start date Start date
J

jehayes

Microsoft Fabric's database mirroring feature is a game changer for organizations using Azure SQL DB, Azure Cosmos DB or Snowflake in their cloud environment! Mirroring offers near real-time replication with just a few clicks AND for at no cost!



Features:

  • No data movement cost when mirroring
  • No storage cost for mirrored tables
  • No consumption of Fabric Capacity Units
  • Mirror all tables in your source database or just a few, with the capability to add more tables as your Fabric analytics environment grows
  • Source data continuously replicated with no data pipelines to configure
  • Data landed in delta tables in One Lake which are optimized by default
  • SQL endpoint and default semantic model automatically created

What can you do?

  • Run near real-time queries against the SQL endpoint with no impact to your source system since the data is replicated
  • Share the mirrored database across your Fabric tenant
  • Run cross database queries from within the mirrored database SQL endpoint or from One Lake when the mirrored database is shared
  • Create SQL views over mirrored data, which can include joins or unions with data from other mirrored databases, warehouses or lakehouse SQL endpoints
  • Configure row-level security and object level security against the mirrored data
  • Create Direct Lake near real-time Power BI reports against the default semantic model or against a new model
  • Copy mirrored data into a lakehouse and used in Spark notebooks or Data Science workloads
  • And… build faster, simpler metadata driven pipelines! Which is the focus of this article!

Metadata Driven Pipelines with Microsoft Fabric Mirrored Databases​


Metadata-driven pipelines in Microsoft Fabric enable you to streamline data ingestion and transformations with minimal coding, lower maintenance, and enhanced scalability. And when your source is Azure SQL DB, Azure Cosmos, or Snowflake, this becomes even easier!



Architecture Overview​


Fabric Metadata with Mirroring.jpg



  1. With Mirroring, the data is replicated and readily available in OneLake. This eliminates the pipeline which brings data into Fabric. After mirroring is configured:
    1. A SQL endpoint is automatically created, allowing you and your users to query the data in near real time without impacting source data performance
    2. A default semantic model is also automatically created, allowing you and your users to create near real time Power BI reports. However, best practice is to create a new semantic model, which provides more features and options than the default semantic model
  2. A Fabric Data Warehouse is created to contain:
    1. The fact and dimension tables (star schema) which simplifies and optimizes the analytical data model for SQL querying and semantic models
    2. SQL views and stored procedures used in data transformations
    3. The metadata table which holds the information on how to transform load each fact or dimension table
  3. The Fabric Data Pipelines are created and scheduled to perform:
    1. A Lookup activity on the metadata table to get the information on how to load each fact or dimension table in the Fabric Data Warehouse
    2. Copy Data activities for full loads with the source being a SQL view over the mirrored tables and the destination a table in the Fabric Data Warehouse
    3. Stored Procedure activities for incremental loads to merge the latest data from the mirrored data source into the Data Warehouse destination table
  4. A default semantic model is automatically created over the Fabric Data Warehouse. But create a new one per best practices
  5. Build Power BI reports for analytical reporting

Why 2 semantic models? Fabric Data Warehouse vs Mirrored Database SQL Endpoint for semantic model reporting​


For this architecture, I considered using:

  • Just the mirrored tables
  • SQL views over the mirrored tables
  • A new Fabric Data Warehouse with data loaded from the mirrored tables

Since SQL views over mirrored tables always resort to Direct Query rather than Direct Lake, I decided against building a semantic model over SQL views. Instead, I created a semantic model over the mirrored tables and a separate Fabric Data Warehouse and semantic mode over it.



The semantic model over the mirrored tables:

  • Is used only by users who understand the source data schema
  • Allows for near-real time access for data to answer ad-hoc questions that are not analytical in nature such as the order shipment status of a particular order number or current stock availability of a specific product at a certain location
  • Incur no data storage cost but could be offset by consumption of Capacity Units if reports/queries are complex
  • Can leverage Power BI Direct Lake connection for faster report performance if the report is not too complex; if the query is too complex, it will resort to Direct Query
  • Could include other lakehouses, mirrored databases, or data warehouse tables in the SQL endpoint and thus the semantic model but Power BI reports using these tables will always resort to direct query rather than direct lake connect
  • Can include complex relationships between tables and unexpected results may be returned if the semantic model and/or reports are not configured correctly

The semantic model over the Fabric Data Warehouse:

  • Requires scheduled data refreshes but will be relatively fast since the source data is already in Fabric
  • Is best for more analytical questions such as "What was our sales revenue by month by customer location?" or "What is our days on hand for products in a particular shipping warehouse?"
  • Allows for user friendly data warehouse table and column names rather than using the potentially cryptic mirrored database table and column names
  • Eliminates snowflake schema, allowing for better performing reports and delivery of consistent results without having to understand complex relationships and filtering rules
  • Is more likely to leverage Direct Lake connection in Power BI reports since model is simpler
  • Allows other data sources to be loaded into the same warehouse, eliminating cross database joins in reports that automatically resort to direct query
  • Leverages a simpler star schema model resulting in faster reports with less consumption of capacity units



This solution addresses two key use cases: providing near real-time responses to queries about specific data transactions and statuses, and delivering rapid analytics over large datasets. Continue reading to learn how to implement this architecture in your own environment.

Solution details​


Below are detailed steps to build the metadata pipeline. The data source is the Wide World Importers SQL database, which you can download here. Then follow the instructions to import into an Azure SQL DB.

  1. Configure database mirroring

From the Synapse Data Warehouse experience, choose the Mirrored Azure SQL DB Option:​

jehayes_0-1724454100023.png



Then choose the tables to mirror:​

jehayes_1-1724454100027.png


After the mirroring has started, the main canvas will say “Mirrored Azure SQL Database is running’; Click on Monitor replication to see the number of rows replicated and the last completion time:​

jehayes_2-1724454100035.png


At this point, both a SQL analytics endpoint and default semantic model are created (1a). But I created a new semantic model(1b) and set up the table relationships:​

jehayes_3-1724454100046.png2. Create a Fabric Data Warehouse​


jehayes_4-1724454268894.png


Create the fact tables or any other tables that will be incrementally loaded (2a). You can also manually create the dimension tables or tables are full loaded OR you can specify to auto-create the tables in the pipeline Copy Data activity, as I do later.​

jehayes_5-1724454268901.png


Create the views over the mirrored database tables and stored procedures (2b):​

jehayes_6-1724454268908.png


Create and load the metadata table (2c) with information on how to load each fact or dimension table:​

jehayes_7-1724454268913.png

3. Create the data pipelines to load the fact and dimension tables​

Below is the orchestrator pipeline:​

jehayes_0-1724454627347.png


Set variable – set pipelinestarttime to the current date/time. This is logged in the metadata driven pipeline table for each table​

jehayes_0-1724707614740.png




Lookup – get the table load attributes from metadata table for each table to load​

jehayes_2-1724454627350.png



For each table to load​

jehayes_3-1724454627356.png


Invoke the pipeline, passing in the current row object and the date/time the orchestrator pipeline started:​

jehayes_4-1724454627358.png


Load warehouse table pipeline

jehayes_5-1724454627360.png


Set variable pipeline start time for tracking the time of each table load​

jehayes_6-1724454627361.png


If activity – check if full or incremental load​

jehayes_7-1724454627362.png


If full load

Use Copy Data Activity to load the Data Warehouse table, set the pipeline end time variable and update the metadata table with load information​

jehayes_0-1724682468259.png

Copy data activity

Source settings reference the view over the mirrored database:​

jehayes_1-1724682575491.png

Destination settings reference data warehouse table:​

jehayes_2-1724682654854.png

Note that the data warehouse table will be dropped and re-created each time​


Set the pipeline end time variable

jehayes_0-1724682775048.png


Run Script to update the pipeline run details for this table​

jehayes_1-1724682775053.png



If not a full load, then run the incremental load activities​

jehayes_0-1724699493059.png



Lookup activity calls a stored procedure to insert or update new or changed records into the destination table. The value for the StartDate parameter is the latest date of the previous load of this table. The value for the EndDate parameter is usually a null value and only set if there is a need to load or reload a subset of data.​

jehayes_1-1724699681354.png



The stored procedure performs an insert or update, depending upon whether or not the key value exists in the destination. Only the records from the source that have changed since the last table load are selected. This reduces the number of updates performed.​

jehayes_2-1724699745898.png

The stored procedure returns how many rows were inserted or updated, along with the latest transaction data of the data loaded, which is needed for the next incremental load.​


Set the pipeline end time​

jehayes_0-1724700001035.png


Script activity updates the table load details:​

jehayes_1-1724700001040.png



4. Build a new semantic model in the Fabric/Power BI service​

Create relationships between the tables, DAX calculations, dimension hierarchies, display formats – anything you need for your analytics​

jehayes_0-1724700290670.png

Note that all tables have Direct Lake connectivity as noted by the dashed, blue line. Direct Lake has the performance of Import semantic models without the overhead of refreshing the data.​


5. Create reports​

Create reports from your semantic model​

jehayes_1-1724700438439.png



Continue on building out more reports and dashboards, setting up security, scheduling data warehouse refresh (which will now be super fast since the source data is already in Fabric), creating apps, adding more data sources – whatever it takes to get the analytics your organization needs into Fabric!



Mirroring - Microsoft Fabric | Microsoft Learn

What is data warehousing in Microsoft Fabric? - Microsoft Fabric | Microsoft Learn

Data Factory in Microsoft Fabric documentation - Microsoft Fabric | Microsoft Learn

Work with semantic models in Microsoft Fabric - Training | Microsoft Learn

Create reports in the Power BI - Microsoft Fabric | Microsoft Learn

Dimensional modeling in Microsoft Fabric Warehouse - Microsoft Fabric | Microsoft Learn



If your source database is not supported for mirroring (yet!), check out these other articles I wrote:

Metadata Driven Pipelines for Microsoft Fabric - Microsoft Community Hub

Metadata Driven Pipelines for Microsoft Fabric – Part 2, Data Warehouse Style - Microsoft Community Hub

Continue reading...
 
Back
Top