
Applies To: SQL Server, Azure SQL Database, Azure SQL Managed Instance.


Discover the Power of Real-Time Data Tracking with CDC in Azure Synapse Analytics

 

Unlock the potential of your data with Change Data Capture (CDC).

This process captures all insert, update, and delete operations performed on your database, making the changes available in near real-time. Keep track of every modification made to your data, allowing your applications to consume and integrate these updates seamlessly.

 

In this guide, we'll show you how to implement CDC in Azure Synapse Analytics data flows. Follow these simple steps to get started.


Prerequisites:

  - An Azure Synapse Analytics workspace
  - An Azure SQL database with the AdventureWorks (AdventureWorksLT) sample
  - An Azure Data Lake Storage account for the Parquet output

(See the Links section at the end of this post for setup guides.)


Step 1: Build the dataflows

 

To create a dataflow in Azure Synapse Analytics, follow these steps:

 

  1. Navigate to the Develop tab.
  2. Click on "Add a Dataflow".
  3. Add a source, such as the Customer table from the AdventureWorks database (make sure to enable change data capture in the source options, as shown below).
  4. Save the data to a Data Lake by sinking it as Parquet files.
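Enabling CDC through the source options is the dataflow-side switch; if you also want to prepare CDC on the database side with T-SQL, the built-in stored procedures can be used. This is a sketch: the schema and table names mirror the sample database, and @role_name = NULL is an assumption that skips access gating.

```sql
-- Enable CDC at the database level (requires sufficient permissions, e.g. db_owner)
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for the source table used in the dataflow
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SalesLT',
    @source_name   = N'Customer',
    @role_name     = NULL;  -- NULL: no gating role is created
GO
```

The same pattern applies to any other source table, such as SalesLT.Product used later in this walkthrough.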


Enable Change Data Capture in Source:


The SalesLT.Product table has 295 rows. Let's make a few changes to the database and see how they affect the row count.

Unfortunately, the captured changes aren't displayed in the dataflow's Data Preview tab. To check the number of rows, I created a basic dataflow with two main activities.
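The starting row count can also be cross-checked directly from SSMS with a plain count query (a sketch; 295 matches the starting state described in this walkthrough):

```sql
SELECT COUNT(*) AS row_count
FROM SalesLT.Product;
```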

The source is the Parquet output of the dataflow from Step 1, and the sink is a cached sink.


Step 2: Validation of CDC concept

Now we'll try two debug runs:

 

First debug run:

Run the first dataflow, which writes the source data out as Parquet files.

We can see that all 295 rows are written successfully on the first run.


If we run it again, we expect to see 0 rows written, since we didn't change the data:


Now I'll update a row with a SQL query in SSMS:

 

UPDATE [SalesLT].[Product]
SET [Name] = 'sallydabbah'
WHERE [ProductID] = 680;
GO

 

So we changed one row; querying the table in SSMS confirms the change:
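Beyond querying the table itself, you can confirm that CDC recorded the update by querying the change function. This is a sketch assuming the default capture instance name SalesLT_Product; in CDC output, __$operation = 4 marks the after-image of an update:

```sql
-- LSN range covering everything captured so far for this instance
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('SalesLT_Product');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT __$operation, [ProductID], [Name]
FROM cdc.fn_cdc_get_all_changes_SalesLT_Product(@from_lsn, @to_lsn, N'all');
```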


If we debug-run the dataflow again, we should see one row written:


Returning to our second dataflow, we aim to confirm that the number of rows has increased by one, resulting in 296 rows. To verify this, we can click on the "Data Preview" tab in the source of the second dataflow:


Links:

 

Enable and Disable change data capture - SQL Server | Microsoft Learn

 

AdventureWorks sample databases - SQL Server | Microsoft Learn

 

Quickstart: Get started - create a Synapse workspace - Azure Synapse Analytics | Microsoft Learn

 
