
Applies To: SQL Server, Azure SQL Database, Azure SQL Managed Instance.


Discover the Power of Real-Time Data Tracking with CDC in Azure Synapse Analytics

 

Unlock the potential of your data with Change Data Capture (CDC).

This process captures all insert, update, and delete operations performed on your database, making the changes available in near real-time. Keep track of every modification made to your data, allowing your applications to consume and integrate these updates seamlessly.

 

In this guide, we'll show you how to implement CDC in Azure Synapse Analytics data flows. Follow these simple steps to get started.


Prerequisites:

  - An Azure Synapse Analytics workspace
  - An Azure SQL database with the AdventureWorks (AdventureWorksLT) sample
  - An Azure Data Lake Storage account for the Parquet output

(See the Links section at the end of this post for setup guides.)


Step 1: Build the dataflows

 

To create a dataflow in Azure Synapse Analytics, follow these steps:

 

  1. Navigate to the Develop tab.
  2. Click on "Add a Dataflow".
  3. Add a source, such as the Customer table from the AdventureWorks database (make sure to enable change data capture in the source options, as shown below).
  4. Save the data to a Data Lake by sinking it as Parquet files.
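Enabling CDC through the source options is the dataflow-side switch; if you also want to prepare CDC on the database side with T-SQL, the built-in stored procedures can be used. This is a sketch: the schema and table names mirror the sample database, and @role_name = NULL is an assumption that skips access gating.

```sql
-- Enable CDC at the database level (requires sufficient permissions, e.g. db_owner)
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for the source table used in the dataflow
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SalesLT',
    @source_name   = N'Customer',
    @role_name     = NULL;  -- NULL: no gating role is created
GO
```

The same pattern applies to any other source table, such as SalesLT.Product used later in this walkthrough.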


Enable Change Data Capture in Source:


The SalesLT.Product table has 295 rows. Let's make a few changes to the database and see how they affect the row count.

Unfortunately, the captured changes aren't displayed in the dataflow's Data Preview tab. To check the number of rows, I created a basic dataflow with two main activities.
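The starting row count can also be cross-checked directly from SSMS with a plain count query (a sketch; 295 matches the starting state described in this walkthrough):

```sql
SELECT COUNT(*) AS row_count
FROM SalesLT.Product;
```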

The source is the Parquet output of the dataflow from Step 1, and the sink is a cached sink.


Step 2: Validation of CDC concept

Now we'll try two debug runs:

 

First debug run:

Run the first dataflow, which writes the source data out as Parquet files.

We can see that all 295 rows are written successfully on the first run.


If we run it again, we expect to see 0 rows written, since we didn't change the data:


Now I'll update a row with a SQL query in SSMS:

 

UPDATE [SalesLT].[Product]
SET [Name] = 'sallydabbah'
WHERE [ProductID] = 680;
GO

 

So we changed one row; querying the table in SSMS confirms the change:
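Beyond querying the table itself, you can confirm that CDC recorded the update by querying the change function. This is a sketch assuming the default capture instance name SalesLT_Product; in CDC output, __$operation = 4 marks the after-image of an update:

```sql
-- LSN range covering everything captured so far for this instance
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('SalesLT_Product');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT __$operation, [ProductID], [Name]
FROM cdc.fn_cdc_get_all_changes_SalesLT_Product(@from_lsn, @to_lsn, N'all');
```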


If we debug-run the dataflow again, we should see one row written:


Returning to our second dataflow, we aim to confirm that the number of rows has increased by one, resulting in 296 rows. To verify this, we can click on the "Data Preview" tab in the source of the second dataflow:


Links:

 

Enable and Disable change data capture - SQL Server | Microsoft Learn

 

AdventureWorks sample databases - SQL Server | Microsoft Learn

 

Quickstart: Get started - create a Synapse workspace - Azure Synapse Analytics | Microsoft Learn

 
