Setting Up CDC for Seamless Operations in Azure SQL Managed Instances with Failover Groups

diegoc · Jul 16, 2024

Customers leverage Change Data Capture (CDC) with Azure SQL Managed Instance in various configurations. Although CDC operates smoothly after a local failover to a secondary replica in SQL Managed Instance Business Critical, it is not designed to function seamlessly after a geo failover within a Failover Group configuration. This applies to both General Purpose and Business Critical tiers. This step-by-step guide will assist you in configuring CDC to work seamlessly with a Failover Group configuration, ensuring that changes continue to be captured after a geo failover.

Prerequisites:

Two SQL MI instances configured as failover group partners.
A user database to be configured to capture changes, MyDB in this guide.

Login to the primary replica using SQL Server Management Studio (SSMS).
If CDC is not yet enabled on the primary instance target database, use the following command to configure the database for CDC. Ensure to replace MyDB with the name of the target database.

Code:

USE [MyDB]
EXEC sys.sp_cdc_enable_db
GO

Now that CDC is enabled at the database level, enable CDC on the specific tables to track. Use the following T-SQL code to add a table and replace mytable with your table name:

Code:

USE [MyDB]
EXEC sys.sp_cdc_enable_table 
@source_schema = N'dbo', 
@source_name   = N'mytable', 
@role_name     = NULL, 
@supports_net_changes = 1 
GO

For a complete explanation of the above two TSQL statements please refer to the Microsoft documentation: Enable and Disable change data capture - SQL Server | Microsoft Learn

This step will create two SQL Agent jobs with names that contain the database name as can be seen below:

After enabling CDC on your database and tracking changes on tables, we will modify the default CDC configuration to make it failover group aware for seamless data capture after each failover.

Add a new step named Verify if primary replica to the capture job that will verify the current replica is the primary. This is required for ensuring the jobs only run on the primary replica, preventing unnecessary job failures or attempts to run CDC operations on the geo-secondary replica. Add the following T-SQL code as the new step command, replacing MyDB with your actual database name:

IF DATABASEPROPERTYEX('MyDB', 'Updateability') <> 'READ_WRITE' THROW 50000, 'This is not the primary replica.', 1;

The new step will look like this:

Configure the new step to Quit the job reporting success in the On failure action of the Advanced page: Adjust the new step's settings so that, if it fails due to the error thrown by the check (indicating the replica is not the primary), it quits reporting success. This change is necessary because the job isn't supposed to be executed on non-primary replicas, and the "failure" is by design, not due to an actual error in job execution.

Note: Even though the job will quit reporting success every minute, the step itself will fail when running on the geo-secondary and the job history will contain entries like the ones shown below:

This is expected and monitoring may need to be adjusted accordingly to not alert in this specific step failure. During failover events, jobs may also fail due to the transient unavailability of the database and monitoring will need to be adjusted to contemplate these scenarios.

Configure the capture Job to start on step 1:Verify is primary replica. Move the new step Verify is primary replica to the top of the steps list, and configure the Start step to the be the new step. At this point, the steps of the capture job should look like this:

Repeat the same steps 4 through 6 above for the Cleanup Job. Remember to replace MyDB with your database name accordingly. The steps of the cleanup job should look like this:

Change the Retry attempts on the Change Data Capture Collection Agent to 0: This prevents the job from retrying after a failover, as the job's continuation will be managed by the new primary replica. The change should look like this:

Add a new schedule to the capture job which attempts to start the job every minute. This is needed so that CDC begin capturing as soon as possible on the new primary replica after a failover event. Name it Start if primary replica. This new schedule should look like this:

Important note: If for any troubleshooting reason the capture job needs to be stopped for a certain amount of time, the way to accomplish it in the failover group will be to disable the Start if primary replica schedule in both primary and geo-secondary replicas by unchecking the Enabled box shown above. To resume CDC capture, enable the schedule again in both places.

Remove the original schedule named CDC Capture agent schedule to prevent the Capture Job from running when each replica starts.

At this point, the capture job schedules should look like this:

Script out both CDC jobs on the primary replica (Capture and Cleanup jobs) so they can be created on the geo-secondary later. This can be done by expanding the SQL Server Agent node in Object Explorer and scripting out the job as seen in the screenshot below:

Capture the content of cdc_jobs table on the primary to populate later the geo-secondary. Replace MyDB with your database name, execute the T-SQL code below on the primary replica, and copy and save for later the INSERT statement generated in the results:

Code:

DECLARE @Database NVARCHAR(255) = 'MyDB'/*<<Your DATABASE NAME goes here*/
, @Fields NVARCHAR(200) = '[job_type], [job_id], [maxtrans], [maxscans], [continuous], [pollinginterval], [retention], [threshold]'
 , @SQL NVARCHAR(1000);
SET @SQL = 'SET NOCOUNT ON;
DECLARE @Insert VARCHAR(MAX);
SELECT @Insert = ISNULL(@Insert + '' UNION '', ''INSERT INTO msdb.dbo.cdc_jobs([database_id], '
 + @Fields + ')'') + CHAR(13) + CHAR(10) + ''SELECT DB_ID('''''  + @Database + '''''),'' + '
+ REPLACE(REPLACE(REPLACE(@Fields, ',', ' + '', '' + '), '[', ''''''''' + CAST('),']',' AS VARCHAR(max)) + ''''''''')
 + ' FROM msdb.dbo.cdc_jobs WHERE database_id = DB_ID(''' + @Database + ''');
PRINT @Insert';
EXEC sp_executesql @SQL;

Below is an example of executing the T-SQL code. Note the INSERT INTO statement under Messages:

Connect to the geo-secondary and run the following query to confirm CDC has been enabled on the geo-secondary as well. The column is_cdc_enabled should be 1 for your database:

SELECT name, is_cdc_enabled FROM sys.databases;

Create both the CDC SQL Agent jobs on the geo-secondary using the scripts generated on the primary replica previously in step 11. Connect to the geo-secondary replica and:
1. Run the script to create the capture job.
2. Run the script to create the cleanup job.
3. Verify both jobs are listed under SQL Server Agent - Jobs in SSMS.

Note: If you ever remove CDC from these failover group replicas, do it on the primary replica and delete the two jobs manually on the geo-secondary since sp_cdc_disable_db will not remove the jobs on the geo-secondary replica.

Create the cdc_jobs table in the geo-secondary MSDB database. Execute the following T-SQL code, replacing MyDB with your actual database name to create this table on the geo-secondary:

Code:

USE [msdb]
GO
CREATE TABLE [dbo].[cdc_jobs] (
[database_id] [int] NOT NULL
, [job_type] [nvarchar](20) NOT NULL
, [job_id] [uniqueidentifier] NULL
, [maxtrans] [int] NULL
, [maxscans] [int] NULL
, [continuous] [bit] NULL
, [pollinginterval] [bigint] NULL
, [retention] [bigint] NULL
, [threshold] [bigint] NULL
, PRIMARY KEY CLUSTERED ([database_id] ASC, [job_type] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY];

Insert in the cdc_jobs table on the geo-secondary the data captured on the primary. Execute the INSERT statement captured in the results of step 12 on the MSDB database of the geo-secondary and verify that the cdc_jobs table content is the same on both replicas. An example of the content of the cdc_jobs table after inserting the rows can be seen below.

Update the cdc_jobs table in the MSDB on the geo-secondary to reflect the correct job IDs. Run the following T-SQL code on the geo-secondary to update cdc_jobs table so it contains the job identifications of the jobs on the geo-secondary instead of the primary. Remember to replace MyDB with your actual database name:

Code:

UPDATE c
SET c.job_id = s.job_id
FROM msdb.dbo.sysjobs s
JOIN msdb.dbo.cdc_jobs c
ON s.name = 'cdc.' + DB_NAME(c.database_id) + '_' + c.job_type
WHERE DB_NAME(c.database_id) = 'MyDB';

Create the cdc_jobs_view system view in the MSDB database on the geo-secondary which is required by the CDC SQL Agent Jobs:

Code:

USE [msdb]
GO
CREATE VIEW dbo.cdc_jobs_view AS
SELECT
    [database_id],
    [job_type],
    [job_id],
    [maxtrans],
    [maxscans],
    [continuous],
    [pollinginterval],
    [retention],
    [threshold]
FROM dbo.cdc_jobs;

All configurations have been completed to configure CDC for SQL Managed Instance Failover Group. After every geo failover, CDC will resume capturing within the next minute on the new primary replica. The data changes are still logged in the transaction log as usual so there is no data capture loss.

This content includes materials and contributions from Yvonne Vanslageren.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Azure Databases SQL Customer Success Engineering Team. Thanks for your support!

Continue reading...

Setting Up CDC for Seamless Operations in Azure SQL Managed Instances with Failover Groups

diegoc

Feedback and suggestions ​

Feedback and suggestions