Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 4

  • Thread starter Thread starter Kruti_Mehta
  • Start date Start date
K

Kruti_Mehta

This article is intended for audience who are considering options to move their data into Azure and prefer T-SQL to query the data.

Overview

This is a follow-up blog to Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 1 , Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 2 & Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 3. In this article we will cover the following following key decision factors



E) Backup and Restore

Only in the Dedicated SQL pool ,data is stored in relational tables with columnar storage. Serverless SQL Pool leverages Dedicated SQL pool if data has to be stored relational tables or Azure SQL. Hence the concept of Backup and Restore like in the traditional SQL Server engine applies only to Dedicated SQL Pool. It provides the option for datawarehouse snapshot. You can leverage this to recover or copy your data warehouse to a previous state.



Snapshots of your data warehouse are taken throughout the day creating restore points that are available for seven days. This retention period cannot be changed. However, the dedicated SQL pool should be in an active state for restore point creation. If it is paused frequently, automatic restore points may not be created so make sure to create user-defined restore point before pausing the dedicated SQL pool.

User-defined restore points are available for seven days and are automatically deleted on your behalf. You cannot change the retention period of user-defined restore points. 42 user-defined restore points are guaranteed at any point in time so they must be deleted before creating another restore point. You can trigger snapshots to create user-defined restore points through PowerShell or the Azure portal. It vital you check this list for backup/restore options. When you drop a dedicated SQL pool, a final snapshot is created and saved for seven days.




A geo-backup is created once per day to a paired data center. The RPO for a geo-restore is 24 hours. A geo-restore is always a data movement operation and the RTO will depend on the data size. You can disable them and save on disaster recovery storage costs. Should you choose to leverage this feature do refer Understanding how Snapshots Accrue Charges. The geo-redundant charge covers the cost for storing the geo-backups.


F) High Availability

Storage Aspects

High Availability can be achieved at the storage layer leveraging any of the underlying methodologies

Replication TypeCopiesReplication Strategy
LRS (Locally Redundant Storage)3Replicates your storage account three times within a single data center in the primary regionlarge?v=v2&px=999.png
ZRS (Zone Redundant Storage)3Replicates your storage account synchronously across three Azure availability zones in the primary regionlarge?v=v2&px=999.png
GRS (Geo Redundant Storage)6
Copies your data synchronously three times within a single physical location in the primary region using LRS.

It then copies your data asynchronously to a single physical location in a secondary region that is hundreds of miles away from the primary region.

But data is not accessible for reading from secondary region
large?v=v2&px=999.png
RA-GRS (Read-access Geo-Redundant Storage)6Same as GRS but your data is always available to be read from the secondary region.large?v=v2&px=999.png
GZRS (Geo-Zone-Redundant Storage)6
Same situation like ZRS.

Added to this is 3 asynchronous copies in secondary region.

But data is not accessible for reading from secondary region
large?v=v2&px=999.png
RA-GZRS (Read-access Geo-Zone-Redundant storage6
Same situation like ZRS.

Added to this is 3 asynchronous copies in secondary region.

But data is accessible for reading from secondary region.
large?v=v2&px=999.png


Replication for data in ADLS Gen2 occurs at the file level. Azure Storage offers two options for copying your data to a secondary region GRS and GZRS. For read access to the secondary region, configure your storage account to use read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS). When you create Azure Synapse Analytics Workspaces and should you use the defaults where you let the service create a new ADLS Gen2 account name and file system name it will create a default RA-GRS enabled StorageV2 account. Microsoft recommends RA-GZRS for maximum availability and durability for your applications.

643x573?v=v2.jpglarge?v=v2&px=999.jpg

Based on the primary region you selected, the paired secondary region is determined based on the primary region, and can't be changed. You can check the list of paired data center today.​

large?v=v2&px=999.jpg

Its always good to check this page before deciding on your strategy.

If you decide to change the defaults, ensure you follow the steps mentioned in this table to make modifications.

Storage account failovers shouldn't be used as part of your data migration strategy.

Failover is a temporary solution to a service outage. You can find the options for modifications directly on the newly created storage account [Overview -> Redundancy section].

large?v=v2&px=999.jpglarge?v=v2&px=999.jpg



Compute Aspects

Data Factory and Azure Synapse pipelines store artifacts that include the following data:

MetadataMonitoring Data

  • Pipeline
  • Datasets
  • Linked services
  • Integration runtime
  • Pipeline
  • Triggers
  • Activity runs

Data Factory or Azure Synapse pipelines automatically fail over to the paired region when you Set up automated recovery. If you need to validate your recovered pipelines, you can back up the Azure Resource Manager (ARM) templates for your production pipelines in secret storage, and compare the recovered pipelines to the backups.



To achieve BCDR in the event of an entire region failure, you need a data factory or an Azure Synapse workspace in the secondary region. In case of accidental Data Factory or Azure Synapse pipeline deletion, outages, or internal maintenance events, you can use Git and CI/CD to recover the pipelines manually. Optionally, you can use an active/passive implementation.

The primary region handles normal operations and remains active, while the secondary DR region requires pre-planned steps, depending on specific implementation, to be promoted to primary. In this case, all the necessary configurations for infrastructure are available in the secondary region, but they aren't provisioned.


BCDR with CI/CD

BCDR strategies include availability zone redundancy, automated recovery provided by Azure disaster recovery, and user-managed recovery by using continuous integration/continuous delivery (CI/CD).

495x393?v=v2.jpg


  1. Data Factory and Azure Synapse pipelines achieve resiliency by using Azure regions and Azure availability zones.
    • Each Azure region has a set of datacenters that are deployed within a latency-defined perimeter.
    • Azure availability zones are physically separate locations within each Azure region that are tolerant to local failures.
    • All Azure regions and availability zones are connected through a dedicated, regional low-latency network and by a high-performance network.
    • All availability zone-enabled regions have at least three separate availability zones to ensure resiliency.

  2. When a datacenter, part of a datacenter, or an availability zone in a region goes down, failover happens with zero downtime for zone-resilient Data Factory and Azure Synapse pipelines.

By using CI/CD, the Data Factory and Azure Synapse pipelines can integrate to a Git repository and deploy to a secondary region for immediate recovery.

Disable triggers
Disable triggers in the original primary data factory once it comes back online.
You can disable the triggers manually, or implement automation to periodically check the availability of the original primary.
Disable all triggers on the original primary data factory immediately after the factory recovers.

Thus there are in-built and manual options available for both storage and compute aspects for Azure Synapse SQL

We will cover the final aspect of Performance Tuning in both these models in the last article for this series Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 5



Good Reads:


References:

  1. Backup and Restore in Dedicated Pool
  2. BCDR in Synapse Analytics Pipeline
  3. Available Features in ADF v/s Synapse
  4. Understand Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool
  5. Cheat sheet for dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics

Don't forget to share a small?v=v2&px=200.jpg if this helps

Credit:
Thanks Ben Harding for review and guidance

FastTrack for Azure: Move to Azure efficiently with customized guidance from Azure engineering.

Continue reading...
 
Back
Top