Kruti_Mehta
This article is intended for audiences who are considering options to move their data into Azure and prefer T-SQL to query the data.
Overview
This is a follow-up blog to Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 1, Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 2 & Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 3. In this article we will cover the following key decision factors:
E) Backup and Restore
Only in the Dedicated SQL pool is data stored in relational tables with columnar storage. The Serverless SQL pool relies on a Dedicated SQL pool or Azure SQL if data has to be stored in relational tables. Hence the concept of backup and restore, as in the traditional SQL Server engine, applies only to the Dedicated SQL pool. It provides data warehouse snapshots, which you can leverage to recover or copy your data warehouse to a previous state.
Snapshots of your data warehouse are taken throughout the day, creating restore points that are available for seven days. This retention period cannot be changed. However, the dedicated SQL pool must be in an active state for restore point creation. If it is paused frequently, automatic restore points may not be created, so make sure to create a user-defined restore point before pausing the dedicated SQL pool.
User-defined restore points are also available for seven days and are automatically deleted on your behalf; their retention period cannot be changed either. Up to 42 user-defined restore points are retained at any point in time, so once that limit is reached you must delete an existing restore point before creating another. You can trigger snapshots to create user-defined restore points through PowerShell or the Azure portal. It is vital that you check this list for backup/restore options. When you drop a dedicated SQL pool, a final snapshot is created and saved for seven days.
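For example, here is a minimal PowerShell sketch of creating a user-defined restore point right before pausing the pool, assuming the Az.Synapse module and a dedicated SQL pool inside a Synapse workspace; the resource group, workspace, and pool names are placeholders.

```powershell
# Minimal sketch (assumed names): create a user-defined restore point, then pause the pool.
Connect-AzAccount

$rg        = "rg-dataplatform"        # placeholder resource group
$workspace = "synapse-contoso"        # placeholder Synapse workspace
$pool      = "dedicatedpool01"        # placeholder dedicated SQL pool

# Create a user-defined restore point so a recent recovery point exists even while paused.
New-AzSynapseSqlPoolRestorePoint -WorkspaceName $workspace -Name $pool `
    -RestorePointLabel ("BeforePause-" + (Get-Date -Format "yyyyMMddHHmm"))

# Pause the pool to stop compute billing.
Suspend-AzSynapseSqlPool -ResourceGroupName $rg -WorkspaceName $workspace -Name $pool
```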
A geo-backup is created once per day to a paired data center. The RPO for a geo-restore is 24 hours. A geo-restore is always a data movement operation, and the RTO will depend on the data size. You can disable geo-backups to save on disaster recovery storage costs. Should you choose to leverage this feature, do refer to Understanding how Snapshots Accrue Charges; the geo-redundant charge covers the cost of storing the geo-backups.
- To restore a data warehouse, see Restore a dedicated SQL pool (formerly SQL DW).
- To restore a deleted data warehouse, see Restore a deleted database (formerly SQL DW).
- If the entire server was deleted, see Restore a data warehouse from a deleted server (formerly SQL DW).
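Building on the restore options above, the following hedged PowerShell sketch restores the most recent restore point of a workspace dedicated SQL pool into a new pool. The subscription ID, names, and performance level are placeholders, and the equivalent Az.Sql cmdlets apply to a standalone dedicated SQL pool (formerly SQL DW).

```powershell
# Minimal sketch (assumed names): restore the most recent restore point into a new dedicated SQL pool.
$sub       = "<subscription-id>"      # placeholder
$rg        = "rg-dataplatform"
$workspace = "synapse-contoso"
$pool      = "dedicatedpool01"

# Pick the most recent restore point (automatic or user-defined).
$restorePoint = Get-AzSynapseSqlPoolRestorePoint -WorkspaceName $workspace -Name $pool |
    Sort-Object -Property RestorePointCreationDate |
    Select-Object -Last 1

# Resource ID of the source pool.
$sourceId = "/subscriptions/$sub/resourceGroups/$rg/providers/Microsoft.Synapse/workspaces/$workspace/sqlPools/$pool"

# Restore into a new pool; the performance level here is an assumption, size it to your workload.
Restore-AzSynapseSqlPool -FromRestorePoint `
    -RestorePoint $restorePoint.RestorePointCreationDate `
    -ResourceGroupName $rg -WorkspaceName $workspace `
    -TargetSqlPoolName "dedicatedpool01-restored" `
    -ResourceId $sourceId -PerformanceLevel "DW100c"
```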
F) High Availability
Storage Aspects
High Availability can be achieved at the storage layer by leveraging any of the underlying redundancy methodologies.
Replication for data in ADLS Gen2 occurs at the file level. Azure Storage offers two options for copying your data to a secondary region: geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS). For read access to the secondary region, configure your storage account to use read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS). When you create an Azure Synapse Analytics workspace and accept the defaults, letting the service create a new ADLS Gen2 account name and file system name, it creates an RA-GRS enabled StorageV2 account by default. Microsoft recommends RA-GZRS for maximum availability and durability for your applications.
The paired secondary region is determined by the primary region you selected and can't be changed. You can check the current list of paired data centers.
It's always good to check this page before deciding on your strategy.
If you decide to change the defaults, ensure you follow the steps mentioned in this table to make modifications.
Storage account failovers shouldn't be used as part of your data migration strategy.
Failover is a temporary solution to a service outage. You can find the options for modifications directly on the newly created storage account [Overview -> Redundancy section].
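To illustrate, here is a hedged PowerShell sketch, assuming the Az.Storage module and placeholder names, that checks the redundancy SKU of the workspace's default ADLS Gen2 account and requests a change. Note that some redundancy changes (for example into a zonal SKU) may require a separate conversion process, so verify the supported paths in the table referenced above first.

```powershell
# Minimal sketch (assumed names): inspect and change the redundancy of the default ADLS Gen2 account.
$rg      = "rg-dataplatform"
$account = "contosodatalake"          # placeholder ADLS Gen2 account

# Check the current redundancy SKU (e.g. Standard_RAGRS for the workspace default).
(Get-AzStorageAccount -ResourceGroupName $rg -Name $account).Sku.Name

# Request a change to RA-GZRS; zonal conversions may need an additional conversion step,
# so confirm the supported paths before running this against production storage.
Set-AzStorageAccount -ResourceGroupName $rg -Name $account -SkuName Standard_RAGZRS
```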
Compute Aspects
Data Factory and Azure Synapse pipelines store artifacts that include metadata (pipelines, datasets, linked services, integration runtimes, and triggers) and monitoring data (pipeline, trigger, and activity runs).
Data Factory or Azure Synapse pipelines automatically fail over to the paired region when you Set up automated recovery. If you need to validate your recovered pipelines, you can back up the Azure Resource Manager (ARM) templates for your production pipelines in secret storage, and compare the recovered pipelines to the backups.
To achieve BCDR in the event of an entire region failure, you need a data factory or an Azure Synapse workspace in the secondary region. In case of accidental Data Factory or Azure Synapse pipeline deletion, outages, or internal maintenance events, you can use Git and CI/CD to recover the pipelines manually. Optionally, you can use an active/passive implementation.
The primary region handles normal operations and remains active, while the secondary DR region requires pre-planned steps, depending on specific implementation, to be promoted to primary. In this case, all the necessary configurations for infrastructure are available in the secondary region, but they aren't provisioned.
BCDR with CI/CD
BCDR strategies include availability zone redundancy, automated recovery provided by Azure disaster recovery, and user-managed recovery by using continuous integration/continuous delivery (CI/CD).
Data Factory and Azure Synapse pipelines achieve resiliency by using Azure regions and Azure availability zones.
- Each Azure region has a set of datacenters that are deployed within a latency-defined perimeter.
- Azure availability zones are physically separate locations within each Azure region that are tolerant to local failures.
- All Azure regions and availability zones are connected through a dedicated, regional low-latency network and by a high-performance network.
- All availability zone-enabled regions have at least three separate availability zones to ensure resiliency.
When a datacenter, part of a datacenter, or an availability zone in a region goes down, failover happens with zero downtime for zone-resilient Data Factory and Azure Synapse pipelines.
By using CI/CD, the Data Factory and Azure Synapse pipelines can integrate with a Git repository and be deployed to a secondary region for immediate recovery.
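As a hedged example, assuming the ARM templates generated by the Data Factory/Synapse publish step (ARMTemplateForFactory.json and its parameter file) are kept under source control, redeploying them to a secondary-region factory could look like the sketch below; the subscription ID, resource group, factory name, and paths are placeholders.

```powershell
# Minimal sketch (assumed names): redeploy published pipeline artifacts to a DR factory in the secondary region.
Connect-AzAccount
Set-AzContext -SubscriptionId "<subscription-id>"   # placeholder

# ARMTemplateForFactory.json / ARMTemplateParametersForFactory.json are the files the publish
# step generates; factoryName is a parameter defined inside that template.
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-dataplatform-dr" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\ARMTemplateParametersForFactory.json" `
    -factoryName "adf-contoso-secondary" `
    -Mode Incremental
```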
Disable triggers
Once the original primary data factory comes back online, disable all of its triggers immediately so that the original primary and the recovered secondary don't both process the same workload. You can disable the triggers manually, or implement automation that periodically checks the availability of the original primary and disables them as soon as it recovers.
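For instance, here is a minimal PowerShell sketch, assuming the Az.DataFactory module and placeholder names, that stops every started trigger in the recovered primary factory.

```powershell
# Minimal sketch (assumed names): stop all started triggers in the original primary factory.
$rg      = "rg-dataplatform-primary"
$factory = "adf-contoso-primary"

Get-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $factory |
    Where-Object { $_.RuntimeState -eq "Started" } |
    ForEach-Object {
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $factory -Name $_.Name -Force
    }
```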
Thus, there are built-in and manual options available for both the storage and compute aspects of Azure Synapse SQL.
We will cover the final aspect, Performance Tuning, for both these models in the last article of this series: Azure SQL v/s Azure Synapse SQL (Dedicated Pool, Serverless Pool) - Part 5.
Good Reads:
- Source Control in Synapse
- Best Practices in CI/CD
- CI CD in Azure Synapse Analytics Part 1
- CI CD in Azure Synapse Analytics Part 2
- CI CD in Azure Synapse Analytics Part 3
- CI CD In Azure Synapse Analytics Part 4
- CI CD in Azure Synapse Analytics Part 5
- Modern Data Warehouse DataOps
- Hitchhiker's Guide to Data Lake
References:
- Backup and Restore in Dedicated Pool
- BCDR in Synapse Analytics Pipeline
- Available Features in ADF v/s Synapse
- Understand Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool
- Cheat sheet for dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics
Don't forget to share a like if this helps.
Credit: Thanks to Ben Harding for the review and guidance.
FastTrack for Azure: Move to Azure efficiently with customized guidance from Azure engineering.