anbugovi
Overview
In this blog, we will cover the Azure solution and deployment approach to migrate very large Oracle databases (200 TB +) to Azure.
VM Solution
Azure Virtual Machines (VMs) offer vCPU counts that help manage Oracle licensing, a high RAM-to-vCPU ratio to accommodate a large Oracle SGA, and the IO and network bandwidth needed to support transaction and batch workloads. We tested both M192 and M176 SKUs with a 200 TB+ Oracle database.
As the comparison below shows, the M176 is based on the Intel Sapphire Rapids processor with DDR5 memory and offers higher SAPS and 1.5x faster memory access than the M192, which is based on the Intel Cascade Lake processor. The M176 is also equipped with Azure Boost technology, which improves both IO and network throughput. Our testing confirmed that the M176 delivers higher SAPS, faster memory access, and more IO and network bandwidth.
VM SKU | Intel Chipset | vCPU | Memory GiB | IOPS/MBps | Network Bandwidth (Mbps) |
M192idms_v2 | Cascade Lake / DDR4 | 192 | 4096 | 80000/2000 | 30000 |
M176ds_4_v3 | Sapphire Rapids / DDR5 | 176 | 3892 | 130000/4000 | 40000 |
System Global Area (SGA): Very large Oracle databases benefit greatly from a large SGA. Customers with such sizeable Oracle workloads should deploy an Azure M-series VM with at least 4 TB of RAM. Specific parameter recommendations are below:
- Set Linux Huge Pages to 75-90% of Physical RAM size
- Set System Global Area (SGA) to 90% of Huge Page size
- Set the Oracle parameter USE_LARGE_PAGES = ONLY
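The sizing rules above can be turned into concrete numbers with a short calculation. The sketch below is illustrative only: it assumes the default 2 MiB huge page size on x86-64 Linux, picks 80% as an example point inside the 75-90% range, and uses the 3892 GiB of the M176ds_4_v3 from the table above. The function name and defaults are hypothetical, so check the results against SAP Note 1672954 before applying them.

```python
import math

HUGE_PAGE_MIB = 2  # assumption: default x86-64 huge page size (Hugepagesize in /proc/meminfo)

def size_huge_pages(ram_gib, huge_page_fraction=0.80, sga_fraction=0.90):
    """Return (nr_hugepages, huge_page_pool_gib, sga_gib) for the rules above."""
    if not 0.75 <= huge_page_fraction <= 0.90:
        raise ValueError("huge_page_fraction should stay within 0.75-0.90")
    pool_mib = ram_gib * 1024 * huge_page_fraction
    nr_hugepages = math.floor(pool_mib / HUGE_PAGE_MIB)
    pool_gib = nr_hugepages * HUGE_PAGE_MIB / 1024
    # SGA is sized as a fraction of the huge page pool, rounded down to whole GiB
    sga_gib = math.floor(pool_gib * sga_fraction)
    return nr_hugepages, pool_gib, sga_gib

nr_hugepages, pool_gib, sga_gib = size_huge_pages(3892)
print(f"vm.nr_hugepages = {nr_hugepages}")
print(f"huge page pool  = {pool_gib:.1f} GiB")
print(f"sga_target      = {sga_gib}G (with USE_LARGE_PAGES = ONLY)")
```

Rounding down keeps the SGA inside the huge page pool, which matters because USE_LARGE_PAGES = ONLY prevents the instance from starting if the SGA does not fit into huge pages.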
Storage Solution and Configuration
Azure has multiple storage options: Premium SSD, Premium SSDv2, Ultra Disk and Azure NetApp Files (ANF). The table below summarizes the storage characteristics for the Standard_M176ds_4_v3 virtual machine.
IO Metrics | Premium SSD | Premium SSDv2 (Pv2) | Azure NetApp Files (ANF) |
IOPS | 130K | 130K | Millions |
Throughput | 4 GB/s | 4 GB/s | >5 GB/s |
Latency | Low single-digit ms | < 1 ms | < 0.4 ms |
High Availability | Oracle Data Guard | Oracle Data Guard | Oracle Data Guard |
Disaster Recovery | Oracle Data Guard | Oracle Data Guard | Oracle Data Guard and/or ANF Cross Region Replication |
Storage Snapshot | Yes | No | Yes |
Storage Manager | Automatic Storage Management (ASM) | ASM | dNFS |
For the 200 TB+ Oracle database workload, we tested the following storage configuration, which uses both the network channel (ANF) and the disk IO channel (Premium SSDv2, Pv2). Using ANF and Pv2 together made effective use of the available VM throughput and met and exceeded the IO requirements of such a large Oracle database.
Component | Disk Type | Number of Volumes | Size (TiB) | Total Throughput (MiB/s) | Volume Management | Stripe Size |
Oracle Home | Pv2 | 1 | 1 | 250 | LVM | |
sapdata1-6 | ANF | 6 | 40 per volume | 3000-4000 | Individual | |
Oracle redo1-4 | ANF | 4 | 0.5 per volume | 500-2000 | Individual | |
Oracle FRA | ANF | 1 | 5 | 500-1000 | Individual | |
Oracle Archive | Pv2 | 4 | 10 | 1500 | LVM | 64KB |
Oracle Temp | Pv2 or Ephemeral | 4 | 10 | 1500 | LVM | 64KB |
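To see how the layout above splits load between the two channels, the throughput figures can be summed per channel and compared with the VM limits from the SKU table. This is a rough sketch under stated assumptions: the throughput column is treated as MB/s-class values (the MB/MiB difference is ignored), the 40000 Mbps network limit is converted to 5000 MB/s, Oracle Temp is counted as Pv2, and the ranges are summed at both their low and high ends.

```python
# (component, channel, low, high) throughput values copied from the table above
LAYOUT = [
    ("Oracle Home", "Pv2", 250, 250),
    ("sapdata1-6", "ANF", 3000, 4000),
    ("Oracle redo1-4", "ANF", 500, 2000),
    ("Oracle FRA", "ANF", 500, 1000),
    ("Oracle Archive", "Pv2", 1500, 1500),
    ("Oracle Temp", "Pv2", 1500, 1500),  # assumption: Pv2, not ephemeral disk
]

# VM limits for M176ds_4_v3 from the SKU comparison table
LIMITS_MBPS = {
    "Pv2": 4000,       # managed disk channel
    "ANF": 40000 / 8,  # network channel: 40000 Mbps is 5000 MB/s
}

def channel_totals(layout):
    """Sum the low and high ends of each throughput range per channel."""
    totals = {}
    for _component, channel, low, high in layout:
        lo, hi = totals.get(channel, (0, 0))
        totals[channel] = (lo + low, hi + high)
    return totals

for channel, (low, high) in channel_totals(LAYOUT).items():
    limit = LIMITS_MBPS[channel]
    print(f"{channel}: {low}-{high} MB/s against a VM limit of {limit:.0f} MB/s")
```

Under these assumptions the Pv2 volumes stay below the disk limit, and the low end of the ANF ranges fits within the network limit. The high ends of the ANF ranges add up to more than the network limit, so in this sketch they only make sense as per-component ceilings and not as simultaneous load.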
Storage Deployment Approach
Both NFSv3 and NFSv4.1 are supported with Oracle Direct NFS (dNFS). We ultimately chose the combination of NFSv3 and dNFS, because NFSv3 has proven more reliable and robust, and much less prone to dNFS-related bugs, than the newer NFSv4.1.
Application volume group for Oracle (AVG for Oracle) deploys, in a single step with an optimized workflow, all the volumes required to install and operate Oracle databases at enterprise scale, with optimal performance and according to best practices. AVG for Oracle shortens Oracle database deployment time and ensures volume performance and stability, including the use of multiple storage endpoints (multiple IPs).
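dNFS reads its mount definitions from an oranfstab file, where the NFS version can be pinned per server entry. The sketch below generates such entries for volumes that each have their own storage endpoint. The server names, IP addresses, export paths and mount points are hypothetical placeholders, and the key layout follows the oranfstab format as described in Oracle's Direct NFS documentation, so validate the output against that documentation for your Oracle release.

```python
def oranfstab_entry(server, path_ip, export, mount, nfs_version="nfsv3"):
    """Build one oranfstab server block that pins the NFS version."""
    return "\n".join([
        f"server: {server}",
        f"path: {path_ip}",
        f"nfs_version: {nfs_version}",
        f"export: {export} mount: {mount}",
    ])

# Hypothetical volumes: (server label, storage endpoint IP, export, mount point)
VOLUMES = [
    ("anf-sapdata1", "10.0.1.4", "/sapdata1", "/oracle/SID/sapdata1"),
    ("anf-sapdata2", "10.0.1.5", "/sapdata2", "/oracle/SID/sapdata2"),
]

oranfstab = "\n".join(oranfstab_entry(*volume) for volume in VOLUMES)
print(oranfstab)
```

Giving each volume its own server block is one way to reflect volumes that sit behind different storage endpoints.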
Oracle Database with Azure NetApp Files - Azure Example Scenarios | Microsoft Learn
Understand Azure NetApp Files application volume group for Oracle | Microsoft Learn
The Oracle data files can be distributed across the sapdata volumes in round-robin fashion to avoid IO pressure on any individual filesystem.
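A minimal sketch of the round-robin placement, assuming six sapdata volumes as in the storage table. The mount paths and data file names are hypothetical and only illustrate the assignment order.

```python
from itertools import cycle

def assign_round_robin(datafiles, volumes):
    """Assign data files to volumes in turn so no single filesystem takes them all."""
    placement = {volume: [] for volume in volumes}
    for datafile, volume in zip(datafiles, cycle(volumes)):
        placement[volume].append(datafile)
    return placement

# Hypothetical mount points and file names
volumes = [f"/oracle/SID/sapdata{i}" for i in range(1, 7)]
datafiles = [f"data_{n}.dbf" for n in range(1, 15)]

for volume, files in assign_round_robin(datafiles, volumes).items():
    print(volume, files)
```

With 14 files and 6 volumes, the first two volumes receive three files each and the remaining four receive two each, so the file count per filesystem never differs by more than one.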
High Availability Architecture
Azure offers high availability through availability zones, with an SLA of 99.99%. Most Azure regions provide the VM SKU and low latency between zones needed to deploy an active-active HA setup across zones. However, not every zone has the required VM SKU, so it is important to check SKU availability by running SAP-on-Azure-Scripts-and-Utilities/Get-VM-by-Zones at main · Azure/SAP-on-Azure-Scripts-and-Utilities (github.com) from your subscription. You can identify low-latency zones by running SAP-on-Azure-Scripts-and-Utilities/AvZone-Latency-Test at main · Azure/SAP-on-Azure-Scripts-and-Utilities (github.com). Combining the results of the SKU availability and latency scripts helps you identify a zone pair that can support an active-active HA deployment.
Note that each subscription may be mapped to different physical zones. You can find the physical zone mapping using the Azure REST API: Subscriptions - List Locations - REST API (Azure Resource Management) | Microsoft Learn.
The picture below shows the HA architecture.
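Combining the two script results amounts to filtering zone pairs by SKU availability and then choosing the pair with the lowest latency. The sketch below shows that selection with made-up inputs: the set of zones offering the SKU and the latency figures (in microseconds) are hypothetical values, not output from the scripts mentioned above.

```python
from itertools import combinations

def best_zone_pair(zones_with_sku, latency_us):
    """Return ((zone_a, zone_b), latency) for the lowest-latency pair where both zones offer the SKU."""
    candidates = [
        (latency_us[frozenset(pair)], pair)
        for pair in combinations(sorted(zones_with_sku), 2)
        if frozenset(pair) in latency_us
    ]
    if not candidates:
        return None
    latency, pair = min(candidates)
    return pair, latency

# Hypothetical inputs: logical zones offering the VM SKU, and measured zone-to-zone latency
zones_with_sku = {"2", "3"}
latency_us = {
    frozenset({"1", "2"}): 120,
    frozenset({"1", "3"}): 400,
    frozenset({"2", "3"}): 310,
}

print(best_zone_pair(zones_with_sku, latency_us))
```

In this example zones 1 and 2 have the lowest latency, but zone 1 does not offer the SKU, so the selection falls back to zones 2 and 3.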
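Because logical zone numbers can point to different physical zones in each subscription, it helps to translate between them before placing VMs that must sit together, for example production and QA. The sketch below works on two sample responses shaped like the `availabilityZoneMappings` field of the List Locations API. The sample data is invented, and the field names reflect the documented response shape as understood here, so verify them against the API reference.

```python
def physical_zone_map(location):
    """Map logical zone number to physical zone name for one location entry."""
    return {
        m["logicalZone"]: m["physicalZone"]
        for m in location.get("availabilityZoneMappings", [])
    }

def matching_logical_zones(location_a, location_b):
    """For each shared physical zone, return the logical zone in subscription A and B."""
    zones_a = physical_zone_map(location_a)
    physical_to_b = {phys: logical for logical, phys in physical_zone_map(location_b).items()}
    return {
        phys: (logical_a, physical_to_b[phys])
        for logical_a, phys in zones_a.items()
        if phys in physical_to_b
    }

# Invented sample data for two subscriptions in the same region
prod = {"name": "westeurope", "availabilityZoneMappings": [
    {"logicalZone": "1", "physicalZone": "westeurope-az1"},
    {"logicalZone": "2", "physicalZone": "westeurope-az2"},
    {"logicalZone": "3", "physicalZone": "westeurope-az3"},
]}
qa = {"name": "westeurope", "availabilityZoneMappings": [
    {"logicalZone": "1", "physicalZone": "westeurope-az3"},
    {"logicalZone": "2", "physicalZone": "westeurope-az1"},
    {"logicalZone": "3", "physicalZone": "westeurope-az2"},
]}

for physical, (zone_prod, zone_qa) in matching_logical_zones(prod, qa).items():
    print(f"{physical}: prod zone {zone_prod} = qa zone {zone_qa}")
```

In this sample, a production VM in logical zone 1 shares a physical zone with a QA VM in logical zone 2, not logical zone 1.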
Data Protection Strategy
Customers can combine ANF snapshots on the primary VM with weekly Oracle streaming backups on the HA stand-by. We recommend the ANF snapshot tool provided by Microsoft, known as the Azure Application Consistent Snapshot tool (AzAcSnap). Both snapshots and clones can be created in minutes, regardless of database size. Cloned volumes can be used for system copies, but it is critical that the production and QA VMs are in the same physical zone to ensure low latency between them.
Technically, ANF does not prevent you from mounting NFS volumes across zones, so it is important to establish an operational procedure that keeps the VM and its ANF storage in the same zone.
Backup & Snapshot Approach
Domain | Backup Component | Backup Option | Frequency | Run Against | Load on DB VM |
Primary Region | DB | Snapshot (azacsnap) | Every 4 hours | HA Primary VM | Low |
Primary Region | DB | RMAN Backup | Daily incremental and weekly full | HA Stand-by VM | Low |
Primary Region | Log | Archive Log Backup | Every 15 minutes | HA Primary VM | Low |
DR Region | DB | Oracle Data Guard | Current | n/a | Low |
Database Restore
Failure | Recovery Option | Log Recovery | Recovery Time | Comment |
DB Level | Snapshot | Log (roll-forward) | In Minutes | 1st Option |
DB Level | RMAN Restore | Log (roll-forward) | In Hours | 2nd Option |
Region Wide | Oracle Data Guard | | In Minutes | 1st Option |
Region Wide | RMAN Restore | Log (roll-forward) | In Hours | 2nd Option |
Migration Approach
Depending on the on-premises hardware, OS/DB and SAP software levels, a migration falls into either the homogeneous or the heterogeneous category.
We will cover the heterogeneous migration approach in a separate blog, including how to reduce downtime and the benefits it offers for very large databases.
In the homogeneous migration approach, smaller databases can be migrated using backup and restore. Larger databases can be migrated by setting up Oracle Data Guard (ODG) replication.
Customers should run the Azure Quality Check against the deployed solution to identify and address any deviations from Azure best practices.
Testing Approaches
Customers have used the Oracle Real Application Testing (RAT) option to perform real-world testing of the Oracle database. Capturing production workloads during the peak period and replaying them on Azure helps identify the required VM SKU and storage solution. Customers used Azure monitoring dashboards and RAT-generated output to analyze the test results and move forward confidently with migrating the SAP on Oracle system to Azure.
The RAT test covers Oracle database performance requirements. We highly recommend also running SAP-level volume and performance testing to ensure that end-to-end SAP processing meets or exceeds the performance KPIs.
System Performance
Azure innovations such as Mv3 (Intel Sapphire Rapids / DDR5), Azure Boost for improved IO and network throughput, and ANF storage with sub-millisecond latency over dNFS, combined with Oracle Advanced Compression, have resulted in a 30-50% improvement in SAP processing on Azure.
Conclusion
Azure has led SAP on Azure solutions over the years and has reached new heights every year by bringing advanced VM SKUs, storage and network solutions, and end-to-end architecture and deployment approaches that make it possible to deploy the largest SAP on Oracle databases on Azure. Azure successfully hosts a 200 TB+ SAP on Oracle database!
Useful Links
Below are key SAP Notes and Microsoft documentation for a successful Azure migration:
- 2039619 - SAP Applications on Microsoft Azure using the Oracle Database: Supported Products and Versions - SAP for Me
- 1928533 - SAP Applications on Microsoft Azure: Supported Products and Azure VM types - SAP for Me
- Oracle Azure Virtual Machines database deployment for SAP workload | Microsoft Learn
- General performance considerations for Azure NetApp Files | Microsoft Learn
- Understand Azure NetApp Files application volume group for Oracle | Microsoft Learn
- SAP-on-Azure-Scripts-and-Utilities/QualityCheck/Readme.md at main · Azure/SAP-on-Azure-Scripts-and-Utilities · GitHub
- 1672954 - Oracle 11g, 12c, 18c and 19c: Usage of hugepages on Linux - SAP for Me
Co-Authors
Denny Koovakattu
Ralf Klahr
Sathish Thirunethiram