Minimize the downtime when migrating SQL workloads to Azure SQL Managed Instance

  • Thread starter Thread starter ZoranRilak
  • Start date Start date
Z

ZoranRilak

Database migrations are challenging...




Migrating SQL Server databases and workloads to the cloud is an involved project. It is not all about just moving the data. There is also setting up the identities and access rights, repointing the apps and updating DNS (Domain Name System) servers, routing the traffic, testing latency and failover behavior, standing up the correct security framework, among other things. Put simply—it’s challenging.



Yet, there is a way to significantly de-risk a big part of moving SQL Server to Azure SQL Managed Instance, by doing most of the configuration and verification work upfront and turning the switch-over into a simple update in our domain name system. Just set things up, flip the switch, and watch your apps talk to the SQL Server in the cloud. This post will take you through the steps to do that.



…let us simplify things




We will assume that you have a data replication mechanism already in place. This can be link feature for Managed Instance, Log Replay Service (LRS), transactional replication, or any third-party solution for near real-time replication of data. Let’s also assume that the replication process has fully caught up the secondary copy with the primary replica. Our goal, then, is to “cut over” with minimal delay, that is, repoint all our applications to the secondary replica with as few retries and lost connections experienced by the clients.



Just the same as with all cross-datacenter migrations, repointing applications to target Azure SQL Managed Instance must be done with the apps’ network environment in mind. We may need to change network routes, firewall configurations, domain names, connection strings, security rules, and more. Ideally, we also want to ensure that migration is transparent to our client applications, meaning that each app continues to talk seamlessly with the primary SQL database before, during, and after the database has moved. Additionally, we must not introduce network changes that undermine the level of security and compliance we have already established.



In this blog post you will learn how to migrate to the cloud while retaining control over the critical points of the migration flow:

  • Set up replication and create a replica in Azure
  • Establish client connectivity in Azure
  • Prepare clients for a cutover to Azure
  • Cut over (complete the migration)



Before we begin, let us introduce one helpful concept we will use along the way: that of a listener endpoint.



What is a listener endpoint





When the location of the database changes, it’s natural to expect that the clients must also be configured to talk to the new location. So, the key to minimizing the impact of the cutover on the individual clients connecting to the database lies in making this change as swift and uniform as possible. Is there a way to change how multiple clients connect, even when clients are distributed across environments and networks? There’s a simple way to achieve this by creating a single, known domain name that dynamically resolves to the database we want. We call this domain name a “listener endpoint”.



To understand the behavior of a listener endpoint, let us first consider how our application might connect to the on-premises SQL Server without a listener endpoint in place:



large?v=v2&px=999.png





Our application connects to on-premises SQL Server by resolving a domain name that we have chosen and assigned to the SQL Server’s host machine for that purpose. Here, we assume that you have full control of the DNS server in which this domain name is defined and can reconfigure how it resolves. It does not matter where the DNS server is hosted—on premises, Azure, or hosted by a third party—so long as the applications use it to resolve server addresses and you can update its configuration.



If we introduce another record and set it to “alias” the original one, we have just created a listener endpoint. DNS systems offer a special type of record for this purpose, called the CNAME (“canonical name”) record type:



large?v=v2&px=999.png



There are multiple ways to set up the listener endpoint, for example:

  • You can reuse the original domain name and just turn it into a CNAME record instead.
  • You can rename the original record and reuse its name for the link listener.
  • You can keep the original record and use a different name for the link listener.

Which way you choose depends on your DNS configuration and what best suits your clients, applications, and network environment.

With this, when we do the cutover, we will simultaneously update the listener endpoint:



large?v=v2&px=999.png





Having a listener endpoint in place lets us set things up in Azure while the on-premises SQL Server is still primary. We can change the app configuration, test the outbound connectivity to Azure, and reconfigure our firewalls. When we are satisfied, we change the listener endpoint to the secondary and reverse, or interrupt, the replication, thus making the migration as close to a single, discrete operation as possible.



At a high level, here is what we will be doing:

  1. Set up continuous replication between our on-premises SQL Server database (as the primary replica) and SQL Managed Instance in Azure (as the secondary replica),
  2. Create a DNS record (listener endpoint) pointing to the current primary replica, our on-premises SQL Server,
  3. Reconfigure our applications to use the listener endpoint instead of targeting our on-premises SQL Server directly,
  4. Test if our apps can connect to the secondary database replica in Azure to proactively detect any connectivity issues before we cut over,
  5. Reverse (“fail over”) or interrupt (“cut over”) the replication so that the database on Azure SQL Managed Instance becomes the primary, and at the same time flip the listener endpoint so that it resolves to Azure SQL Managed Instance,
  6. Verify that our apps still work as intended.

Before we get to it, we will list the limitations of the listener endpoint when used in conjunction with Azure SQL Managed Instance.



Limitations




When implementing a listener endpoint for Azure SQL Managed Instance, please note the following:

  • The hostname of the listener endpoint must be identical to the name of Azure SQL Managed Instance. For example, if the instance is named mysqlmi01.abcd0123.database.windows.net, then the link listener endpoint must be named mysqlmi01.contoso.com.
  • The on-premises SQL Server and the SQL Managed Instance engaged in replication must listen for incoming SQL connections on the same port. For SQL Managed Instance, this is fixed to the default SQL Server port, 1433. If the on-premises SQL Server uses a different port number, then the link listener cannot be used to transparently repoint SQL clients.
  • The listener endpoint cannot be used to refer to a named instance of SQL Server on premises (e.g., SERVER01\INSTANCE01).
  • Applications and SQL clients setting up encrypted connections over the listener endpoint must be configured to trust server certificates, for example by passing the “TrustServerCertificate=true” connection parameter. Otherwise, clients will not be able to validate the TLS server certificate on Azure SQL Managed Instance which is issued for a domain different from that of the listener endpoint.
  • After the failover, a brief period of unavailability should be expected. This duration the value of the time-to-live (TTL) property set on the listener endpoint’s DNS record in its DNS server. For Azure Private DNS, TTL can be as short as one second. Other DNS servers may impose longer minimum TTL times.



Step 1. Create a primary replica DNS record (listener endpoint)




We begin by creating a DNS record for the listener endpoint and making sure that it points to the primary replica, which is our on-premises SQL Server for the time being. We will create a new DNS record in whichever DNS server our applications use when they resolve domain names in the connection string. Typically, apps will be using the DNS server configured by their operating system as part of its network configuration. Make sure you check which DNS server it is and that you have the right to change its settings or talk to your network administrator.



We want to create the following record:

  • Zone: pick a DNS zone you own, e.g., contoso.com
  • Resource record name: this must be the name of your SQL Managed Instance
  • Resource record type: CNAME
  • Resource record TTL: pick something short, e.g., 5-10 seconds if possible
  • Resource record value: Use the full domain name of your on-premises SQL server



Placing this record in the DNS server your apps will use to resolve their connection strings makes it so that <name>.contoso.com becomes an alias for your on-premises SQL server’s domain name. You can verify this by running nslookup <domain-name> (Windows) or host <domain-name> (Linux) to verify that the listener endpoint and your on-premises SQL’s domain name resolve to the same IP address.



It is important for this record’s time-to-live (TTL) to be as short as possible at the time of cutover or failover, because the longer this value is, the longer we will have to wait for the changes in the link listener to propagate to our applications.



The instructions on how to configure DNS resource records depend on the concrete DNS server in question. Here are a few examples:




Step 2. Reconfigure apps to use the link listener




Once you have confirmed that the <name>.contoso.com alias resolves correctly to your on-premises SQL Server, it is time to reconfigure your apps. For this, we need to do two things:

  1. Replace your SQL Server’s domain name with <name>.contoso.com in your application’s connection settings or connection strings.
  2. If using encrypted connections, enable the “Trust server certificate” configuration option in your apps’ connection settings. If you cannot find it but have access to their connection strings, you can try “trustServerCertificate=true” which is how this parameter is passed to most SQL drivers; for example: "Server=tcp:listener.contoso.com,1433;Persist Security Info=False;User ID={your_username};Password={your_password}; Encrypt=True;TrustServerCertificate=True;Connection Timeout=30;"



Restart your apps or reestablish their connections and verify that they still work as expected.



Step 3. Test to Azure SQL Managed Instance




We are getting close to what we have been building to, which is making Azure SQL Managed Instance the new primary. Before we continue, though, let us double-check whether our apps can establish connections to SQL Server in the cloud. In general, it is a good idea to try connecting to the SQL Managed Instance while the replication is running. Not only will you be verifying that the data is being replicated, you will also confirm that a reliable network path exists from the machines running your applications to your migration destination.



Use an application such as SQL Server Management Studio on one of the machines running your applications (or sharing the same network environment and configuration at least) to connect to the hostname of the SQL Managed Instance. Remember to doublecheck if the replication mechanism you’re running allows you to do so while the replication is active (LRS for example doesn’t).



If you can connect to SQL Managed Instance and can query the data from all client locations and networks, then you are good to go.



Step 4. Make the database replica in Azure




By now we have done the preliminaries—we have a cloud copy of our on-premises database—and have established that our apps:

  1. Use the listener endpoint to connect to our primary, and
  2. Can connect to the managed instance when it becomes the new primary.



large?v=v2&px=999.png



It is time to repoint our apps to the managed instance. In the above diagram, we want to sever the replication and simultaneously flip the link listener’s arrow pointing to the current primary replica so it points to the secondary. We suggest that you read ahead through the steps below to understand what we will be doing, as these steps are best performed in quick succession to minimize the downtime.



Step 4a. Ensure that the secondary replica (cloud) has fully caught up




This will depend on your selected replication mechanism. Refer to its documentation and make sure that the secondary is fully synced with the primary.



Step 4b. Update the listener endpoint




Update the primary replica DNS record we have created in step 2 as follows:

  • Zone: leave unchanged
  • Resource record name: leave unchanged
  • Resource record type: leave unchanged
  • Resource record TTL: leave unchanged
  • Resource record value: Full domain name of SQL Managed Instance, for example mysqlmi01.abcd0123.database.windows.net.



This will instruct applications trying to resolve the listener endpoint to target the cloud replica as the new primary. The change will not come into effect immediately, because DNS records are typically cached along the way from the DNS server in which they are defined to the host machines querying them. We have on average half the TTL of the DNS record before apps get wise to the change. Now, let us make the Azure SQL Managed Instance our new primary replica.



Step 4c. Reverse or interrupt the replication




Depending on your replication mechanism, you will either be able to reverse its direction, or else pause or detach the replication mechanism altogether. As you do so, you may see some application connections failing in the period between the moment you updated the DNS record through the duration of the DNS record’s TTL. This will be caused by the applications still using the cached DNS information from the listener endpoint pointing them to the on-premises replica.



After performing all three steps, our system should look like this:



large?v=v2&px=999.png





Step 5. Verify that everything works




At this point all our applications should be targeting the cloud copy, which is our new primary. Verify that everything works, and if it does not, you can point our applications back to the on-premises SQL Server by performing the last two steps in reverse order:

  • Change the DNS record’s value to the full domain name of your on-premises SQL server.
  • Reverse the direction of the replication, resume, or re-establish it.



Wait TTL seconds again for the changes to propagate, then revisit whatever might have gone wrong, then try again from the top of step 4.



Conclusion




The above article shows you how to set up a listener endpoint to automate application failovers from SQL Server on-premises to Azure SQL Managed Instance. The above examples were extensively tested with Azure’s private DNS zones and Managed Instance’s link feature. If you are also using the link feature, you may consider automating these steps with some clever use of PowerShell for a truly automated failover experience.



We would love to hear back from you. How well did it go? Did you run into any unexpected issues? Please let us know in the comments below!



In the meantime, here are some more resources on the topics discussed:


Continue reading...
 
Back
Top