BeatrizSilveira
Overview
Azure Site Recovery (ASR) is a service that often comes to mind when adopting a business continuity and disaster recovery (BCDR) approach. In summary, ASR continuously replicates workloads running on physical and virtual machines (VMs) from a primary to a secondary site. If disaster strikes and causes an outage, you can use ASR to fail over workloads to the secondary site so that applications remain accessible, and later fail back to the primary site once it becomes available again. A secondary ‘site’ may be another Azure region, or a different availability zone in the same Azure region.
What ‘failing over’ actually means is creating copies of the protected machines in the secondary site, which ideally should retain all the configurations applied to the original VMs in the primary site. However, there are some settings that ASR is unable to copy. Encryption at host is an example of this. Even though ASR can replicate VMs with this feature enabled, the VMs created by the service in case of a failover will not have it enabled. Detailed information on what settings are and are not supported by ASR can be found at Support matrix for Azure VM disaster recovery with Azure Site Recovery.
Encryption at host is an important security feature. It complements Server-Side Encryption (SSE) by encrypting temporary disks, OS and data disk caches at rest. Moreover, it ensures data in transit from the VM to the storage service flows encrypted, therefore achieving end-to-end encryption.
In this scenario, manually getting your failed over VMs to become compliant again could be cumbersome, especially in large deployments, as it would require shutting down the VM, enabling encryption at host, and starting the VM back up again. This is where Azure Automation can add value.
This article shows how to effectively perform post failover tasks using an Azure Automation Account and a PowerShell runbook. While this article focuses on enabling encryption at host in failed over Azure VMs, a similar approach can be used for other post failover actions.
1. Create an Azure Automation Account and PowerShell runbook
As the name implies, Azure Automation is a cloud-based service that automates different kinds of tasks, from OS updates using Update Management, to processes in Azure VMs or machines running on-premises, or in other cloud environments. For this demo, we’re mostly interested in the Process Automation capability of Azure Automation.
Managed Identities
Create an Azure Automation Account following this guidance: Quickstart - Create an Azure Automation account using the portal | Microsoft Learn. For simplicity, the System-assigned option was selected for managed identities in the Advanced tab. A managed identity is required for this scenario; without one, the Automation Account (and its runbooks in particular) cannot make any changes on the target machines. User-assigned managed identities are standalone resources that can be reused, whereas system-assigned managed identities have a lifecycle tied to the service for which they were created, the Automation Account in this case.
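If you prefer scripting over the portal, the same step can be sketched with Azure PowerShell. This is a minimal sketch, not the method used for the demo: it assumes the Az.Accounts and Az.Automation modules are installed and that you are already signed in with Connect-AzAccount. The resource group, account name, and location are hypothetical placeholders.

```powershell
# Hypothetical names for illustration; replace with your own values.
$resourceGroup     = 'rg-asr-demo'
$automationAccount = 'aa-asr-demo'
$location          = 'swedencentral'

# Create the Automation Account and enable a system-assigned managed identity.
# Assumes the resource group already exists.
New-AzAutomationAccount `
    -ResourceGroupName $resourceGroup `
    -Name $automationAccount `
    -Location $location `
    -AssignSystemIdentity
```

Note that this sketch leaves the networking settings at their defaults; review them against the Networking considerations described next.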
Networking
In the Networking tab, you’ll see the option to configure the Automation Account with Public or Private Access. Ideally, you should choose Private Access and lock down the Automation Account to only receive requests coming from clients accessing the service through a private endpoint. However, this would break the integration between ASR and Azure Automation. As we’ll see later on, an input parameter from ASR is required to trigger the PowerShell runbook, and today it is not possible for ASR to use a private endpoint connection to access another Azure service. Note that, despite using public endpoints, traffic between Azure services is always encrypted and traverses Microsoft’s backbone network.
For the reason explained above, and to keep the demo simple, the Automation Account was configured with Public Access. If you have a highly secure environment, and you must disable Public Access on your Automation Account, you may want to explore a different approach that doesn’t rely on an input variable from ASR to trigger the runbook. Note that once you create a private endpoint for Azure Automation, you can no longer run cloud jobs. Instead, you must use a Hybrid Runbook Worker hosted in an Azure virtual network. For more information, see Use Azure Private Link to securely connect networks to Azure Automation | Microsoft Learn.
RBAC Permissions
Once the Automation Account is created, you must make sure its managed identity has the necessary RBAC permissions to perform the steps required to enable Encryption at host in a VM. In my experience, the ‘Virtual Machine Contributor’ role is enough for this, and the scope was restricted to the Resource Group where the resources for this demo were deployed. In the real world, you can choose the Resource Group(s) where you’re planning to deploy failed over VMs.
For guidance on how to assign RBAC roles to a managed identity, see Assign Azure roles to a managed identity (Preview) - Azure RBAC | Microsoft Learn.
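As a rough sketch of the same role assignment in Azure PowerShell: the snippet below looks up the principal ID of the Automation Account’s system-assigned identity and grants it ‘Virtual Machine Contributor’ at resource group scope. It assumes the Az.Automation and Az.Resources modules, an existing sign-in, and that your own account is allowed to create role assignments. The resource names are hypothetical placeholders.

```powershell
# Hypothetical names for illustration; replace with your own values.
$resourceGroup     = 'rg-asr-demo'
$automationAccount = 'aa-asr-demo'

# Principal (object) ID of the Automation Account's system-assigned managed identity.
$principalId = (Get-AzAutomationAccount `
    -ResourceGroupName $resourceGroup `
    -Name $automationAccount).Identity.PrincipalId

# Grant the identity 'Virtual Machine Contributor' on the resource group
# where the failed over VMs will be created.
New-AzRoleAssignment `
    -ObjectId $principalId `
    -RoleDefinitionName 'Virtual Machine Contributor' `
    -ResourceGroupName $resourceGroup
```

If failed over VMs land in several resource groups, the assignment would need to be repeated for each one.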
Create PowerShell Runbook and add PowerShell script
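Follow this guidance to create a PowerShell runbook: Tutorial - Create a PowerShell Workflow runbook in Azure Automation | Microsoft Learn. When choosing the runbook type, select PowerShell rather than PowerShell Workflow, because the script used in this article is a plain PowerShell script and not a workflow.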
Once the runbook is created, the portal will automatically redirect to the ‘Edit PowerShell Runbook’ view. Here, you can add the following PowerShell script:
param (
    [Parameter(Mandatory = $false)]
    [Object]$RecoveryPlanContext
)

# Do not inherit or persist an Azure context in this runbook job
Disable-AzContextAutosave -Scope Process

# Sign in with the Automation Account's managed identity
$AzureContext = (Connect-AzAccount -Identity).Context
Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext

# VmMap has one NoteProperty per failed over VM; collect the property names
$VMinfo = $RecoveryPlanContext.VmMap | Get-Member | Where-Object MemberType -EQ NoteProperty | Select-Object -ExpandProperty Name
$vmMap = $RecoveryPlanContext.VmMap

foreach ($VMID in $VMinfo) {
    $VM = $vmMap.$VMID
    # Skip entries with missing data; the cmdlets below would fail without it
    if (!(($null -eq $VM) -or ($null -eq $VM.ResourceGroupName) -or ($null -eq $VM.RoleName))) {
        $VMObject = Get-AzVM -ResourceGroupName $VM.ResourceGroupName -Name $VM.RoleName

        # Stop (deallocate) the VM before changing the setting
        "Stopping {0} VM in {1} RG. " -f $VM.RoleName, $VM.ResourceGroupName
        Stop-AzVM -ResourceGroupName $VM.ResourceGroupName -Name $VM.RoleName -Force

        # Configure encryption
        "Configuring encryption for {0} VM in {1} RG. " -f $VM.RoleName, $VM.ResourceGroupName
        Update-AzVM -VM $VMObject -ResourceGroupName $VMObject.ResourceGroupName -EncryptionAtHost $true

        # Start VM
        "Starting {0} VM in {1} RG. " -f $VM.RoleName, $VM.ResourceGroupName
        Start-AzVM -ResourceGroupName $VM.ResourceGroupName -Name $VM.RoleName
    }
}
The script above was written based on the information present in the following sources:
Azure PowerShell - Enable end-to-end encryption on your VM host - Azure Virtual Machines
Add Azure Automation runbooks to Site Recovery recovery plans - Azure Site Recovery
I also counted on the help and expertise of my colleague Jose Fehse, also from the FastTrack team. Thanks, Jose!
As you’ll see in the second link above, the $RecoveryPlanContext parameter declared at the beginning of the script is generated by ASR and contains a property (VmMap) that lists all failed over VMs. The script loops through that list and, for each VM, stops it, enables Encryption at host, and starts it back up.
Save and Publish the runbook. Using the Test Pane is not very useful at this point because we don’t have the $RecoveryPlanContext yet. If you need to make sure the PowerShell commands work as expected, testing them locally or using Cloud Shell in the Azure portal is a better option.
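To make the shape of this input more concrete, here is a small, offline sketch that builds a sample context and enumerates it without calling Azure. The field names follow the structure described in the Site Recovery documentation linked above, but every value (IDs, names, resource group) is made up for illustration, so compare it against the real input of one of your own jobs before relying on it.

```powershell
# Illustrative context only: all IDs and names below are made up.
$sampleJson = @'
{
  "RecoveryPlanName": "demo-recovery-plan",
  "FailoverType": "Test",
  "FailoverDirection": "PrimaryToSecondary",
  "GroupId": "1",
  "VmMap": {
    "00000000-0000-0000-0000-000000000001": {
      "SubscriptionId": "00000000-0000-0000-0000-000000000000",
      "ResourceGroupName": "rg-asr-demo",
      "CloudServiceName": "",
      "RoleName": "vm-demo-test",
      "RecoveryPointId": "TimeStamp"
    }
  }
}
'@
$RecoveryPlanContext = $sampleJson | ConvertFrom-Json

# Each VmMap entry is a property keyed by an ID; its value describes one VM.
$targets = foreach ($entry in $RecoveryPlanContext.VmMap.PSObject.Properties) {
    $vm = $entry.Value
    # Same idea as the runbook's guard: skip entries with missing data.
    if ($vm.ResourceGroupName -and $vm.RoleName) {
        [pscustomobject]@{
            ResourceGroupName = $vm.ResourceGroupName
            Name              = $vm.RoleName
        }
    }
}

# Dry run: report what would be processed instead of touching any VM.
$targets | ForEach-Object { "Would process {0} VM in {1} RG." -f $_.Name, $_.ResourceGroupName }
```

This version enumerates VmMap through PSObject.Properties instead of Get-Member, which is simply an alternative way of reading the same property names and values.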
2. Create ASR Recovery Plan and add PowerShell runbook as post failover step
Assuming your Recovery Services Vault (RSV) already has protected VMs, the next step is to create a Recovery Plan in ASR. In summary, a Recovery Plan lets you create groups of machines you want to fail over together, specify the order of the groups for initiating failover, and add tasks or instructions during this process. For more information about Recovery Plans, see About recovery plans in Azure Site Recovery - Azure Site Recovery.
Create a Recovery Plan following this guidance: Create/customize recovery plans in Azure Site Recovery - Azure Site Recovery. For the purpose of this demo, a Recovery Plan with a single test VM was created, and ASR was configured to fail over from availability zone 1 to zone 2 in Sweden Central. Following a similar approach is recommended before applying this workflow on production VMs.
Once the plan is created, click on Customize. Next to Group1: Start, click on the 3 dots on the right to add a post action, meaning after failover is complete:
In the next blade, select the Automation Account previously created and the runbook. Click OK to save.
If you click OK and nothing seems to happen, try going back to the previous blade (Recovery Plan view) and click Save. After that, the action added should appear as a post-step:
3. Perform test failover
The last step is to initiate a Test Failover to verify that the runbook is working properly.
VM disk settings before failover
For instructions on how to check and change a VM’s disk encryption settings, see Enable end-to-end encryption using encryption at host - Azure portal.
Once the failover is complete, we can check the status of the cloud job, and the disk settings of the failed over VM:
Job output
VM disk settings after failover
Note that when doing a Test failover, ASR adds a ‘-test’ suffix to the original name of the VM.
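Besides the portal, the setting can also be checked from Azure PowerShell. The following is a minimal sketch assuming the Az.Compute module and an existing sign-in; the resource group and VM name are hypothetical placeholders (note the ‘-test’ suffix on the VM name).

```powershell
# Hypothetical names for illustration; replace with your own values.
$resourceGroup = 'rg-asr-demo'
$vmName        = 'vm-demo-test'

$vm = Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName

# SecurityProfile.EncryptionAtHost is empty when the feature was never set,
# so cast to [bool] to report False in that case.
"EncryptionAtHost on {0}: {1}" -f $vm.Name, [bool]$vm.SecurityProfile.EncryptionAtHost
```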
If the runbook didn’t work as expected, you can copy the $RecoveryPlanContext input generated by ASR and use it as a parameter in the Test pane of the runbook to perform additional tests without having to initiate a new test failover every time.
If everything went smoothly, it is safe to delete the test failover VM by clicking on Cleanup test failover in the Recovery Plans view.
Next steps
Now that we’ve explored how to automate post failover actions in Azure VMs, you may want to explore automating tasks during failover next:
Run a failover during disaster recovery with Azure Site Recovery - Azure Site Recovery
A few tasks you might want to run during an Azure Site Recovery DR