Posted September 4, 2023

Azure Managed Lustre delivers the time-tested Lustre file system as a first-party managed service on Azure. Long-time users of Lustre on-premises can now leverage the benefits of a complete HPC solution, including compute and high-performance storage, delivered on Azure.

There is a known behaviour in Lustre when a VM that has the filesystem mounted is evicted or deleted as part of a workflow without releasing its filesystem lock. Lustre has a timeout of about 10 minutes for releasing the lock, so it keeps the lock for the next 10 to 15 minutes. During that time, the other VMs (Lustre clients) using the same Lustre filesystem might experience intermittent hung mounts.

This blog discusses how we can use Azure Scheduled Events to unmount Azure Managed Lustre cleanly in a VMSS or on a Spot VM to avoid the issue described above.

Scale set instances can opt in to receive instance termination notifications and set a pre-defined delay timeout for the Terminate operation. The termination notification is sent through Azure Metadata Service – Scheduled Events, which provides notifications for, and delaying of, impactful operations such as reboots and redeploys. Refer to "Terminate notification for Azure Virtual Machine Scale Set instances" for more information.

With Azure Scheduled Events, your application can discover when maintenance will occur and trigger tasks to limit its impact. Many applications can benefit from time to prepare for VM maintenance. The time can be used to perform application-specific tasks that improve availability, reliability, and serviceability, including:

- Checkpoint and restore.
- Connection draining.
- Primary replica failover.
- Removal from a load balancer pool.
- Event logging.
- Graceful shutdown.

The scheduled events can be checked using the following command.

curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq

The following Scheduled Events are supported.
- Freeze: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
- Reboot: The Virtual Machine is scheduled for reboot (non-persistent memory is lost).
- Redeploy: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost).
- Preempt: The Spot Virtual Machine is being deleted (ephemeral disks are lost). This event is made available on a best-effort basis.
- Terminate: The virtual machine is scheduled to be deleted.

Here is the sample output from Scheduled Events:

[root@almavmssn000000 ~]# curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   706  100   706    0     0   114k      0 --:--:-- --:--:-- --:--:--  114k
{
  "DocumentIncarnation": 4,
  "Events": [
    {
      "EventId": "1F438EDF-C2E4-4291-80F5-642637520764",
      "EventStatus": "Scheduled",
      "EventType": "Reboot",
      "ResourceType": "VirtualMachine",
      "Resources": [
        "almavmss_2"
      ],
      "NotBefore": "Fri, 01 Sep 2023 10:26:13 GMT",
      "Description": "Virtual machine is going to be restarted as requested by authorized user.",
      "EventSource": "User",
      "DurationInSeconds": -1
    }
  ]
}

[vinil@almavmssn000004 ~]$ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   305  100   305    0     0  50833      0 --:--:-- --:--:-- --:--:-- 50833
{
  "DocumentIncarnation": 1,
  "Events": [
    {
      "EventId": "32A12BE7-5935-49CE-980A-1270F672BD0E",
      "EventStatus": "Scheduled",
      "EventType": "Terminate",
      "ResourceType": "VirtualMachine",
      "Resources": [
        "almavmss_4"
      ],
      "NotBefore": "Mon, 04 Sep 2023 02:49:34 GMT",
      "Description": "",
      "EventSource": "Platform",
      "DurationInSeconds": -1
    }
  ]
}
[vinil@almavmssn000004 ~]$

Here is the script to unmount the Lustre filesystem using Scheduled Events.
You can modify the script to suit your requirements; it is for demonstration purposes. The following script was added as a cron job to monitor the events. This script will work on most Linux distributions.

NOTE: Update the MOUNTPOINT variable according to your environment.

#!/bin/bash
#Author - Vinil Vadakepurakkal, Microsoft
#Date - 24/08/2023
#Using Azure Scheduled Events to unmount Lustre filesystems
#This script is intended to be run as a cron job on the Lustre client nodes
#The script will check for scheduled events and if it finds one, it will unmount the Lustre filesystem
#Event Types are below:
#Freeze: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
#Reboot: The Virtual Machine is scheduled for reboot (non-persistent memory is lost).
#Redeploy: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost).
#Preempt: The Spot Virtual Machine is being deleted (ephemeral disks are lost). This event is made available on a best-effort basis.
#Terminate: The virtual machine is scheduled to be deleted.
MOUNTPOINT=/pfsvinilv

#Fetch the Scheduled Events document and the instance metadata once per run
EVENTS=$(curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01")
INSTANCE=$(curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01")

INSTANCE_NAME=$(echo "$INSTANCE" | jq -r .compute.name)
OS_HOSTNAME=$(echo "$INSTANCE" | jq -r .compute.osProfile.computerName)
HOSTNAME=$(hostname)

#Number of events in the document; the loop below does nothing when there are none
NO_OF_EVENTS=$(echo "$EVENTS" | jq '.Events | length')
NO_OF_EVENTS=$((NO_OF_EVENTS - 1))

for i in $(seq 0 $NO_OF_EVENTS)
do
    #Only the first resource listed for each event is compared with this instance
    RESOURCE_NAME=$(echo "$EVENTS" | jq -r ".Events[$i].Resources[0]")
    #EventType keeps its JSON quotes, so syslog shows: type "Reboot"
    EVENT_TYPE=$(echo "$EVENTS" | jq ".Events[$i].EventType")
    echo "$INSTANCE_NAME"
    echo "$RESOURCE_NAME"
    if [ "$RESOURCE_NAME" = "$INSTANCE_NAME" ]
    then
        echo "$OS_HOSTNAME has a scheduled event of type $EVENT_TYPE" | logger
        echo "unmounting Lustre filesystem $MOUNTPOINT from $HOSTNAME" | logger
        #Kill processes still using the mountpoint, then lazy-unmount it
        /usr/bin/fuser -ku "$MOUNTPOINT"
        /usr/bin/sleep 5
        /usr/bin/umount -l "$MOUNTPOINT"
        echo "Lustre filesystem unmounted from $HOSTNAME" | logger
    fi
done

Testing the functionality.

In my setup, Lustre is mounted on the /pfsvinilv mountpoint.

[root@almavmssn000001 ~]# df
Filesystem                   1K-blocks     Used   Available Use% Mounted on
devtmpfs                      16421280        0    16421280   0% /dev
tmpfs                         16458648        0    16458648   0% /dev/shm
tmpfs                         16458648    66080    16392568   1% /run
tmpfs                         16458648        0    16458648   0% /sys/fs/cgroup
/dev/sda2                     30416376 22340904     8075472  74% /
/dev/sda1                       506528   254660      251868  51% /boot
/dev/sda15                      506600     5952      500648   2% /boot/efi
10.222.1.17@tcp:/lustrefs  17010128952     1264 16151959984   1% /pfsvinilv
tmpfs                          3291728        0     3291728   0% /run/user/1000

I invoked a terminate event on the VM from the Azure Portal, which created a Terminate event in Scheduled Events.
[vinil@almavmssn000001 ~]$ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   305  100   305    0     0  50833      0 --:--:-- --:--:-- --:--:-- 50833
{
  "DocumentIncarnation": 1,
  "Events": [
    {
      "EventId": "32A12BE7-5935-49CE-980A-1270F672BD0E",
      "EventStatus": "Scheduled",
      "EventType": "Terminate",
      "ResourceType": "VirtualMachine",
      "Resources": [
        "almavmss_1"
      ],
      "NotBefore": "Mon, 04 Sep 2023 02:49:34 GMT",
      "Description": "",
      "EventSource": "Platform",
      "DurationInSeconds": -1
    }
  ]
}

After a couple of minutes, the script in the cron job unmounted the Lustre filesystem. This avoids the intermittent filesystem hangs on the other Lustre clients. The following output shows that the Lustre mountpoint was unmounted before the VM was terminated.

[root@almavmssn000001 ~]# df
Filesystem     1K-blocks     Used Available Use% Mounted on
devtmpfs        16421280        0  16421280   0% /dev
tmpfs           16458648        0  16458648   0% /dev/shm
tmpfs           16458648    66084  16392564   1% /run
tmpfs           16458648        0  16458648   0% /sys/fs/cgroup
/dev/sda2       30416376 22340904   8075472  74% /
/dev/sda1         506528   254660    251868  51% /boot
/dev/sda15        506600     5952    500648   2% /boot/efi
tmpfs            3291728        0   3291728   0% /run/user/1000
tmpfs            3291728        0   3291728   0% /run/user/0
[root@almavmssn000001 ~]#

The script also writes messages about the event to syslog.

[root@almavmssn000001 ~]# grep Reboot /var/log/messages
Sep 1 10:15:01 almavmssn000001 root[64930]: almavmssn000001 has a scheduled event of type "Reboot"
Sep 1 10:16:01 almavmssn000001 root[65056]: almavmssn000001 has a scheduled event of type "Reboot"
[root@almavmssn000001 ~]#
[root@almavmssn000001 ~]# grep pfsvinilv /var/log/messages
Sep 1 10:15:01 almavmssn000001 root[64932]: unmounting Lustre filesystem /pfsvinilv from almavmssn000001
Sep 1 10:15:06 almavmssn000001 systemd[1]: pfsvinilv.mount: Succeeded.
Sep 1 10:16:01 almavmssn000001 root[65058]: unmounting Lustre filesystem /pfsvinilv from almavmssn000001
[root@almavmssn000001 ~]#

References:

- Azure Managed Lustre File System documentation
- Terminate notification for Azure Virtual Machine Scale Set instances
- Azure Metadata Service: Scheduled Events for Linux VMs
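
As a variation on the unmount script earlier in this post, the sketch below only reacts to events that restart, move, or delete the VM (it skips Freeze, which does not affect open files), and then acknowledges each handled event with a POST of StartRequests to the Scheduled Events endpoint so that Azure does not have to wait until NotBefore. This is a minimal sketch, not the author's script: the function names matching_event_ids, acknowledge_event and main, the --run switch, and the choice of event types are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: filter Scheduled Events by type, unmount, then acknowledge.
# The MOUNTPOINT default and the --run switch are assumptions for illustration.
MOUNTPOINT=${MOUNTPOINT:-/pfsvinilv}
EVENTS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
INSTANCE_URL="http://169.254.169.254/metadata/instance?api-version=2021-02-01"

# Print the EventId of each event in JSON document $1 that lists instance $2
# in Resources and whose type restarts, moves, or deletes the VM.
matching_event_ids() {
    echo "$1" | jq -r --arg name "$2" '
        .Events[]?
        | select(.EventType == "Reboot" or .EventType == "Redeploy"
                 or .EventType == "Preempt" or .EventType == "Terminate")
        | select(any(.Resources[]; . == $name))
        | .EventId'
}

# Tell the platform that this instance has finished preparing for event $1.
acknowledge_event() {
    curl -s -H Metadata:true --noproxy "*" -X POST \
        -d "{\"StartRequests\": [{\"EventId\": \"$1\"}]}" "$EVENTS_URL"
}

main() {
    local events name ids id
    events=$(curl -s -H Metadata:true --noproxy "*" "$EVENTS_URL")
    name=$(curl -s -H Metadata:true --noproxy "*" "$INSTANCE_URL" | jq -r .compute.name)
    ids=$(matching_event_ids "$events" "$name")
    # Nothing scheduled for this instance: leave the mount alone.
    [ -n "$ids" ] || return 0

    logger "unmounting Lustre filesystem $MOUNTPOINT from $(hostname)"
    /usr/bin/fuser -ku "$MOUNTPOINT"
    /usr/bin/sleep 5
    /usr/bin/umount -l "$MOUNTPOINT" || return 1

    # Acknowledge only after the unmount has been issued.
    for id in $ids; do
        acknowledge_event "$id"
    done
}

# Only touch the network and the mount when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    main
fi
```

Saved under a hypothetical path such as /usr/local/sbin/lustre-unmount.sh, it could be run from root's crontab every minute with an entry like `* * * * * /usr/local/sbin/lustre-unmount.sh --run`. Whether to acknowledge events at all is a design choice: it shortens the wait before the platform proceeds, so it only makes sense once the unmount has been issued.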