Posted August 4, 2024

[HEADING=1]Overview[/HEADING]

Azure CycleCloud (CC) is a user-friendly platform that orchestrates High-Performance Computing (HPC) environments on Azure, enabling administrators to set up infrastructure, job schedulers, and filesystems, and to scale resources efficiently at any size. It is designed for HPC administrators who want to deploy environments with specific schedulers. SLURM, a widely used HPC job scheduler, is notable for its open-source, scalable, fault-tolerant design, suitable for Linux clusters of any scale. SLURM manages user resources, workloads, accounting, and monitoring, supports parallel/distributed computing, and organizes compute nodes into partitions. This blog explains how to add a new partition to a running SLURM cluster in CycleCloud, without terminating or restarting the entire cluster.

[ATTACH type="full" alt="Jerrance_0-1722712333744.png"]62215[/ATTACH]

[HEADING=1]Requirements/Versions[/HEADING]

- CycleCloud server (CC version used is 8.6.2)
- CycleCloud CLI initialized on the CycleCloud VM
- A running SLURM cluster
  - CycleCloud project used is 3.0.7
  - SLURM version used is 23.11.7-1
- SSH and HTTPS access to the CycleCloud VM

[HEADING=1]High Level Overview[/HEADING]

1. Git clone the CC SLURM repo (not required if you already have a SLURM template file)
2. Edit the SLURM template to add a new partition
3. Export parameters from the running SLURM cluster
4. Import the updated template file to the running cluster
5. Activate the new nodearray(s)
6. Update the cluster settings (VM size, core count, image, etc.)
7. Scale the cluster to create the nodes

[HEADING=2]Step 1: Git clone the CC SLURM repo[/HEADING]

SSH into the CC VM and run the following commands:

sudo yum install -y git
git clone https://github.com/Azure/cyclecloud-slurm.git
cd cyclecloud-slurm/templates
ll

[ATTACH type="full" alt="Jerrance_1-1722712333750.png"]62216[/ATTACH]

[HEADING=2]Step 2: Edit the SLURM template to add new partition(s)[/HEADING]

Use your editor of choice (e.g. vi, vim, nano, VS Code remote) to edit the "slurm.txt" template file:

cp slurm.txt slurm-part.txt
vim slurm-part.txt

A nodearray in the template file is the CC configuration unit that maps to a SLURM partition. There are three nodearrays defined in the default template:

- hpc: tightly coupled MPI workloads with InfiniBand (slurm.hpc = true)
- htc: massively parallel throughput jobs without InfiniBand (slurm.hpc = false)
- dynamic: enables multiple VM types in the same partition

Choose the nodearray type for the new partition (hpc or htc) and duplicate the [[nodearray …]] config section. For example, to create a new nodearray named "GPU" based on the hpc nodearray (NOTE: the hpc nodearray config is included for reference):

    [[nodearray hpc]]
    Extends = nodearraybase
    MachineType = $HPCMachineType
    ImageName = $HPCImageName
    MaxCoreCount = $MaxHPCExecuteCoreCount
    Azure.MaxScalesetSize = $HPCMaxScalesetSize
    AdditionalClusterInitSpecs = $HPCClusterInitSpecs
    EnableNodeHealthChecks = $EnableNodeHealthChecks

        [[[configuration]]]
        slurm.default_partition = true
        slurm.hpc = true
        slurm.partition = hpc

    [[nodearray GPU]]
    Extends = nodearraybase
    MachineType = $GPUMachineType
    ImageName = $GPUImageName
    MaxCoreCount = $MaxGPUExecuteCoreCount
    Azure.MaxScalesetSize = $HPCMaxScalesetSize
    AdditionalClusterInitSpecs = $GPUClusterInitSpecs
    EnableNodeHealthChecks = $EnableNodeHealthChecks

        [[[configuration]]]
        slurm.default_partition = false
        slurm.hpc = true
        slurm.partition = gpu
        slurm.use_pcpu = false

NOTE: only one nodearray may set "slurm.default_partition = true" (by default, the hpc nodearray). Leave the new nodearray set to false, or if you set it to true, change the hpc nodearray to false. The "variables" in the nodearray config (e.g. $GPUMachineType) are referred to as "Parameters" in CC.
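Since only one nodearray may set slurm.default_partition = true, a quick check after editing can catch an accidental duplicate. The following is an illustrative sketch; the here-doc stands in for the edited slurm-part.txt:

```shell
# Illustrative check: exactly one nodearray should set
# slurm.default_partition = true in the edited template.
# The here-doc below stands in for slurm-part.txt.
cat > /tmp/slurm-default-check.txt <<'EOF'
    [[nodearray hpc]]
        [[[configuration]]]
        slurm.default_partition = true
    [[nodearray GPU]]
        [[[configuration]]]
        slurm.default_partition = false
EOF
count=$(grep -c 'slurm.default_partition = true' /tmp/slurm-default-check.txt)
echo "default partitions: $count"   # anything other than 1 is a misconfiguration
```

Run the same grep against your real slurm-part.txt before moving on to the parameter definitions.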
The Parameters are attributes exposed in the CC GUI to enable per-cluster customization. Further down in the template file, the parameters configuration begins with the [parameters About] section. We need to add several configuration blocks throughout this section to correspond to the Parameters defined in the nodearray (e.g. $GPUMachineType).

Add GPUMachineType, based on HPCMachineType:

    [[[parameter HPCMachineType]]]
    Label = HPC VM Type
    Description = The VM type for HPC execute nodes
    ParameterType = Cloud.MachineType
    DefaultValue = Standard_F2s_v2

    [[[parameter GPUMachineType]]]
    Label = GPU VM Type
    Description = The VM type for GPU execute nodes
    ParameterType = Cloud.MachineType
    DefaultValue = Standard_F2s_v2

Add MaxGPUExecuteCoreCount, based on MaxHPCExecuteCoreCount:

    [[[parameter MaxHPCExecuteCoreCount]]]
    Label = Max HPC Cores
    Description = The total number of HPC execute cores to start
    DefaultValue = 100
    Config.Plugin = pico.form.NumberTextBox
    Config.MinValue = 1
    Config.IntegerOnly = true

    [[[parameter MaxGPUExecuteCoreCount]]]
    Label = Max GPU Cores
    Description = The total number of GPU execute cores to start
    DefaultValue = 100
    Config.Plugin = pico.form.NumberTextBox
    Config.MinValue = 1
    Config.IntegerOnly = true

Add GPUImageName, based on HPCImageName:

    [[[parameter HPCImageName]]]
    Label = HPC OS
    ParameterType = Cloud.Image
    Config.OS = linux
    DefaultValue = almalinux8
    Config.Filter := Package in {"cycle.image.centos7", "cycle.image.ubuntu20", "cycle.image.ubuntu22", "cycle.image.sles15-hpc", "almalinux8"}

    [[[parameter GPUImageName]]]
    Label = GPU OS
    ParameterType = Cloud.Image
    Config.OS = linux
    DefaultValue = almalinux8
    Config.Filter := Package in {"cycle.image.centos7", "cycle.image.ubuntu20", "cycle.image.ubuntu22", "cycle.image.sles15-hpc", "almalinux8"}

Add GPUClusterInitSpecs, based on HPCClusterInitSpecs:

    [[[parameter HPCClusterInitSpecs]]]
    Label = HPC Cluster-Init
    DefaultValue = =undefined
    Description = Cluster init specs to apply to HPC execute nodes
    ParameterType = Cloud.ClusterInitSpecs

    [[[parameter GPUClusterInitSpecs]]]
    Label = GPU Cluster-Init
    DefaultValue = =undefined
    Description = Cluster init specs to apply to GPU execute nodes
    ParameterType = Cloud.ClusterInitSpecs

NOTE: you can customize the "DefaultValue" for parameters as per your requirements, or alternatively make changes directly in the CycleCloud graphical user interface.

Save the template file and exit (e.g. :wq for vi/vim).

[HEADING=2]Step 3: Export parameters from the running SLURM cluster[/HEADING]

You now have an updated SLURM template file that adds a new GPU partition. The template needs to be imported into CycleCloud to overwrite the existing cluster definition. Before doing that, however, we need to export all of the cluster's current GUI parameter configs into a local JSON file to use in the import process. Without this JSON file, the cluster configs would all be reset to the default values specified in the template file, overwriting any customizations applied to the cluster in the GUI.
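Before exporting, it can also be worth checking that every $GPU… variable referenced in the new nodearray has a matching [[[parameter …]]] block, since a typo here only surfaces later in the GUI. A minimal sketch, where the here-doc stands in for the real slurm-part.txt:

```shell
# Illustrative consistency check: every $GPU… variable used in the
# nodearray should have a matching [[[parameter …]]] definition.
# The here-doc stands in for the real slurm-part.txt.
cat > /tmp/slurm-param-check.txt <<'EOF'
    [[nodearray GPU]]
    MachineType = $GPUMachineType
    MaxCoreCount = $MaxGPUExecuteCoreCount
        [[[parameter GPUMachineType]]]
        [[[parameter MaxGPUExecuteCoreCount]]]
EOF
for ref in $(grep -o '\$[A-Za-z]*GPU[A-Za-z]*' /tmp/slurm-param-check.txt | sort -u | tr -d '$'); do
    if grep -q "parameter $ref" /tmp/slurm-param-check.txt; then
        echo "$ref: defined"
    else
        echo "$ref: MISSING"
    fi
done
```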
From the CycleCloud VM, run a command of the following format:

cyclecloud export_parameters <cluster_name> > <file_name>.json

For my cluster the specific command is:

cyclecloud export_parameters jm-slurm-test > jm-slurm-test-params.json

cat jm-slurm-test-params.json
{
  "UsePublicNetwork" : false,
  "configuration_slurm_accounting_storageloc" : null,
  "AdditionalNFSMountOptions" : null,
  "About shared" : null,
  "NFSSchedAddress" : null,
  "loginMachineType" : "Standard_D8as_v4",
  "DynamicUseLowPrio" : false,
  "configuration_slurm_accounting_password" : null,
  "Region" : "southcentralus",
  "MaxHPCExecuteCoreCount" : 240,
  "NumberLoginNodes" : 0,
  "HTCImageName" : "cycle.image.ubuntu22",
  "MaxHTCExecuteCoreCount" : 10,
  "AdditionalNFSExportPath" : "/data",
  "DynamicClusterInitSpecs" : null,
  "About shared part 2" : null,
  "HPCImageName" : "cycle.image.ubuntu22",
  "SchedulerClusterInitSpecs" : null,
  "SchedulerMachineType" : "Standard_D4as_v4",
  "NFSSchedDiskWarning" : null,
  …<truncated>
}

If the cyclecloud command does not work, you may need to initialize the CLI tool as described in the docs: Install the Command Line Interface - Azure CycleCloud

[HEADING=2]Step 4: Import the updated template file to the running cluster[/HEADING]

To import the updated template to the running cluster in CycleCloud, run a command of the following format:

cyclecloud import_cluster <cluster_name> -c Slurm -f <template_file_name>.txt -p <parameter_file_name>.json --force

For my cluster the specific command is:

cyclecloud import_cluster jm-slurm-test -c Slurm -f slurm-part.txt -p jm-slurm-test-params.json --force

[ATTACH type="full" alt="Jerrance_2-1722712333752.png"]62217[/ATTACH]

In the CycleCloud GUI we can now see that the "gpu" nodearray has been added. Click the "Arrays" tab in the middle panel, as shown in the following screen capture:

[ATTACH type="full" alt="Jerrance_3-1722712333755.png"]62218[/ATTACH]

The gpu nodearray is added to the cluster but not yet "Activated," which means it is not yet available for use.
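If an import ever resets cluster settings unexpectedly, the usual culprit is a missing or malformed parameter file. A quick way to sanity-check an exported file before passing it to import_cluster, as an illustrative sketch (the here-doc is a trimmed stand-in for the real export):

```shell
# Illustrative pre-import check on the exported parameter file.
# The here-doc is a trimmed stand-in for jm-slurm-test-params.json.
cat > /tmp/params-sample.json <<'EOF'
{
  "Region" : "southcentralus",
  "MaxHPCExecuteCoreCount" : 240,
  "HPCImageName" : "cycle.image.ubuntu22"
}
EOF
# Fails with a parse error if the file is not valid JSON.
python3 -m json.tool /tmp/params-sample.json > /dev/null && echo "valid JSON"
# Spot-check that an existing customization survived the export.
grep -o '"MaxHPCExecuteCoreCount" : 240' /tmp/params-sample.json
```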
[HEADING=2]Step 5: Activate the new nodearray(s)[/HEADING]

The cyclecloud start_cluster command kicks off activation of the new nodearray, using the following format:

cyclecloud start_cluster <cluster_name>

For my cluster the command is:

cyclecloud start_cluster jm-slurm-test

[ATTACH type="full" alt="Jerrance_4-1722712333756.png"]62219[/ATTACH]

In the CycleCloud GUI we will see the gpu nodearray status move to "Activation" and finally "Activated:"

[ATTACH type="full" alt="Jerrance_5-1722712333759.png"]62220[/ATTACH]

[HEADING=2]Step 6: Update the cluster settings[/HEADING]

Edit the cluster settings in the CycleCloud GUI to pick the "GPU VM Type" and "Max GPU Cores" in the "Required Settings" section:

[ATTACH type="full" alt="Jerrance_6-1722712333764.png"]62221[/ATTACH]

Update the "GPU OS" and "GPU Cluster-Init" as needed in the "Advanced Settings" section:

[ATTACH type="full" alt="Jerrance_7-1722712333769.png"]62222[/ATTACH]

[HEADING=2]Step 7: Scale the cluster to create the nodes[/HEADING]

To this point we have added the new nodearray to CycleCloud, but SLURM does not yet know about the new GPU partition. We can see this from the scheduler VM with the sinfo command:

[ATTACH type="full" alt="Jerrance_8-1722712333770.png"]62223[/ATTACH]

The final step is to "scale" the cluster to pre-define the compute nodes as needed by SLURM. The azslurm scale command, run on the scheduler VM, accomplishes this:

[ATTACH type="full" alt="Jerrance_9-1722712333771.png"]62224[/ATTACH]

Your cluster is now ready to use the new GPU partition.

[HEADING=2]SUMMARY[/HEADING]

Adding a new partition to SLURM with Azure CycleCloud is a flexible and efficient way to update your cluster and leverage different types of compute nodes. You can follow the steps outlined in this article to create a new nodearray, configure the cluster settings, and scale the cluster to match the SLURM partition.
By using CycleCloud and SLURM, you can optimize your cluster performance and resource utilization.

References:
- CycleCloud Documentation
- CycleCloud-SLURM GitHub repository
- Microsoft Training for SLURM on Azure CycleCloud
- SLURM documentation