Deploy Linux workstations for 3D visualization in Azure



If you need a remote workstation with graphics acceleration for 3D visualization, Azure has several options available, from the original NV series with the NVIDIA Tesla M60 to the fifth generation of the family, the NVads A10 v5 series based on NVIDIA A10 cards. This series is the first to support partitioned NVIDIA GPUs, from a minimum of 1/6 of a GPU in the smallest size, the Standard_NV6ads_A10_v5, up to a maximum of two full GPUs per virtual machine in the Standard_NV72ads_A10_v5.
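If you want to check which of these sizes are offered in your region, the Azure CLI can list them. The following is a minimal sketch, assuming the CLI is installed and using West Europe as an example region.

# List NV-family sizes available in the region and filter for the A10 v5 series.
az vm list-skus --location westeurope --size Standard_NV --output table | grep -i a10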

 

 

 

In addition, this new generation is based on the latest AMD EPYC 74F3V (Milan) processors, with a base frequency of 3.2 GHz and a peak of 4.0 GHz. This hardware configuration makes it one of the best options for covering both the most basic visualization needs and the most demanding ones.

 

 

 

If you need to set up a Linux environment on the NVads A10 v5 series, this article guides you step by step. The configuration is based on CentOS 7.9 as the operating system, uses driver version 510.73.08 due to the requirements imposed by GRID version 14.1, and provides remote access via TurboVNC together with VirtualGL for 3D acceleration.

 

 

 

The URN of the exact image used is "OpenLogic:CentOS:7_9-gen2:latest". It is important to keep this in mind, since there are multiple variants available in the Marketplace at the moment. This guide is based on the configuration scripts used by the Azure HPC On-Demand Platform, but with updated driver and software versions.
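As a reference, a VM with this image and a partitioned A10 GPU can be deployed with the Azure CLI. This is a minimal sketch, where the resource group, VM name, and admin username are hypothetical placeholders.

# Deploy a Standard_NV6ads_A10_v5 VM with the CentOS 7.9 Gen2 image (names are placeholders).
az vm create \
  --resource-group my-rg \
  --name my-viz-workstation \
  --image OpenLogic:CentOS:7_9-gen2:latest \
  --size Standard_NV6ads_A10_v5 \
  --admin-username azureuser \
  --generate-ssh-keys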

 

 

 

Preparing the operating system

 

 

 

The first step focuses on updating the base image available in Azure and installing the basic dependencies: the Linux kernel headers and Dynamic Kernel Module Support (DKMS). These packages are used by the NVIDIA drivers to build the required kernel module and load it without having to modify the kernel. The kernel version used is 3.10.0-1160.76.1.

 

 

 

sudo yum update -y

sudo yum install -y kernel-devel

# DKMS is only available in the Fedora EPEL repository.

sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

sudo yum install -y dkms

sudo reboot

 

This restart is important: the update may have installed a new kernel, and until the system boots into it, the running kernel will not match the newly installed headers. Without the reboot, the NVIDIA installer may not find the kernel headers automatically and you would need to specify them manually.
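Once the machine is back up, you can confirm that the running kernel matches the installed headers; both commands should report the same version.

# The running kernel and the kernel-devel package should report the same version.
uname -r
rpm -q kernel-devel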

 

 

 

Installing NVIDIA GRID drivers

 

 

 

Since we are going to use NVIDIA's proprietary drivers, we need to prevent the kernel from loading the open-source Nouveau driver. You can run the following as root, or edit the file directly with your preferred text editor (e.g. nano or vim).

 

 

 

sudo su -

cat <<EOF >/etc/modprobe.d/nouveau.conf

blacklist nouveau

blacklist lbm-nouveau

EOF

exit
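The blacklist takes effect at boot time, so after the reboot performed later in this guide you can verify that the module is no longer loaded; an empty output means Nouveau is not active.

# Should print nothing once the blacklist is in effect (after a reboot).
lsmod | grep -i nouveau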

 

 

 

After that, we continue by installing the NVIDIA GRID drivers. It is very important to use the installer provided directly by Microsoft instead of the one available on the NVIDIA website: Microsoft's version already comes with the GRID licensing required in Azure preconfigured.

 

 

 

If you used NVIDIA's own drivers, you would have to configure a licensing server and acquire the corresponding licenses, which makes no sense since licensing is already included in the price of the virtual machine.

 

 

 

wget -O NVIDIA-Linux-x86_64-grid.run https://download.microsoft.com/download/6/2/5/625e22a0-34ea-4d03-8738-a639acebc15e/NVIDIA-Linux-x86_64-510.73.08-grid-azure.run

chmod +x NVIDIA-Linux-x86_64-grid.run

sudo ./NVIDIA-Linux-x86_64-grid.run -s
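The -s flag runs the installer in silent mode. If the installation fails, the installer's log is the first place to look; it is written to /var/log/nvidia-installer.log.

# Inspect the end of the NVIDIA installer log if something went wrong.
sudo tail -n 20 /var/log/nvidia-installer.log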

 

Once the drivers are successfully installed, the NVIDIA GRID settings need to be modified. To do this, we start from the sample file provided by NVIDIA.

 

 

 

sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf

 

The following changes will need to be made:

 

  • Remove the FeatureType line, as it is not required in this customized version of the drivers for Azure.
  • Disable the licensing interface in nvidia-settings with EnableUI=FALSE, as licensing is managed automatically in Azure.
  • Add IgnoreSP=FALSE, as reflected in the official Azure documentation.

 

sudo su -

cat <<EOF >>/etc/nvidia/gridd.conf

IgnoreSP=FALSE

EnableUI=FALSE

EOF

sed -i '/FeatureType=0/d' /etc/nvidia/gridd.conf

reboot

 

 

 

After rebooting, the kernel will use the newly installed drivers, and we can check that the card is correctly configured.

 

nvidia-smi

 

 

 

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 0000E7AB:00:00.0 Off |                    0 |
| N/A   N/A  P8     N/A /  N/A  |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
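Since these are GRID drivers, you can also check that the built-in Azure license was picked up correctly. The exact wording of the output varies by driver version, but a query along these lines should report a valid license status.

# Query the vGPU licensing status (output wording varies by driver version).
nvidia-smi -q | grep -i -A 2 "license"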

 

 

Installing VNC Remote Access with TurboVNC and VirtualGL

 

Linux images in the Azure Marketplace do not come with a graphical environment by default, so we need to install both the X.org display server and a desktop environment. In this case, we will use Xfce because of its low resource consumption, which makes it ideal for a remote work environment in the cloud.

 

 

 

sudo yum groupinstall -y "X Window system"

sudo yum groupinstall -y xfce

 

 

 

Once the graphical environment is installed, the next step is to configure VNC access. We will use TurboVNC, a VNC server and client optimized for video and 3D applications. Its integration with VirtualGL gives us a robust, high-performance solution for this type of workload on any kind of network.

 

 

 

sudo yum install -y https://jztkft.dl.sourceforge.net/project/turbovnc/3.0.1/turbovnc-3.0.1.x86_64.rpm

sudo wget --no-check-certificate "https://virtualgl.com/pmwiki/uploads/Downloads/VirtualGL.repo" -O /etc/yum.repos.d/VirtualGL.repo

sudo yum install -y VirtualGL turbojpeg xorg-x11-apps

 

 

 

To make sure that permissions are correctly applied when configuring VirtualGL, it is necessary to stop the display manager and unload the NVIDIA kernel modules. Otherwise, the setup wizard will warn you that the changes won't take effect until you do so.

 

 

 

sudo service gdm stop

sudo rmmod nvidia_drm nvidia_modeset nvidia

sudo /usr/bin/vglserver_config -config +s +f -t

sudo service gdm start

 

After that, we configure systemd to boot into graphical mode by default and, to avoid a restart, switch to it directly in the current session.

 

 

 

sudo systemctl set-default graphical.target

sudo systemctl isolate graphical.target

 

The last step is to indicate which software we want to run when a new connection is established through TurboVNC. In our case, we want a new Xfce desktop session to start so we can work on our workstation.

 

 

 

cd $HOME

echo "xfce4-session" > ~/.Xclients

chmod a+x ~/.Xclients
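With everything in place, a VNC session can be started on the VM. This is a sketch: display :1 listens on TCP port 5901, the geometry is just an example, and the first run will ask you to set a VNC password.

# Start a TurboVNC session on display :1 (TCP port 5901).
/opt/TurboVNC/bin/vncserver :1 -geometry 1920x1080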

 

All server-side configuration is now complete. The next step is to install the TurboVNC client on your local machine and connect to the IP address or DNS name associated with your virtual machine in Azure. Make sure that the Network Security Groups applied to the subnet or to the VM's network interface card are properly configured to grant you access.
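If the port is not yet open, a rule can be added with the Azure CLI. A minimal sketch, where the resource group and NSG names are hypothetical placeholders and 5901 corresponds to display :1.

# Allow inbound VNC traffic on TCP 5901 (resource group and NSG names are placeholders).
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name my-vm-nsg \
  --name Allow-TurboVNC \
  --priority 1010 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 5901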

 

 

 

Once connected, you should see an Xfce desktop session. Congratulations, your Linux workstation for 3D visualization is now configured. The next step is to install the applications needed for your scenario and make sure to run them with VirtualGL.

 

 

 


 

 

 

 

Recommended extra configuration

 

 

 

PCI Bus Update

 

If the virtual machine is restarted or redeployed to another host, the PCI bus identifier of the GPU may change. This will cause our graphical environment to stop working, because it can't find the card. To avoid this situation, it is recommended to configure the following script, which adjusts the PCI BusID setting each time the virtual machine starts.

 

 

 

sudo su -

cat <<EOF >/etc/rc.d/rc3.d/busidupdate.sh

#!/bin/bash

BUSID=\$(nvidia-xconfig --query-gpu-info | awk '/PCI BusID/{print \$4}')

nvidia-xconfig --enable-all-gpus --allow-empty-initial-configuration -c /etc/X11/xorg.conf --virtual=1920x1200 --busid \$BUSID -s

sed -i '/BusID/a\ Option "HardDPMS" "false"' /etc/X11/xorg.conf

EOF

chmod +x /etc/rc.d/rc3.d/busidupdate.sh

/etc/rc.d/rc3.d/busidupdate.sh

exit

 

Create a vglrun alias

 

3D acceleration can be configured at the graphical-environment level or at the application level. Using Xfce as the desktop environment does not require the former, so we can dedicate all of the GPU's resources to our applications.

 

 

 

To ensure that applications make use of the acceleration, they must be executed through the vglrun command. To make the process easier and make sure we use all the GPUs available on the node, the following script generates an alias with the necessary configuration. To start an application, just prepend vglrun to its command and that's all.

 

 

 

sudo su -

cat <<EOF >/etc/profile.d/vglrun.sh

#!/bin/bash

ngpu=\$(/usr/sbin/lspci | grep NVIDIA | wc -l)

alias vglrun='/usr/bin/vglrun -d :0.\$(( \${port:-0} % \${ngpu:-1}))'

EOF

exit
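After opening a new login session (so that the profile script is sourced), you can verify that OpenGL rendering is actually handled by the NVIDIA card. A quick test, assuming glxinfo is available (glx-utils package):

# The renderer string should mention the NVIDIA A10 rather than a software renderer.
vglrun glxinfo | grep -i "opengl renderer"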

 

Increase the size of network buffers

 

The default Linux network device configuration may not provide optimal throughput (bandwidth) and latency for parallel work scenarios. That is why it is advisable to increase the size of the write and read buffers at the operating system level.

 

 

 

sudo su -

cat << EOF >>/etc/sysctl.conf

net.core.rmem_max=2097152

net.core.wmem_max=2097152

EOF

exit
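Values appended to /etc/sysctl.conf are only read at boot, so to apply them immediately you can reload the configuration and confirm the new limits:

# Load the new values without rebooting and print them to confirm.
sudo sysctl -p
sudo sysctl net.core.rmem_max net.core.wmem_max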

 
