Guest jangelfdez
Posted October 24, 2022

If you need a remote workstation with graphical acceleration for 3D visualization, Azure has several options available: from the original NV series, with the NVIDIA Tesla M60, to the fifth generation of the family, the NVads A10 v5 series, based on NVIDIA A10 cards. This series is the first to support partitioned NVIDIA GPUs, from a minimum of 1/6 of a GPU in the smallest size (Standard_NV6ads_A10_v5) up to two full GPUs per virtual machine in the Standard_NV72ads_A10_v5. In addition, this new generation is based on the latest AMD EPYC 74F3V (Milan) processors, with a base frequency of 3.2 GHz and a peak of 4.0 GHz. This hardware configuration covers everything from the most basic visualization needs to the most demanding ones.

If you need to set up a Linux environment with the NVads A10 v5 series, this article guides you step by step. The configuration is based on CentOS 7.9 as the operating system, uses driver version 510.73 because of the requirements imposed by GRID version 14.1, and provides remote access via TurboVNC together with VirtualGL for 3D acceleration. The URN of the exact image used is "OpenLogic:CentOS:7_9-gen2:latest"; keep this in mind, since there are multiple variants available in the Marketplace at this time. This guide is based on the configuration scripts used by the Azure HPC On-Demand Platform, but with updated driver and software versions.

Preparing the operating system

The first step focuses on updating the base image available in Azure and installing the basic dependencies: the Linux kernel headers and Dynamic Kernel Module Support (DKMS). These packages are used by the NVIDIA driver to build the required kernel module and load it without having to modify the kernel itself. The kernel version used is 3.10.0-1160.76.1.

```
sudo yum update -y
sudo yum install -y kernel-devel
# DKMS is only available in the Fedora EPEL repos.
sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install -y dkms
sudo reboot
```

This reboot is important: it lets the operating system apply the changes from the update and avoids errors later on. Without it, for example, the NVIDIA installer may not find the kernel headers automatically, and you would have to specify them manually.

Installing NVIDIA GRID drivers

Since we are going to use NVIDIA's proprietary drivers, we need to prevent the kernel from loading the open source Nouveau driver. You can run the following as root, or edit the file directly with your preferred text editor (e.g. nano or vim):

```
sudo su -
cat <<EOF >/etc/modprobe.d/nouveau.conf
blacklist nouveau
blacklist lbm-nouveau
EOF
exit
```

After that, we continue with the installation of the NVIDIA GRID drivers. It is very important to use the installer provided directly by Microsoft instead of the one available on the NVIDIA website: Microsoft's version already ships with the GRID licensing required to run in Azure. If you use NVIDIA's own drivers, you will have to configure a licensing server and acquire the corresponding licenses, which makes no sense since they are already included in the price of the virtual machine.
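Before launching the installer, it is worth confirming that the GPU partition assigned to your VM size is actually visible to the operating system. A quick sanity check (a small addition to the original steps; lspci ships in the base image and is also used later in this guide):

```
# The A10 device (or its 1/6, 1/3, etc. partition) should show up as a 3D controller.
/usr/sbin/lspci | grep -i nvidia
```

If no NVIDIA device is listed, check that you actually deployed one of the NVads A10 v5 sizes before continuing.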
```
wget -O NVIDIA-Linux-x86_64-grid.run https://download.microsoft.com/download/6/2/5/625e22a0-34ea-4d03-8738-a639acebc15e/NVIDIA-Linux-x86_64-510.73.08-grid-azure.run
chmod +x NVIDIA-Linux-x86_64-grid.run
sudo ./NVIDIA-Linux-x86_64-grid.run -s
```

Once the driver is successfully installed, the NVIDIA GRID settings need to be adjusted. To do this, we start from the sample file provided by NVIDIA:

```
sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
```

The following changes need to be made (the commands below apply all three):

- Remove the FeatureType line, as it is not required in this customized version of the drivers for Azure.
- Disable the licensing interface in nvidia-settings with EnableUI=FALSE, since licensing is managed automatically in Azure.
- Add IgnoreSP=FALSE, as reflected in the official Azure documentation.

```
sudo su -
cat <<EOF >>/etc/nvidia/gridd.conf
IgnoreSP=FALSE
EnableUI=FALSE
EOF
sed -i '/FeatureType=0/d' /etc/nvidia/gridd.conf
reboot
```

After rebooting, the kernel loads the newly installed drivers, and we can check that the card is correctly configured:

```
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 0000E7AB:00:00.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Installing VNC Remote Access with TurboVNC and VirtualGL

Linux images in the Azure Marketplace do not come with a graphical environment by default, so it is necessary to install both the X.org window system and a desktop environment. In this case we will use Xfce, thanks to its low resource consumption, ideal for a remote work environment in the cloud.

```
sudo yum groupinstall -y "X Window system"
sudo yum groupinstall -y xfce
```

Once the graphical environment is installed, the next step is to configure VNC access. We will use TurboVNC, a VNC server and client optimized for video and 3D environments. Its integration with VirtualGL gives us a robust, high-performance solution for this type of application on any type of network.

```
sudo yum install -y https://jztkft.dl.sourceforge.net/project/turbovnc/3.0.1/turbovnc-3.0.1.x86_64.rpm
sudo wget --no-check-certificate "https://virtualgl.com/pmwiki/uploads/Downloads/VirtualGL.repo" -O /etc/yum.repos.d/VirtualGL.repo
sudo yum install -y VirtualGL turbojpeg xorg-x11-apps
```

To make sure that permissions are correctly applied when configuring VirtualGL, it is necessary to stop the window manager and unload the NVIDIA kernel modules. If you skip this, the setup wizard will warn you that the changes won't take effect until you do so.

```
sudo service gdm stop
sudo rmmod nvidia_drm nvidia_modeset nvidia
sudo /usr/bin/vglserver_config -config +s +f -t
sudo service gdm start
```

After that, we configure systemd to boot into graphical mode by default and, to avoid a restart, switch to it directly in the current session:

```
sudo systemctl set-default graphical.target
sudo systemctl isolate graphical.target
```

The last step is to indicate which software we want to run when a new connection is established through TurboVNC. In our case, we want a new Xfce desktop session so we can start working on our workstation:

```
cd $HOME
echo "xfce4-session" > ~/.Xclients
chmod a+x ~/.Xclients
```
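Before connecting, a TurboVNC session has to be running on the server. A minimal sketch (this step is not spelled out in the original scripts; it assumes TurboVNC's default install path under /opt/TurboVNC and display :1, and the first run will prompt you to set a VNC password):

```
# Start a TurboVNC session on display :1 with a Full HD virtual screen.
/opt/TurboVNC/bin/vncserver :1 -geometry 1920x1080

# List running sessions; kill the session when you are done:
/opt/TurboVNC/bin/vncserver -list
# /opt/TurboVNC/bin/vncserver -kill :1
```

Display :1 listens on TCP port 5901 (5900 + display number), which is the port that must be reachable through your Network Security Group.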
All server-side configuration is now complete. The next step is to install the TurboVNC client on your local machine and connect to the IP address or DNS name associated with your virtual machine deployed in Azure. Make sure that the Network Security Groups applied to the subnet or to the VM's network interface card are properly configured to grant you access. You should see something similar to the following screenshot.

Congratulations, your Linux workstation for 3D visualization is now configured. The next step is to install the applications needed for your scenario and make sure you execute them with VirtualGL.

Recommended extra configuration

PCI Bus Update

If the virtual machine is restarted or redeployed to another host, the PCI bus identifier of the GPU may change. This would leave the graphical environment unable to work properly, because it could no longer find the card. To avoid this situation, it is recommended to configure the following script, which adjusts the PCI BusID setting each time the virtual machine starts:

```
sudo su -
cat <<EOF >/etc/rc.d/rc3.d/busidupdate.sh
#!/bin/bash
BUSID=\$(nvidia-xconfig --query-gpu-info | awk '/PCI BusID/{print \$4}')
nvidia-xconfig --enable-all-gpus --allow-empty-initial-configuration -c /etc/X11/xorg.conf --virtual=1920x1200 --busid \$BUSID -s
sed -i '/BusID/a\ Option "HardDPMS" "false"' /etc/X11/xorg.conf
EOF
chmod +x /etc/rc.d/rc3.d/busidupdate.sh
/etc/rc.d/rc3.d/busidupdate.sh
exit
```

Create a vglrun alias

3D acceleration can be configured at the graphical-environment level or at the application level. Using Xfce as the desktop environment does not require the first option, so we can dedicate all the resources of the GPU to our applications. To ensure that an application makes use of the acceleration, it must be launched through the vglrun command. To make the process easier, and to spread work across all the GPUs available on the node, the following script creates an alias with the necessary configuration. To start an application, just prepend vglrun to its command, for example `vglrun /opt/VirtualGL/bin/glxspheres64` to run the test application bundled with VirtualGL.

```
sudo su -
cat <<EOF >/etc/profile.d/vglrun.sh
#!/bin/bash
ngpu=\$(/usr/sbin/lspci | grep NVIDIA | wc -l)
alias vglrun='/usr/bin/vglrun -d :0.\$(( \${port:-0} % \${ngpu:-1}))'
EOF
exit
```

Increase the size of network buffers

The default Linux network device configuration may not provide optimal throughput (bandwidth) and latency for parallel work scenarios. That is why it is advisable to increase the size of the write and read buffers at the operating-system level:

```
sudo su -
cat << EOF >>/etc/sysctl.conf
net.core.rmem_max=2097152
net.core.wmem_max=2097152
EOF
exit
```
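These values are only read from /etc/sysctl.conf at boot. To apply them immediately without restarting, you can reload the file with the standard sysctl options (not part of the original script, but safe on CentOS 7):

```
# Reload kernel parameters from /etc/sysctl.conf without a reboot.
sudo sysctl -p

# Verify that the new limits took effect.
sysctl net.core.rmem_max net.core.wmem_max
```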