Posted by Kent Altena, April 11, 2023

Used in many industries, including engineering, mathematics, and finance, MATLAB is a proprietary programming language and multi-paradigm numerical computing environment. As data analysis, simulation, and modeling tasks grow more complex, MATLAB's performance plays a crucial role in the speed and accuracy of these operations. Microsoft Azure offers a cloud platform whose virtual machines (VMs) can run MATLAB. However, selecting the right VM SKU can be difficult, and choosing the wrong one can lead to suboptimal performance and potentially higher costs.

In this blog article, we'll discuss how processor selection and other factors affect MATLAB's performance, and how to choose the Azure VM SKU that performs best for your MATLAB workloads. We'll also explore some best practices for optimizing MATLAB performance on Azure VMs.

For background, MATLAB, short for Matrix Laboratory, is a numerical computing environment developed by MathWorks. MATLAB provides a wide range of tools for calculations, data analysis, visualization, and simulation. It offers a high-level language that lets users express complex mathematical computations easily and efficiently. With a vast library of built-in functions and toolboxes, MATLAB provides a platform for solving complex engineering, scientific, and financial problems. Its user-friendly interface, combined with powerful computing capabilities, has made it a popular choice for researchers, engineers, and scientists across many industries.

The default for many organizations is to run MATLAB calculations or simulations directly on end-user workstations. This is often suboptimal because it forces the desktop environment to be over-provisioned, especially when using Terminal Services or Virtual Desktop Infrastructure (VDI).
Last year I worked with a very large nonprofit non-governmental organization (NGO) that had exactly this kind of environment. Their VDI was a difficult-to-manage Remote Desktop Services (RDS) environment in which users shared access to large compute nodes sized to run their data scientists' jobs.

Offloading MATLAB Workloads to Dedicated Compute Nodes

Using MATLAB Parallel Server and its native HPC Pack integration lets you right-size the front end to run the desktop client, while sizing the back end large enough to handle simulations. Offloading MATLAB calculations to HPC Pack can significantly improve performance and scalability. HPC Pack provides a powerful platform for running parallel and distributed MATLAB applications across a cluster of machines. It also offers job scheduling, data management, and native cloud orchestration, including "AutoGrowShrink," which minimizes compute costs by shrinking the cluster when no jobs are in the queue. With HPC Pack, users can take advantage of the full power of their cluster, enabling faster and more efficient data processing and analysis. The NGO mentioned above fixed its front end by moving to Azure Virtual Desktop (AVD) and built its compute infrastructure on HC44rs VMs.

For the compute nodes running the calculations, there are several performance recommendations. First, MATLAB is compute-intensive and needs enough memory to hold the models being run. As a multi-threaded application, MATLAB benefits from having several physical cores available. In general, hyperthreading does not speed up the calculations once enough physical cores are present. To choose an appropriate memory-to-core ratio, you need to know the size of your organization's models, because any paging activity will seriously degrade performance. Local disk performance can also affect simulation performance, because MATLAB writes its results back out to disk.
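To make the memory-to-core reasoning concrete, here is a rough Python sketch that estimates how many MATLAB workers a VM can host before paging becomes a risk. It is an illustration under stated assumptions, not a MathWorks or Azure sizing tool:

- It assumes one worker per physical core, in line with the hyperthreading guidance.
- `gb_per_worker` is hypothetical: the peak memory you have measured for one worker on your own models.
- `os_reserve_gb` is an arbitrary allowance for the operating system.
- The core and RAM figures for HC44rs and HB120rs_v3 come from the VM comparison table later in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSku:
    name: str
    physical_cores: int
    ram_gb: float


def max_workers_without_paging(sku: VmSku, gb_per_worker: float,
                               os_reserve_gb: float = 8.0) -> int:
    """Return how many MATLAB workers fit on `sku` without exceeding RAM.

    Assumptions (hypothetical, tune for your environment):
    - one worker per physical core, so hyperthreads are not counted;
    - gb_per_worker is the measured peak memory of one worker on your model;
    - os_reserve_gb is memory set aside for the OS and other services.
    """
    if gb_per_worker <= 0:
        raise ValueError("gb_per_worker must be positive")
    usable_gb = max(sku.ram_gb - os_reserve_gb, 0.0)
    # Whole workers that fit in the remaining RAM.
    memory_limit = int(usable_gb // gb_per_worker)
    # The smaller of the core limit and the memory limit wins.
    return min(sku.physical_cores, memory_limit)


if __name__ == "__main__":
    # Core and RAM figures from the VM comparison table in this post.
    skus = [VmSku("HC44rs", 44, 352), VmSku("HB120rs_v3", 120, 448)]
    for sku in skus:
        for gb in (6, 12):
            workers = max_workers_without_paging(sku, gb)
            bound = "core-bound" if workers == sku.physical_cores else "memory-bound"
            print(f"{sku.name}: {gb} GB/worker -> {workers} workers ({bound})")
```

With these assumed inputs, HC44rs is core-bound at 44 workers when each worker needs 6 GB. At 12 GB per worker it becomes memory-bound at 28 workers. That is the point where the RAM-per-core figures of the different SKUs start to drive the choice.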
The general Azure recommendation is to use the local ephemeral disk for transient simulation data and to make sure the Server Message Block (SMB) share location is performant.

Benchmarking MATLAB Workloads

MATLAB provides a built-in benchmarking utility called bench, which measures the execution time of specific MATLAB operations and compares them against results from reference computers. The bench function evaluates several types of computation to provide a performance profile. Benchmarking helps identify performance bottlenecks and guides optimization efforts, such as parallelizing computations or optimizing code. To time your own code, use the MATLAB function timeit to produce reliable, repeatable performance measurements, and use gputimeit to benchmark GPU code.

With bench, you can compare candidate Azure VM SKUs against one another. From a methodology standpoint, I ran the same Windows Server 2019 image, with the latest patches and the same MATLAB version, across all likely HPC VM SKUs. I disabled hyperthreading on the General Purpose VM families using VM metadata tags. I ran the bench command three times on each VM family and averaged the results. If one result was dramatically out of range compared to the other two, I discarded it and ran the benchmark one additional time. In each case, I ran the bench command on the local ephemeral drive.
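The repeat-and-discard procedure described above can be sketched in Python. This is a minimal sketch, not the author's actual script, and it rests on three assumptions:

- The post gives no numeric threshold for "dramatically out of range," so the hypothetical `tolerance` parameter (25% deviation from the median) is an assumption.
- `run_once` stands in for whatever returns a single bench timing.
- The demo workload is plain Python, used only so the example runs on its own.

```python
import statistics
import time
from typing import Callable, List


def robust_average(run_once: Callable[[], float], runs: int = 3,
                   tolerance: float = 0.25) -> float:
    """Average several benchmark timings, re-running once to replace an outlier.

    `tolerance` is an assumed threshold: a run counts as "dramatically out of
    range" if it differs from the median of the runs by more than this
    fraction of the median. The original post does not give a number.
    """
    times: List[float] = [run_once() for _ in range(runs)]
    median = statistics.median(times)
    outliers = [t for t in times if abs(t - median) > tolerance * median]
    # Only a single clear outlier is replaced; if several runs disagree,
    # the results are averaged as-is and should be investigated by hand.
    if len(outliers) == 1:
        # Discard the bad result and take one additional measurement.
        times.remove(outliers[0])
        times.append(run_once())
    return statistics.mean(times)


if __name__ == "__main__":
    def run_once() -> float:
        # Stand-in workload for illustration only; in practice this would
        # return one MATLAB bench timing (collection method not shown).
        start = time.perf_counter()
        sum(i * i for i in range(200_000))
        return time.perf_counter() - start

    print(f"average: {robust_average(run_once):.4f} s")
```

Comparing against the median rather than the mean keeps a single slow run from dragging the reference point toward itself.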
Azure VMs Benchmarked

| VM name | HC44rs | HB120rs_v3 | HB120rs_v2 | D64ds_v5 | D64ads_v5 |
|---|---|---|---|---|---|
| Physical cores (pCPUs) | 44 (constrained-core 16, 32 available) | 120 (constrained-core 16, 32, 64, 96 available) | 120 (constrained-core 16, 32, 64, 96 available) | 32 (hyperthreading disabled) | 32 (hyperthreading disabled) |
| Processor | Intel Xeon Platinum 8168 | AMD EPYC 7V73X ("Milan-X") | AMD EPYC 7742 | Intel Xeon Platinum 8370C (Ice Lake) | AMD EPYC 7763v |
| Peak CPU frequency | 3.70 GHz | 3.5 GHz | 3.4 GHz | 3.5 GHz | 3.5 GHz |
| RAM per VM | 352 GB | 448 GB | 456 GB | 256 GB | 256 GB |
| RAM per core (constrained-core sizes in parentheses) | 8 GB (22, 11 GB) | 3.75 GB (28, 14, 7, 4.6 GB) | 3.8 GB (28, 14, 7, 4.6 GB) | 8 GB | 8 GB |
| Memory bandwidth per core | 4.3 GB/s | 5.25 GB/s | 2.9 GB/s | 4.26 GB/s | 4.26 GB/s |
| L3 cache per VM | 33 MB | 768 MB | 256 MB | 48 MB | 256 MB |
| Local disk | 1 x 700 GB NVMe | 2 x 0.9 TB NVMe | 1 x 0.9 TB NVMe | 2400 GB SSD | 2400 GB SSD |
| Disk per core (constrained-core sizes in parentheses) | 15.9 GB (43.8, 21.8 GB) | 15 GB (113, 56, 28, 19 GB) | 7.5 GB (56, 28, 14, 9 GB) | 75 GB | 75 GB |
| Accelerated networking | Yes | Yes | Yes | Yes | Yes |

MATLAB Benchmark Results (bench execution times in seconds; lower is better)

| VM SKU | LU | FFT | ODE | Sparse |
|---|---|---|---|---|
| HC44rs | 0.2121 | 0.6646 | 0.2604 | 0.5576 |
| HB120rs_v3 | 0.2236 | 0.401 | 0.2082 | 1.3275 |
| HB120rs_v2 | 0.2309 | 0.3290 | 0.2482 | 1.5880 |
| D64ds_v5 | 0.1697 | 0.23 | 0.1879 | 0.4406 |
| D64ads_v5 | 0.2106 | 0.2809 | 0.1948 | 1.1102 |

For an explanation of the columns, I refer to the MATLAB bench documentation:

LU (Lower-Upper Decomposition) Benchmark: The LU benchmark tests MATLAB's performance on the lower-upper decomposition of large matrices. It factors a matrix into lower and upper triangular matrices. Performance factors: floating-point, regular memory access.

FFT (Fast Fourier Transform) Benchmark: The FFT benchmark tests MATLAB's performance on the fast Fourier transform of large data sets. It transforms a time-domain signal into its frequency-domain representation.
The results of the FFT benchmark are influenced by the size of the input data set and the complexity of the signal being transformed. Performance factors: floating-point, irregular memory access.

ODE (Ordinary Differential Equation) Benchmark: The ODE benchmark tests MATLAB's performance on solving systems of ordinary differential equations. It simulates the behavior of a physical system over time. The results are influenced by the complexity of the system being modeled and the accuracy of the numerical methods used to solve the equations. Performance factors: data structures and MATLAB function files, disk performance.

Sparse Benchmark: The Sparse benchmark tests MATLAB's performance on manipulating sparse matrices, that is, matrices with a large number of zero elements. The results are influenced by the size and sparsity of the input matrix, as well as the specific operation being performed. Performance factors: mixed integer and floating-point.

Performance Comparison

Using HC44rs as the performance baseline, a result of 1.50 means 150% of the HC44rs performance. You may notice a third column for HB120rs_v3, labeled AVX2.

There is a belief in some MATLAB circles that MATLAB is "crippled" on AMD processors. That was not my experience. I tested the claim by forcing the Intel Math Kernel Library (MKL), which MATLAB uses for its linear algebra routines, into its AVX2 code path through the MKL_DEBUG_CPU_TYPE debug variable. I created a Windows batch file to launch MATLAB in AVX2 mode:

    @echo off
    set MKL_DEBUG_CPU_TYPE=5
    matlab.exe

Performance was slightly higher (roughly 1-5% faster), but the difference was within the margin of error, so the workaround proved largely unnecessary.

Conclusion

MATLAB is a powerful computational tool used widely, particularly in the Financial Services Industry (FSI).
However, to achieve optimal performance and efficiency, it's crucial to understand the factors that affect MATLAB's performance and how to tune the workload for its hardware environment. Choosing the right Azure VM SKU, offloading computations to HPC Pack, benchmarking workloads, and optimizing MATLAB code are all effective ways to improve MATLAB's performance and scalability. Understanding the technical requirements of your workload and its computational environment will point you to a specific SKU. It will also help you decide whether to purchase a savings plan or reserved instances for part of that capacity. By following these best practices, MATLAB users can reduce processing time, improve data analysis and simulations, and ultimately improve their productivity and decision-making. Whether you run MATLAB on-premises or in the cloud, optimizing its performance is critical for keeping data scientists satisfied and delivering results faster.
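bench reports execution times, where lower is better. A relative-performance figure such as 1.50 therefore reads most naturally as the HC44rs time divided by the SKU's time. The Python sketch below applies that reading to the benchmark results table in this post, with HC44rs as the baseline. This is an assumption about how the comparison figures were derived, not the author's confirmed method.

```python
"""Relative performance of each Azure VM SKU versus the HC44rs baseline."""

# Execution times in seconds from MATLAB's bench (lower is better),
# copied from the benchmark results table in this post.
BENCH_TIMES = {
    "HC44rs":     {"LU": 0.2121, "FFT": 0.6646, "ODE": 0.2604, "Sparse": 0.5576},
    "HB120rs_v3": {"LU": 0.2236, "FFT": 0.401,  "ODE": 0.2082, "Sparse": 1.3275},
    "HB120rs_v2": {"LU": 0.2309, "FFT": 0.3290, "ODE": 0.2482, "Sparse": 1.5880},
    "D64ds_v5":   {"LU": 0.1697, "FFT": 0.23,   "ODE": 0.1879, "Sparse": 0.4406},
    "D64ads_v5":  {"LU": 0.2106, "FFT": 0.2809, "ODE": 0.1948, "Sparse": 1.1102},
}


def relative_performance(times, baseline="HC44rs"):
    """Return {sku: {test: baseline_time / sku_time}}.

    Assumption: because bench reports times, a ratio above 1.0 means the
    SKU finished faster than the baseline (1.50 reads as 150% performance).
    """
    base = times[baseline]
    return {
        sku: {test: base[test] / t for test, t in tests.items()}
        for sku, tests in times.items()
    }


if __name__ == "__main__":
    rel = relative_performance(BENCH_TIMES)
    tests = ["LU", "FFT", "ODE", "Sparse"]
    print(f"{'VM SKU':<12}" + "".join(f"{t:>8}" for t in tests))
    for sku, scores in rel.items():
        print(f"{sku:<12}" + "".join(f"{scores[t]:>8.2f}" for t in tests))
```

Under this reading, D64ds_v5 comes out at about 1.25 on LU (0.2121 / 0.1697), and HB120rs_v3 at about 0.42 on Sparse (0.5576 / 1.3275).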