Guest HugoAffaticati Posted January 27, 2023 Posted January 27, 2023 By Hugo Affaticati – Technical Program Manager Useful resources: Information on the NC A100 v4-series: Microsoft Information on MIG: NVIDIA In this document, one will find the steps to run the MLPerf Inference v2.1 benchmarks for BERT, ResNet-50, RNN-T, and 3D-UNet on one of seven slices of NVIDIA-powered NC A100 v4-series Tensor Core GPUs with Multi-Instance GPU (MIG). Learn more about MIG on Azure and Azure’s submission to MLPerf Inference v2.1. Pre-requisites: Deploy and set up a virtual machine on Azure by following “Getting started with the NC A100 v4-series.” Set up the environment: Once your machine is deployed and configured, create a folder for the scripts and get the scripts from MLPerf Inference v2.1 repository. The path for NC A100 v4-series (single node) is: cd /mnt/resource_nvme git clone GitHub - mlcommons/inference_results_v2.1 cd inference_results_v2.1/closed/Azure Create folders for the data and get the ResNet-50 data: export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch mkdir -p $MLPERF_SCRATCH_PATH mkdir $MLPERF_SCRATCH_PATH/data $MLPERF_SCRATCH_PATH/models $MLPERF_SCRATCH_PATH/preprocessed_data cd $MLPERF_SCRATCH_PATH/data && mkdir imagenet && cd imagenet In this imagenet folder download ImageNet Data available online and go back to the script. cd /mnt/resource_nvme/inference_results_v2.1/closed/Azure Do not create the MIG instance manually, the command “make prebuild” will do it. One change is needed prior to starting the container. Remove “--gpu all” in " --gpu all -e NVIDIA_MIG_CONFIG_DEVICES=all" on line 754 of the file Makefile. Enable MIG on all the GPUs (rebooting the VM may be needed), prebuild the container on all the instances, and get the rest of the datasets from inside the container. sudo nvidia-smi -mig 1 make prebuild MIG_CONF=ALL make download_data BENCHMARKS="resnet50 bert rnnt 3d-unet" make download_model BENCHMARKS="resnet50 bert rnnt 3d-unet" make preprocess_data BENCHMARKS="resnet50 bert rnnt 3d-unet" One needs to register the system and generate the configuration files before running the benchmarks. python3 -m scripts.custom_systems.add_custom_system Give a name and accept to generate the customed configuration files. Finally, adjust the values of the configuration files located in configs/[benchmark]/[scenario]/custom.py by using the values suggested by NVIDIA under “A100_PCIe_80GB_MIG_1x1g10gb” located in /mnt/resource_nvme/inference_results_v2.1/closed/NVIDIA/configs/[benchmark]/[scenario]/__init__.py This will allow you to run the benchmarks on a single slice of MIG. You can finally build the container: cd /mnt/resource_nvme/inference_results_v2.1/closed/Azure make build Run the benchmark Finally, run the benchmark with the make run command, an example is given below. The value is only correct if the result is “VALID”, modify the value in the config files if the result is “INVALID”. make run RUN_ARGS="--benchmarks=bert --scenarios=offline --config_ver=default,high_accuracy,triton,high_accuracy_triton" Continue reading... Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.