Krishnaprasad_Hande
Today, we are announcing the general availability of Azure confidential virtual machines (VMs) with NVIDIA H100 Tensor Core GPUs. These VMs combine the hardware-based data-in-use protection of confidential VMs built on 4th Gen AMD EPYC™ processors with the performance of NVIDIA H100 Tensor Core GPUs. By enabling confidential computing on GPUs, Azure offers customers more options and flexibility to run their workloads securely and efficiently in the cloud. These VMs are ideal for inferencing, fine-tuning, or training small-to-medium sized models such as Whisper, Stable Diffusion and its variants (SDXL, SSD), and language models such as Zephyr, Falcon, GPT-2, MPT, Llama 2, Wizard, and Xwin.
Azure NCC H100 v5 virtual machines are currently available in the East US 2 and West Europe regions.
Figure 1. Simplified NCC H100 v5 architecture
Hardware partner endorsements
We are grateful to our hardware partners for their support and endorsements.
“The expanding landscape of innovations, particularly generative AI, is creating boundless opportunities for enterprises and developers. NVIDIA’s accelerated computing platform equips pioneers like Azure to boost performance for AI workloads while maintaining robust security through confidential computing.” Daniel Rohrer, VP of software product security, architecture and research, NVIDIA.
"AMD is a pioneer in confidential computing, with a long-standing collaboration with Azure to enable numerous confidential computing services powered by our leading AMD EPYC processors. We are now expanding our confidential computing capabilities into AI workloads with the new Azure confidential VMs with NVIDIA H100 Tensor Core GPUs and 4th Gen AMD EPYC CPUs, the industry's first offering of a confidential AI service. We are excited to expand our confidential computing offerings with Azure to address demands of AI workloads." Ram Peddibhotla, corporate vice president, product management, cloud business, AMD.
Customer use cases and feedback
Some examples of workloads that our customers experimented with during the preview, and are planning to take further with the power of Azure NCC H100 v5 GPU virtual machines, include:
- Confidential inference for audio-to-text transcription (Whisper models); see the sketch after this list.
- Anomalous-behavior detection on video input for incident prevention, leveraging confidential computing to meet data privacy requirements.
- Stable Diffusion with privacy-sensitive design data in the automobile industry (inference and training).
- Multi-party clean rooms to run analytical tasks against billions of transactions and terabytes of data from a financial institution and its subsidiaries.
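As a concrete illustration of the first item, the sketch below runs audio-to-text inference with an open Whisper checkpoint through the Hugging Face transformers pipeline. The checkpoint name and audio file path are placeholder assumptions; the application code itself is unchanged on a confidential VM, since the data-in-use protection comes from the platform rather than from the model code.

```python
# Minimal sketch of audio-to-text inference with an open Whisper checkpoint via
# the Hugging Face transformers pipeline. The checkpoint and audio path below
# are placeholder assumptions; inside a confidential VM the code is unchanged.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",              # assumed checkpoint size
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe a local recording (placeholder path); chunk_length_s enables
# chunked processing of longer audio.
result = asr("meeting_recording.wav", chunk_length_s=30)
print(result["text"])
```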
“RBC has been working very closely with Microsoft on confidential computing initiatives since the early days of technology availability within Azure,” said Justin Simonelis, Director, Service Engineering and Confidential Computing, RBC. “We’ve leveraged the benefits of confidential computing and integrated it into our own data clean room platform known as Arxis. As we continue to develop our platform capabilities, we fully recognize the importance of privacy-preserving machine learning inference and training to protect sensitive customer data within GPUs, and we look forward to leveraging Azure confidential VMs with NVIDIA H100 Tensor Core GPUs.”
Performance insights
Azure confidential VMs with NVIDIA H100 Tensor Core GPUs offer best-in-class performance for inferencing small-to-medium sized models while protecting code and data throughout their lifecycle. We have benchmarked these VMs across a variety of models using vLLM.
The table below shows the configuration used for the tests:
VM configuration | 40 vCPUs, 1 GPU, 320 GB memory
Operating system | Ubuntu 22.04.4 LTS (6.5.0-1023-azure)
GPU driver version | 550.90.07
GPU vBIOS version | 96.00.88.00.11
The figure above shows the overheads of confidential computing with and without CUDA graphs enabled. For most models, the overheads are negligible. For smaller models, the overheads are higher due to the added latency of encrypting PCIe traffic and of kernel invocations. Increasing the batch size or input token length is a viable strategy for mitigating confidential computing overhead.
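For readers who want to probe a similar comparison themselves, the sketch below uses vLLM's enforce_eager flag to toggle CUDA graphs and reports generated tokens per second. The model name, prompt set, batch size, and output length are illustrative assumptions, not the exact configuration behind the published numbers.

```python
# Rough vLLM throughput probe: toggle CUDA graphs via enforce_eager and measure
# generated tokens per second. Model, prompts, and sizes are assumptions.
import argparse
import time

from vllm import LLM, SamplingParams


def main() -> None:
    parser = argparse.ArgumentParser(description="Rough vLLM throughput probe")
    parser.add_argument("--model", default="mistralai/Mistral-7B-v0.1")  # assumed model
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--no-cuda-graphs", action="store_true",
                        help="Run in eager mode (CUDA graphs disabled)")
    args = parser.parse_args()

    prompts = ["Summarize confidential computing in one sentence."] * args.batch_size
    sampling = SamplingParams(temperature=0.0, max_tokens=128)

    # enforce_eager=True disables CUDA graph capture; the default keeps it enabled.
    llm = LLM(model=args.model, enforce_eager=args.no_cuda_graphs)

    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling)
    elapsed = time.perf_counter() - start

    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"CUDA graphs {'disabled' if args.no_cuda_graphs else 'enabled'}: "
          f"{generated / elapsed:.1f} generated tok/s over {elapsed:.1f}s")


if __name__ == "__main__":
    main()
```

Running the script twice, once with and once without --no-cuda-graphs, and repeating the runs on a comparable non-confidential GPU VM if one is available, gives a rough view of the overheads discussed above.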