Krishnaprasad_Hande
Today, we are announcing the general availability of Azure confidential virtual machines (VMs) with NVIDIA H100 Tensor Core GPUs. These VMs combine the hardware-based data-in-use protection of confidential VMs built on 4th Gen AMD EPYC™ processors with the performance of NVIDIA H100 Tensor Core GPUs. By enabling confidential computing on GPUs, Azure offers customers more options and flexibility to run their workloads securely and efficiently in the cloud. These VMs are ideal for inferencing, fine-tuning, or training small-to-medium sized models such as Whisper, Stable Diffusion and its variants (SDXL, SSD), and language models such as Zephyr, Falcon, GPT-2, MPT, Llama 2, Wizard, and Xwin.
Azure NCC H100 v5 virtual machines are currently available in the East US 2 and West Europe regions.
Figure 1. Simplified NCC H100 v5 architecture
Hardware partner endorsements
We are grateful to our hardware partners for their support and endorsements.
“The expanding landscape of innovations, particularly generative AI, is creating boundless opportunities for enterprises and developers. NVIDIA’s accelerated computing platform equips pioneers like Azure to boost performance for AI workloads while maintaining robust security through confidential computing.” Daniel Rohrer, VP of software product security, architecture and research, NVIDIA.
"AMD is a pioneer in confidential computing, with a long-standing collaboration with Azure to enable numerous confidential computing services powered by our leading AMD EPYC processors. We are now expanding our confidential computing capabilities into AI workloads with the new Azure confidential VMs with NVIDIA H100 Tensor Core GPUs and 4th Gen AMD EPYC CPUs, the industry's first offering of a confidential AI service. We are excited to expand our confidential computing offerings with Azure to address demands of AI workloads." Ram Peddibhotla, corporate vice president, product management, cloud business, AMD.
Customer use cases and feedback
Some examples of workloads that our customers experimented with during the preview, and are planning to take further with the power of Azure NCC H100 v5 GPU virtual machines, include:
- Confidential inference for audio-to-text transcription (Whisper models); see the sketch after this list.
- Anomalous-behavior detection on video input for incident prevention, leveraging confidential computing to meet data privacy requirements.
- Stable Diffusion with privacy-sensitive design data in the automobile industry (inference and training).
- Multi-party clean rooms to run analytical tasks against billions of transactions and terabytes of data from a financial institution and its subsidiaries.
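As a concrete illustration of the first item, the sketch below runs audio-to-text inference with an open Whisper checkpoint through the Hugging Face transformers pipeline. The checkpoint name and audio file path are placeholder assumptions; the application code itself is unchanged on a confidential VM, since the data-in-use protection comes from the platform rather than from the model code.

```python
# Minimal sketch of audio-to-text inference with an open Whisper checkpoint via
# the Hugging Face transformers pipeline. The checkpoint and audio path below
# are placeholder assumptions; inside a confidential VM the code is unchanged.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",              # assumed checkpoint size
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe a local recording (placeholder path); chunk_length_s enables
# chunked processing of longer audio.
result = asr("meeting_recording.wav", chunk_length_s=30)
print(result["text"])
```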
“RBC has been working very closely with Microsoft on confidential computing initiatives since the early days of technology availability within Azure,” said Justin Simonelis, Director, Service Engineering and Confidential Computing, RBC. “We’ve leveraged the benefits of confidential computing and integrated it into our own data clean room platform known as Arxis. As we continue to develop our platform capabilities, we fully recognize the importance of privacy-preserving machine learning inference and training to protect sensitive customer data within GPUs, and we look forward to leveraging Azure confidential VMs with NVIDIA H100 Tensor Core GPUs.”
Performance insights
Azure confidential VMs with NVIDIA H100 Tensor Core GPUs offer best-in-class performance for inferencing small-to-medium sized models while protecting code and data throughout their lifecycle. We have benchmarked these VMs across a variety of models using vLLM.
The table below shows the configuration used for the tests:
VM configuration | 40 vCPUs, 1 GPU, 320 GB memory
Operating system | Ubuntu 22.04.4 LTS (6.5.0-1023-azure)
GPU driver version | 550.90.07
GPU vBIOS version | 96.00.88.00.11
The figure above shows the overheads of confidential computing with and without CUDA graphs enabled. For most models, the overheads are negligible. For smaller models, the overheads are higher due to the added latency of encrypting PCIe traffic and of kernel invocations. Increasing the batch size or input token length is a viable strategy for mitigating confidential computing overhead.
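For readers who want to probe a similar comparison themselves, the sketch below uses vLLM's enforce_eager flag to toggle CUDA graphs and reports generated tokens per second. The model name, prompt set, batch size, and output length are illustrative assumptions, not the exact configuration behind the published numbers.

```python
# Rough vLLM throughput probe: toggle CUDA graphs via enforce_eager and measure
# generated tokens per second. Model, prompts, and sizes are assumptions.
import argparse
import time

from vllm import LLM, SamplingParams


def main() -> None:
    parser = argparse.ArgumentParser(description="Rough vLLM throughput probe")
    parser.add_argument("--model", default="mistralai/Mistral-7B-v0.1")  # assumed model
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--no-cuda-graphs", action="store_true",
                        help="Run in eager mode (CUDA graphs disabled)")
    args = parser.parse_args()

    prompts = ["Summarize confidential computing in one sentence."] * args.batch_size
    sampling = SamplingParams(temperature=0.0, max_tokens=128)

    # enforce_eager=True disables CUDA graph capture; the default keeps it enabled.
    llm = LLM(model=args.model, enforce_eager=args.no_cuda_graphs)

    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling)
    elapsed = time.perf_counter() - start

    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"CUDA graphs {'disabled' if args.no_cuda_graphs else 'enabled'}: "
          f"{generated / elapsed:.1f} generated tok/s over {elapsed:.1f}s")


if __name__ == "__main__":
    main()
```

Running the script twice, once with and once without --no-cuda-graphs, and repeating the runs on a comparable non-confidential GPU VM if one is available, gives a rough view of the overheads discussed above.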