Announcing the Availability of Phi-3.5-MoE in Azure AI Studio and GitHub

In August 2024, we welcomed the latest addition to the Phi model family, Phi-3.5-MoE, a Mixture-of-Experts (MoE) model featuring 16 experts and 6.6B active parameters. The introduction of this model has been met with enthusiasm and praise from our users, who have highlighted its competitive performance, multilingual capability, robust safety measures, and its ability to outperform larger models while upholding the efficiency the Phi family is known for.



Today, we are proud to announce that the Phi-3.5-MoE model is now available through the Serverless API deployment method in Azure AI Studio (Figure 1) and GitHub (Figure 2). By providing access to the model through a Serverless API, we aim to simplify deployment and reduce the overhead associated with managing infrastructure. This advancement represents a significant step forward in making our state-of-the-art deep learning models more accessible and easier to integrate into various applications for users and developers everywhere. Key benefits include:



  • Scalability: Easily scale your usage based on demand without worrying about underlying hardware constraints. Phi-3.5-MoE and other Phi-3.5 models are available in East US 2, East US, North Central US, South Central US, West US 3, West US, and Sweden Central regions.
  • Cost Efficiency: Pay only for the resources you use, ensuring cost-effective operation, at $0.00013 per 1K input tokens and $0.00052 per 1K output tokens.
  • Ease of Integration: Seamlessly integrate Phi-3.5-MoE into your existing workflows and applications with minimal effort (see the sample call after this list).
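
To illustrate how little code a Serverless API deployment requires, here is a minimal sketch that calls a Phi-3.5-MoE serverless endpoint with the azure-ai-inference Python package. The endpoint URL and key are placeholders for the values shown on your deployment's details page, and the prompt and generation settings are purely illustrative.

```python
# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: use the endpoint URL and key from your own serverless deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-serverless-endpoint>",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what a mixture-of-experts model is in two sentences."),
    ],
    temperature=0.7,   # illustrative settings, not recommendations
    max_tokens=256,
)

print(response.choices[0].message.content)
```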



For step-by-step instructions on deploying and using the Phi family of models in Azure AI Studio and GitHub, please follow the quick start guide in our Phi-3 Cookbook.
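
The GitHub playground shown in Figure 2 can also be reached programmatically through GitHub Models. The sketch below assumes the shared models.inference.ai.azure.com endpoint, a GitHub personal access token exposed as GITHUB_TOKEN, and the model identifier Phi-3.5-MoE-instruct; verify all three on the model's GitHub page or in the Cookbook before use.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Assumed GitHub Models endpoint and model id; confirm both on the model's page.
client = ChatCompletionsClient(
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
)

response = client.complete(
    model="Phi-3.5-MoE-instruct",
    messages=[UserMessage(content="What is 17 * 24? Show your reasoning briefly.")],
)
print(response.choices[0].message.content)
```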



Figure 1: Deploy the Phi-3.5-MoE model using Serverless API in Azure AI Studio.



Figure 2: The Phi-3.5-MoE playground experience in GitHub.



While we celebrate the release of Phi-3.5-MoE, we want to take this opportunity to highlight the complexities of training such models. Mixture of Experts (MoE) models can scale efficiently without a linear increase in computation. For instance, the Phi-3.5-MoE model has 42B total parameters but activates only 6.6B of them, utilizing 16 expert blocks with just 2 experts selected per token. Leveraging these parameters effectively has proven challenging, however: naive scaling yields only marginal quality improvements despite the larger parameter count. Making each expert specialize in specific tasks has also been difficult; conventional training methods caused all 16 experts to learn similar behavior, limiting quality gains on diverse tasks.
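
To make the sparse-activation idea concrete, here is a minimal, illustrative top-2 routing layer in PyTorch. It mirrors the general MoE pattern described above (a router scores 16 experts per token and only the top 2 run), but it is not the Phi-3.5-MoE implementation, and the hidden sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative top-2 mixture-of-experts layer (not the actual Phi-3.5-MoE code)."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router produces one score per expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                                   # (num_tokens, num_experts)
        top_scores, top_experts = torch.topk(scores, self.top_k, dim=-1)
        gate = F.softmax(top_scores, dim=-1)                      # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_experts[:, slot] == e                  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += gate[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only the 2 selected experts run per token, so compute scales with active
# parameters (6.6B in Phi-3.5-MoE) rather than total parameters (42B).
tokens = torch.randn(8, 512)
print(Top2MoELayer()(tokens).shape)  # torch.Size([8, 512])
```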



To build a state-of-the-art MoE model, our Phi team developed a new training method called GRIN (GRadient INformed) MoE to improve the use of parameters and expert specialization. The Phi-3.5-MoE model, trained using this method, demonstrates clear expert specialization patterns, with experts clustering around similar tasks such as STEM, Social Sciences, and Humanities. This approach achieved significantly higher quality gains compared to traditional methods. As shown in Figure 3, the model can utilize different sets of parameters for various tasks. This specialization enables efficient use of the large parameter set by activating only the most relevant ones for each task.



Figure 3: Expert routing patterns for different tasks. GRIN MoE training demonstrates strong specialization among experts.



The model excels in real-world and academic benchmarks, surpassing several leading models across math, reasoning, multilingual, and code generation tasks. Figure 4 below is an example of a solution generated by Phi-3.5-MoE in response to a GAOKAO 2024 math question. The model effectively breaks down the complex math problem, reasons through it, and arrives at the correct answer.



Figure 4: An answer from Phi-3.5-MoE for a GAOKAO 2024 math question.



The Phi-3.5-MoE model was evaluated across various academic benchmarks against several open-source and closed-source models (see Figure 5). Phi-3.5-MoE outperforms recent models such as Mistral-Nemo-12B, Llama-3.1-8B, and Gemma-2-9B despite utilizing fewer active parameters, and it performs on par with or slightly better than Gemini-1.5-Flash, one of the most widely used closed models.

Figure 5: Phi-3.5-MoE evaluation results across multiple academic benchmarks.



We invite developers, data scientists, and AI enthusiasts to explore the specialized capabilities of Phi-3.5-MoE through Azure AI Studio. Whether you're creating innovative applications or enhancing existing solutions, Phi-3.5-MoE provides the flexibility and power you need. For the latest information on the Phi family of models, please visit the Phi open models page.
