nitya
Building generative AI applications starts with selecting the right model for your application’s needs. The Azure AI Model Catalog offers over 1.78K models, including foundation models from core partners and nearly 1.6K open-source models from the Hugging Face community. This post is part of a monthly series to raise awareness of new models added to the Hugging Face collection on Azure. Check out the previous Hugging Face Models roundup post here.
The Hugging Face model hub has over 1M models. We select ~20 models each month to add to Azure based on feedback from our customers and developer community. Want to request that a specific Hugging Face model be added? Make the request in just 3 steps:
- Search the Hugging Face Hub for the desired model - click to view its Model Card.
- Click the “Deploy” dropdown on that page - select the Azure ML option.
- Look for a “Request to add” button in the pop-up dialog and complete the flow.
If the model already exists in the Azure AI model catalog, you will see a “Go to model in Azure ML” button that should direct you to the model card on Azure AI Studio.
18 New Hugging Face Models Added in September
September saw the addition of 18 new models to the Hugging Face Collection on Azure AI. They include community-created variants of popular base models from Meta’s Llama family (LLM), Microsoft’s Phi-3 family (SLM) and more. A number of these fine-tuned models were also “made with Unsloth” - we’ll talk about this in a bit. First, let’s review the models added and highlight their notable features.
Note: These community-created models are suitable for research & prototyping but will require additional assessment for use in production. Read the model card (linked to each) for usage guidance and limitations, and conduct your own quality and safety evaluations to assess them for your specific application scenario.
# | Model Name · Inference Task · Tags | Notable Features · Base Model |
01 | Groq/Llama-3-Groq-70B-Tool-Use · Text Generation | Fine-tuned for tool use · 90.76% accuracy on BFCL (best in 70B class) · Meta-Llama-3-70B based |
02 | Groq/Llama-3-Groq-8B-Tool-Use · Text Generation | Fine-tuned for tool use · 89.06% accuracy on BFCL (best in 8B class) · Meta-Llama-3-8B based |
03 | LenguajeNaturalAI/leniachat-qwen2-1.5B-v0 · Text Generation · Spanish | Fine-tuned for Spanish users · Trained exclusively in Spanish for high-quality chat, instructions · Qwen/Qwen2-1.5B based |
04 | gokaygokay/Flux-Prompt-Enhance · Text-to-Text Generation | Create enhanced prompts for image creation with Flux models · Creator Blog · google-t5/t5-base based |
05 | Ba2han/Llama-Phi-3_DoRA · Text Generation · made with unsloth | Trained on filtered versions of tagged datasets and llama-3-70B generated examples · Highest on MMLU, Winogrande for its class · microsoft/phi-3-mini based |
06 | unsloth/Phi-3.5-mini-instruct · Text Generation · made with unsloth | 2X faster fine-tuning · 50% less memory use · Notebook (add data) · microsoft/phi-3.5-mini |
07 | cognitivecomputations/dolphin-2.9.2-Phi-3-Medium-abliterated · Text Generation · made with unsloth | Filtered dataset to remove alignment/bias · Uncensored model (see: research) · Needs dev effort for alignment, responsible AI · unsloth/Phi-3-mini-4k-instruct |
08 | third-intellect/Phi-3-mini-4k-instruct-orca-math-word-problems-200k-model-16bit · Text Generation · made with unsloth | Conversational math problems · Orca-Math problems dataset · unsloth/Phi-3-mini-4k-instruct-bnb-4bit based |
09 | unsloth/Phi-3-medium-4k-instruct · Text Generation · made with unsloth | 2X faster fine-tuning · Notebook (add data) · Quantized bnb-4bit model · microsoft/phi-3-medium-4k-instruct |
10 | vonjack/Phi-3-mini-4k-instruct-LLaMAfied · Text Generation | Chat completion · Recalibrated to fit Llama/Llama-2 model structure · microsoft/phi-3-mini-4k-instruct based |
11 | Sreenington/Phi-3-mini-4k-instruct-AWQ · Text Generation | Uses AutoAWQ for 4-bit quantization · Higher throughput with smaller GPUs · microsoft/phi-3-mini-4k-instruct based |
12 | Skywork/Skywork-Reward-Gemma-2-27B · Text Generation | High-perf reward model · Skywork Reward Dataset · gemma-2-27b-it · Top 3 in RewardBench leaderboard |
13 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 · Text Generation | 1.1B Llama model pre-trained on 3T tokens · Fine-tuned on an UltraChat dataset variant · Suited to edge devices · Real-time dialog |
14 | lemon07r/Gemma-2-Ataraxy-9B · Text Generation · Creative Writing | Creative Writing · Top-ranked in Eq-Bench leaderboard · Model merge using SLERP mergekit · google/gemma2-9b fine-tune with merges |
15 | MLP-KTLim/llama-3-Korean-Bllossom-8B · Text Generation · Korean | Korean-English bilingual model · Vocabulary expansion · Human Feedback (DPO) · Bllossom ELO model has SOTA score on LogicKor for <10B models · meta/llama-3 based |
16 | weblab-GENIAC/Tanuki-8B-dpo-v1.0 · Text Generation · Japanese | Japanese dialogue · Pre-trained · Supervised fine-tuning · Direct Preference Optimization (DPO) · meta/llama arch |
17 | aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct · Text Generation · Multi-lingual | Fine-tuned with 100K English, 50K ASEAN language pairs · Commercially permissive · High-quality datasets · Ranks top on SEA HELM · llama3-8b-cpt (continued pre-trained) |
18 | aisingapore/llama3-8b-cpt-sea-lionv2-base · Text Generation · Multi-lingual | Pre-trained, instruction-tuned for Southeast Asia (SEA) · Evaluated well on BHASA benchmarks · Not aligned for safety · meta/llama-3-8b-instruct based |
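Row 14 in the table mentions a model merge using SLERP (spherical linear interpolation). As a rough illustration of the idea only, the sketch below interpolates between two small weight vectors along the arc between them rather than along a straight line. It is not mergekit's actual code: the function name `slerp`, the `eps` threshold and the toy vectors are assumptions made for this example, and real merges operate on full model tensors.

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    Hypothetical helper for illustration; t=0 returns v0, t=1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # A zero vector has no direction, so fall back to linear interpolation.
    if norm0 == 0.0 or norm1 == 0.0:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, clamped to guard against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Nearly parallel (or anti-parallel) vectors: use linear interpolation.
    if abs(sin_theta) < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Toy "weights" from two models; halfway between them on the arc.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(merged)
```

For two unit vectors at right angles, the halfway point keeps unit length, whereas a plain average would shrink it. That property is one commonly cited reason for preferring SLERP when merging weights.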
Observed Themes
The 18 models added also help us identify themes in community-created variants in terms of use cases, tools and processes. This is what we observed:
- Multi-lingual Models Continue to Shine - We added models tailored for Spanish, Japanese, Korean and Southeast Asian languages, with many models scoring well on their respective evaluation leaderboards. This underscores growing demand for conversational tasks that can reflect regional vocabularies and culture effectively.
- Phi-3 Variants Continue to Grow - Microsoft’s Phi-3 family of “small language models” (SLM) outperforms other comparable models in equivalent or adjacent size classes. We are now seeing more variants developed, potentially for scenarios on mobile and edge devices. More on this below.
- Fine-Tuning Tools Have Value - Community-authored variants focus on fine-tuning popular base models, but this is time-intensive and costly. We are now seeing more models “made with Unsloth”, reflecting interest in tools and processes that speed up fine-tuning with less memory - without sacrificing accuracy. More on this below.
- Meta/Llama Remains Popular - The Meta/Llama family of models continues to influence community creators in different ways. It serves first as a base model for fine-tuning (e.g., multi-lingual with SEA-lion, tool usage with Groq) and second as a target for optimization (e.g., TinyLlama family for mobile and edge devices). We also see adaptation of other base models (e.g., Llamafied version of Phi-3) to make those models fit a familiar structure for usage.
Model Spotlight: Phi-3 Community Variants
The Hugging Face model hub shows that new Phi-3 variants are created on a daily basis. In general, the Phi-3 family of models outperforms others in its size class (and adjacent classes), making it ideal for use cases targeting edge and mobile devices. We added 7 of these models to Azure this month, fine-tuned from Phi-3 and Phi-3.5 base models, but with different objectives. Let’s review these briefly.
1. vonjack/Phi-3-mini-4k-instruct-LLaMAfied
2. Sreenington/Phi-3-mini-4k-instruct-AWQ
3. Ba2han/Llama-Phi-3_DoRA · made with unsloth |
4. unsloth/Phi-3.5-mini-instruct · made with unsloth |
5. cognitivecomputations/dolphin-2.9.2-Phi-3-Medium-abliterated · made with unsloth |
6. third-intellect/Phi-3-mini-4k-instruct-orca-math-word-problems-200k-model-16bit · made with unsloth |
7. unsloth/Phi-3-medium-4k-instruct · made with unsloth |
The first variant recalibrates the model to fit the Llama/Llama-2 model structure for developer familiarity. The second uses AutoAWQ for 4-bit quantization to get a model that can work on smaller GPUs. The third uses Llama-3 generated examples for fine-tuning, scoring well on two popular benchmarks. Variants 4 and 7 are from Unsloth and showcase their fine-tuning techniques as explained in the next section. Variant 5 shows an example of fine-tuning Phi-3 to “abliterate” information, uncensoring it. Variant 6 showcases Phi-3’s math capabilities, fine-tuned with the popular Orca Math Word problems dataset.
Want to explore Phi-3 capabilities but don’t know where to start? Bookmark the Phi-3 Cookbook from Microsoft and start with the Welcome to the Phi-3 Family page. Then explore the Table Of Contents for links to quick-starts, tutorials and E2E samples. Finally, try using one of the fine-tuned variants above to see the difference.
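To see why 4-bit quantization helps a model fit on smaller GPUs, a back-of-the-envelope calculation is useful. The sketch below estimates memory for the weights alone, assuming the 3.8B parameter count of Phi-3-mini. The helper name `weight_memory_gb` is made up for this example, and the estimate ignores quantization metadata (scales, zero points), activations and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption for this sketch: Phi-3-mini has roughly 3.8 billion parameters.
PHI3_MINI_PARAMS = 3.8e9

for bits in (16, 4):
    gb = weight_memory_gb(PHI3_MINI_PARAMS, bits)
    print(f"{bits}-bit weights: ~{gb:.1f} GB")
# 16-bit weights: ~7.6 GB
# 4-bit weights: ~1.9 GB
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to about a quarter, which is the headroom that lets a quantized variant run on a GPU with less memory.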
Community Call for Action
1. Help Us Spotlight Your Work!
Have you used Hugging Face models on Azure to build interesting AI applications? Have you published your own fine-tuned variants of popular foundation models? We want to know more. Leave a comment on this blog with links to articles or repositories you’ve authored - we’d love to learn more and amplify our model creator community.
2. Get Started with Hugging Face Models on Azure
New to the Azure AI Model catalog and want to get started using Hugging Face models on Azure? Here are three resources to kickstart your learning journey:
- Azure AI Model Catalog: Explore the Hugging Face Collection
- Azure ML Documentation: Model Catalog and Collections
- Azure ML Sample Notebooks: Explore inference tasks with code
Reminder: We add ~20 models to the Hugging Face Collection on Azure each month and we need your feedback to help make these decisions! Request a model using the 3-step process outlined earlier in the article and tell us more about how you’re using it today!