When I was first asked to think about what the Future of AI would look like for developers, my response was instinctive. It has to start with model choice! Today, I want to dive into this topic in more depth. This is the first of a multi-part series where I hope to take you from catalog to code to cloud, as we build intelligent applications on Azure AI.
The Paradox of Choice
Language models are at the heart of generative AI applications. Our choice of model has consequences for the quality of our application responses and the costs of our solution development. Our decision can help us expand our reach to the edge (with small language models) or improve our precision for specialized applications (with fine-tuning). Choices make the difference between a prototype that has promise and a product that has scale. But making a model choice can be challenging.
The 2023 paper “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond” charted a fairly sparse LLM landscape with just a handful of model choices (like GPT-4, Llama and Claude), as shown in the figure below. This made it easy for us to get productive quickly with model features and APIs and select the best fit for our scenario requirements.
Fast-forward to 2024 and we have 1M+ community-created variants on Hugging Face and a rapidly growing ecosystem of foundation models from providers like Cohere, Mistral, AI21 Labs, Jais, Nixtla, Google, and Microsoft. We have Small Language Models (SLMs) extending AI app use to the edge, and specialized (domain-specific) models with capabilities ranging from multilingual (Jais) to time-series forecasting (Nixtla), healthcare, and more.
This is creating a paradox of choice where the abundance of options makes us afraid to commit to one for fear we may miss out on a better choice for our needs. How do we overcome our analysis paralysis? In this blog post, we’ll explore a 3-part solution that can help mitigate these challenges:
- Discovery: Avoid decision fatigue. Use a structured process to shortlist a selection.
- Assessment: Find the best fit. Use the right tools and metrics to assess the shortlist.
- Assignment: Make switches easy. Use model-agnostic APIs when coding the app.
What Developers Need
Every generative AI application design architecture starts with a basic question: What models should I use to bring my scenario to life? As developers looking at the end-to-end application lifecycle, our model selection needs to take three things into account:
- Ideation - can I prototype my application scenario with this model?
- Augmentation - can I optimize the model to improve response quality and safety?
- Operationalization - can I scale usage of the model in production deployments?
The problem for us today is that the model ecosystem is fragmented. Foundation models are published to provider-hosted sites along with their docs & samples, while community-created variants are published to model hubs like Hugging Face. Developers get differing levels of information from each source. This means the burden of discovery falls on the developer, requiring them to visit different model playgrounds, use different SDKs, and figure out the right metrics to compare options before choosing one.
This gives rise to decision fatigue, where the ability to make a decision is hampered by the fear that there may be a better option we don't know about. What we need is a structured process for model selection that filters out irrelevant options until we have a manageable subset that can be evaluated against specific application needs. A comprehensive platform, like Azure AI Studio (shown below), provides the tools and support required to make this process seamless. Let’s dive in!
A Structured Process
Let’s revisit the 3-part solution we talked about before and understand the question we are trying to answer at each step, and the challenge we face.
- Discovery is about finding all the possible options to choose from. This requires us first to know that a specific option exists in this growing ecosystem, then trust that it meets privacy, security and safety standards for enterprise use.
- Assessment is about filtering the choices until there is a manageable shortlist to evaluate more carefully. This requires asking the right questions, preferably in the right order, to eliminate unacceptable choices. Identifying the right criteria for comparing the remaining model choices will help zoom in to make a final decision.
- Assignment is about fitting that model into the application architecture for the scenario. By decoupling model selection from the task-based application code, developers can swap models without having to rewrite code. The model can then be tested iteratively, going from initial prompt to functional prototype.
A Seamless Platform
This is where having a comprehensive platform that streamlines the end-to-end workflow helps! The Azure AI platform offers a one-stop shop for going from model selection to managed solution, in a unified manner with rich tooling, turnkey services, and easy integrations. For our model selection needs, it provides three key features:
- Azure AI model catalog – with 1.7K+ enterprise-ready models from trusted partners
- Azure AI model benchmarks – with metrics & dataset filters to compare model choices
- Azure AI model inference API – with a task-specific, model-agnostic API to code against
These capabilities map directly to our needs for discovery, assessment, and assignment:
Discovery: "There's a model For that!"
Assessment: "There's a metric for that!"
The Azure AI model benchmarks help you assess model performance on quality metrics like groundedness, coherence, fluency, and similarity across a curated list of foundation LLMs and SLMs. Use accuracy scores at the model and dataset levels to compare options, picking the criteria that are right for your scenario. Filter views by task or other criteria for more effective comparisons. And you can potentially write your own custom evaluators to score models against specific criteria. Want a way to pick the best fit from a shortlisted selection of models? There's a metric for that!
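To make the custom-evaluator idea concrete, here is a minimal, purely illustrative sketch of the kind of scoring function you could write and run over candidate model outputs. The metric, model names, and sample responses are all made up for illustration; real groundedness evaluators are far more sophisticated.

```python
# Toy custom evaluator: score how much of a model's response is "grounded"
# in the supplied context, using simple word overlap. Purely illustrative.
def groundedness_overlap(response: str, context: str) -> dict:
    response_words = set(response.lower().split())
    context_words = set(context.lower().split())
    if not response_words:
        return {"overlap_score": 0.0}
    overlap = len(response_words & context_words) / len(response_words)
    return {"overlap_score": round(overlap, 3)}

# Compare two (hypothetical) candidate models on the same context
context = "The Q3 report states Contoso shipped 1.2 million units."
candidates = {
    "model-a": "Contoso shipped 1.2 million units in Q3 per the report.",
    "model-b": "Sales were strong thanks to the new marketing campaign.",
}
for name, answer in candidates.items():
    print(name, groundedness_overlap(answer, context))
```

A scorer like this can sit alongside the built-in benchmarks when you need a criterion they don't cover.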
Alignment: "There's a mechanism for that!"
The Model Selection Process on Azure
In our catalog to code to cloud journey, we'll focus on the discovery process today and revisit the assessment and assignment steps in future posts in this series. The illustrated guide below gives you the big picture for model selection, and highlights the steps we can use for shortlisting catalog options to get a manageable subset, in tiles 3-6. Click here to view a hi-res (downloadable) version of the image.
Start by asking the right questions to reduce the catalog choices to a more manageable 1-3 options for assessment (a small sketch after the list shows these filters applied in sequence).
- Task: What is the application’s main inference task? Selecting the right task can drastically cut down model options, making it the perfect first filter. For example, there are many options for “text generation” or “question answering” but fewer for “image generation”. Starting here ensures you are working from the right subset.
- Specialization: Does the application require higher precision for a specialized domain? Does a specialized model exist, or can one be trained with your data? The first question calls for searching by keyword or task taxonomy, while the second checks whether a fine-tunable base model exists for the targeted task, or whether a community-created variant already meets your needs.
- Performance: Does the application require a real-time response (e.g., mobile or edge devices), or is round-trip latency to the cloud an acceptable tradeoff for other constraints? This can effectively become a choice between SLM and LLM options.
- Constraints: Does solution development have associated cost or resource constraints that impact the decision? Does serverless API deployment (pay-per-token) work better than managed compute (pay-per-VM) for current needs?
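Here is the small sketch referenced above: a hypothetical shortlisting helper that applies the four questions in order to a handful of made-up catalog entries. Every field name, model name, and value is invented purely for illustration.

```python
# Hypothetical catalog metadata; names, fields, and values are made up.
CANDIDATES = [
    {"name": "model-x", "task": "chat-completion", "size": "SLM", "deployment": "serverless", "domains": []},
    {"name": "model-y", "task": "chat-completion", "size": "LLM", "deployment": "managed-compute", "domains": ["finance"]},
    {"name": "model-z", "task": "image-generation", "size": "LLM", "deployment": "serverless", "domains": []},
]

def shortlist(task, domain=None, needs_edge=False, serverless_only=False):
    picks = [m for m in CANDIDATES if m["task"] == task]              # 1. Task
    if domain:
        picks = [m for m in picks if domain in m["domains"]]          # 2. Specialization
    if needs_edge:
        picks = [m for m in picks if m["size"] == "SLM"]              # 3. Performance
    if serverless_only:
        picks = [m for m in picks if m["deployment"] == "serverless"] # 4. Constraints
    return [m["name"] for m in picks]

# A chat app that must run near the edge on a pay-per-token budget
print(shortlist("chat-completion", needs_edge=True, serverless_only=True))  # -> ['model-x']
```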
The Azure AI model catalog allows us to filter models by various criteria – including collection (provider), license (usage), deployment (payment options), task (taxonomy) and keyword (search). Results are laid out as model cards that provide more details like model weights, relevant samples, and any assessments or datasets used for fine-tuning that model (in the case of variants). Deploy a model from the card to get an active endpoint and a playground that you can use for code-first and low-code validation respectively.
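If you prefer a code-first look at the catalog, the following sketch assumes the `azure-ai-ml` and `azure-identity` Python packages and uses the shared `azureml` registry that hosts many catalog models; the model name shown is just an example and may differ over time.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the shared "azureml" registry that backs part of the model catalog
registry = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Inspect the latest version of a candidate model and its metadata
model = registry.models.get(name="Phi-3-mini-4k-instruct", label="latest")
print(model.name, model.version)
print(model.tags)
```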
Want to do a more rigorous assessment? We'll explore that in our next post, looking at using Azure AI model benchmarks and the Azure AI model inference API, with a practical application scenario. For now, check out this Microsoft Mechanics blog post on how to choose the right models for your apps or watch their 8-minute video walkthrough of the process linked below.
Want to learn more about model choice and related Azure AI platform tools & capabilities?