Elevate search operations and streamline AI development with Cohere Rerank on Azure AI

shubhiraj99 · Jul 25, 2024

At Microsoft Build, we announced the availability of Command R models on Azure AI. Today, we are very excited to announce the addition of two new models from Cohere:

Cohere Rerank 3 – English

Cohere Rerank 3 – Multilingual

Cohere Rerank 3 is considered a leading AI model for semantic reranking in search systems.  These models are available as serverless APIs with pay-as-you-go token-based billing. Accessing Cohere’s enterprise-ready language models on Azure AI’s robust infrastructure enables businesses to seamlessly, reliably, and safely incorporate cutting-edge semantic search technology into their applications.

This integration allows users to leverage the flexibility and scalability of Azure AI, combined with Cohere's language models, to deliver superior search results in production. Using the Azure AI model catalog, and with just a few lines of code, developers can implement Rerank 3 to enhance their existing search systems.

According to Cohere, Rerank 3 offers state-of-the-art capabilities for enterprise search, including:

4k context length to significantly improve search quality for longer documents
Ability to search over multi-aspect and semi-structured data like emails, invoices, JSON documents, code, and tables
Multilingual coverage of 100+ languages
Improved latency and lower total cost of ownership (TCO)

How Customers Can Enhance Search with Rerank

Rerank models (compatible with 100+ languages) serve two primary purposes:

They can be added to existing search systems after an initial dense retrieval stage to improve the relevance of results, or

they can be added to retrieval-augmented generation (RAG) systems to increase the relevance of the documents passed to the generative model (and therefore reduce operating costs).
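The two-stage pattern described above can be sketched roughly as follows. This is a minimal illustration, not Cohere's or Azure's implementation: `first_stage_retrieve` is a hypothetical stand-in for whatever retriever a system already has (a simple term-overlap count here), and `score_fn` is a hypothetical placeholder for a call to a rerank model.

```python
from typing import Callable, List, Tuple


def first_stage_retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap first stage (hypothetical): rank documents by shared query terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_n: int) -> List[Tuple[float, str]]:
    """Second stage: rescore each candidate and keep only the top_n."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


corpus = [
    "reset your password in settings",
    "password policy for admins",
    "office lunch menu",
]
candidates = first_stage_retrieve("reset password", corpus, k=2)

# Toy scorer standing in for a rerank model (an assumption for illustration).
toy_score = lambda query, doc: 1.0 if "policy" in doc else 0.0
top = rerank("reset password", candidates, toy_score, top_n=1)
```

In a RAG system, only the documents in `top` would be passed on to the generative model, which is where the reduction in tokens sent (and therefore cost) comes from.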

Atomicwork, a digital workplace experience platform and longtime Azure customer, has significantly enhanced its IT service management platform with Rerank 3. By integrating the model into its AI digital assistant, Atom AI, Atomicwork saw an improvement of over 20% in search accuracy and relevance, providing faster, more precise answers to complex IT support queries. This integration has streamlined IT operations and boosted productivity across the enterprise.

"The driving force behind Atomicwork's digital workplace experience solution is Cohere’s Rerank model and Azure AI Studio, which powers Atom AI, our digital assistant, with the precision and performance required to deliver real-world results. This strategic collaboration underscores our commitment to providing businesses with advanced, secure, and reliable enterprise AI capabilities," said Vijay Rayapati, CEO of Atomicwork.

TD Bank Group, one of the largest banks in North America, recently signed an agreement with Cohere to explore its full suite of large language models (LLMs), including Rerank 3.

"At TD, we've seen the transformative potential of AI to deliver more personalized and intuitive experiences for our customers, colleagues and communities," said Kirsti Racine, VP, AI Technology Lead, TD. "We're excited to be working alongside Cohere to explore how its language models perform on Microsoft Azure to help support our innovation journey at the Bank."

Enterprises of any size and in any industry can leverage Rerank to enhance their search capabilities across countless scenarios, including:

Legacy search improvement: Improve results from legacy lexical or semantic tools

Customer support search: Enable self-serve across complex customer support docs

Multilingual search: Understand meaning and relevance across over 100 languages

RAG: Retrieve the most relevant answers across heterogeneous enterprise data sources

Command R+, Cohere’s flagship generative model, which is also available on Azure AI, is purpose-built to work well with Rerank within a RAG system. Together, they are capable of serving the most demanding enterprise workloads in production.

Boosting Search Quality for 100+ Languages with a Single Line of Code

Established keyword-based search systems are deeply ingrained within a company’s information architecture, and switching to a vector database for embedding-based search is often impractical. This is where Cohere Rerank can help, providing a seamless bridge between traditional keyword-based search and the power of semantic search.

Accessible through Azure AI Studio’s Cohere Rerank endpoint, the model computes a relevance score for each document in a set of text documents relative to a given user query. According to Cohere, this approach consistently yields better search results than traditional embedding-based semantic search, especially for complex and domain-specific queries.
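To make the scoring step concrete, here is a small sketch of what a request and response might look like and how the scores map back to documents. It assumes the endpoint follows the shape of Cohere's public Rerank API (`query`, `documents`, and `top_n` in the request; `results` with `index` and `relevance_score` in the response). The helper `order_by_relevance` and the scores in the sample response are invented for illustration.

```python
from typing import Any, Dict, List, Tuple


def order_by_relevance(documents: List[str],
                       response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Map each result's `index` back to its document, highest score first."""
    ranked = sorted(response["results"],
                    key=lambda item: item["relevance_score"], reverse=True)
    return [(documents[item["index"]], item["relevance_score"]) for item in ranked]


documents = [
    "Our cafeteria opens at 8 am.",
    "To reset a VPN token, open the self-service portal.",
    "VPN access requires manager approval.",
]

# Request body: one query plus the candidate documents to score
# (field names assumed from Cohere's public Rerank API).
request_body = {
    "query": "How do I reset my VPN token?",
    "documents": documents,
    "top_n": 2,
}

# Illustrative response with invented scores, not real model output.
sample_response = {
    "results": [
        {"index": 1, "relevance_score": 0.97},
        {"index": 2, "relevance_score": 0.41},
    ]
}

ranked = order_by_relevance(documents, sample_response)
```

Because each result refers to its document by position in the submitted list, the caller needs to keep the original `documents` list around to recover the text.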

Why Azure AI for Cohere Rerank models?

Cohere Rerank models are now available as serverless APIs through Models as a Service (MaaS), which is now Generally Available. This enables enterprise-scale workloads with ease.

Network Isolation for Inferencing: Protect your data from public network access.

Expanded Regional Availability: Access from multiple regions.

Data Privacy and Security: Robust measures to ensure data protection.

Quick Endpoint Provisioning: Set up a rerank endpoint in AI Studio in seconds.

Azure AI ensures seamless integration, enhanced security, and rapid deployment for your AI needs.

How to deploy Cohere Rerank 3 models on Azure AI Studio?

Prerequisites: 

If you don’t have an Azure subscription, get one here: Pay As You Go—Buy Directly | Microsoft Azure

Familiarize yourself with Azure AI Model Catalog

Create an Azure AI Studio hub and project. Make sure you pick East US, East US 2, West US, West US 3, North Central US, South Central US, or Sweden Central as the Azure region for the hub.

Create a deployment to obtain the inference API and key: 

Open the model card in the model catalog on Azure AI Studio.

Click on Deploy and select the Pay-as-you-go option.

Subscribe to the Marketplace offer and deploy. You can also review the API pricing at this step.

You should land on the deployment page that shows you the API and key in less than a minute.

These steps are outlined in detail in the product documentation.

Please check some samples to get started – LangChain, Web Requests, Cohere Client
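As a rough illustration of the plain web-request route, the sketch below prepares a call to a deployed endpoint using only the Python standard library. Several details are assumptions rather than confirmed specifics: the `/v1/rerank` route, the bearer-token `Authorization` header, and the request fields (taken from Cohere's public Rerank API). Check the deployment page and the official samples for the exact values; `build_rerank_call` and `send` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_rerank_call(endpoint_url: str, api_key: str, query: str,
                      documents: List[str], top_n: int) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to a rerank deployment."""
    body = json.dumps({"query": query, "documents": documents, "top_n": top_n})
    return urllib.request.Request(
        # ASSUMPTION: the route is <endpoint>/v1/rerank; verify on your deployment page.
        url=endpoint_url.rstrip("/") + "/v1/rerank",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the deployment key is sent as a bearer token.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `send(build_rerank_call(endpoint, key, "my query", docs, top_n=3))`, with `endpoint` and `key` copied from the deployment page and kept out of source control (for example, read from environment variables).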

FAQ

What does it cost to use Cohere Rerank on Azure?

You are billed based on the number of prompt and completion tokens. You can review the pricing in the offer details tab of the Cohere offer when deploying the model, and you can also find it on the Azure Marketplace.

Are the Cohere models region-specific on Azure?

Command R/R+/Embed/Rerank models are available as serverless API endpoints.

These endpoints can be created in Azure AI Studio projects or Azure Machine Learning workspaces. Cross-regional support for these endpoints is available in the following regions: East US, East US 2, West US, West US 3, North Central US, South Central US, and Sweden Central.

Do I require GPU capacity quota in my Azure subscription to deploy Cohere Rerank models?

Cohere models are available through MaaS as serverless API endpoints. You don’t require GPU capacity quota in your Azure subscription to deploy these models.

Cohere models are listed on the Azure Marketplace. Can I purchase and use these models directly from Azure Marketplace?

Azure Marketplace is our foundation for commercial transactions for models built on or built for Azure. The Azure Marketplace enables the purchasing and billing of Cohere models. However, model discoverability occurs in both the Azure Marketplace and the Azure AI Model Catalog, meaning you can search for and find Cohere models in either place.

If you search for Cohere Rerank 3 in the Azure Marketplace, you can subscribe to the offer before being redirected to the Azure AI Model Catalog in Azure AI Studio, where you can complete the subscription and deploy the model.

If you search for Cohere Rerank 3 in the Azure AI Model Catalog, you can subscribe and deploy the model from the Azure AI Model Catalog without starting from the Azure Marketplace. The Azure Marketplace still tracks the underlying commerce flow.

Given that Cohere models are billed through the Azure Marketplace, does it retire my Azure consumption commitment (aka MACC)?

Yes, Cohere models are “Azure benefit eligible” Marketplace offers, which indicates MACC eligibility. Learn more about MACC here: Take advantage of your Microsoft Azure Consumption Commitment (MACC) benefit - Microsoft marketplace

Is my inference data shared with Cohere?

No, Microsoft does not share the content of any inference request or response data with any model provider.

Microsoft acts as the data processor for prompts and outputs sent to and generated by a model deployed for pay-as-you-go inferencing (MaaS). Microsoft doesn't share these prompts and outputs with the model provider, and Microsoft doesn't use these prompts and outputs to train or improve Microsoft's, the model provider's, or any third party's models. Read more on data, security and privacy for Models-as-a-Service.

Are there rate limits for the Cohere models on Azure?

Cohere models come with limits of 400K tokens per minute and 1K requests per minute. Reach out to Azure customer support if this doesn’t suffice.
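One way to stay under these limits is to track usage on the client side. The sketch below is a hypothetical helper (`MinuteBudget` is not part of any Azure or Cohere SDK) that keeps a sliding 60-second window of requests and tokens. It assumes the caller can estimate the token count of each request; how the service itself counts tokens is not specified here.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Client-side sliding 60-second window over request and token counts."""

    def __init__(self, max_requests: int = 1000, max_tokens: int = 400_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.tokens_in_window = 0

    def _expire(self, now: float) -> None:
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits in both budgets."""
        now = self.clock()
        self._expire(now)
        if len(self.events) + 1 > self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would check `budget.try_acquire(estimated_tokens)` before each request and wait or queue the request when it returns `False`. The injectable `clock` exists only to make the class easy to test.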

Can I use MaaS models in any Azure subscription type?

Customers can use MaaS models in all Azure subscription types with a valid payment method, except for the CSP (Cloud Solution Provider) program. Free or trial Azure subscriptions are not supported.
