fsunavala-msft
In April 2024, we proudly announced our partnership with Cohere, allowing customers to seamlessly leverage Cohere models via the Azure AI Studio Model Catalog, as part of the Models as a Service (MaaS) offering. At Build 2024, Azure AI Search launched support for Binary Vectors. In this blog, we are excited to continue from our previous discussion on int8 embeddings and highlight two powerful new capabilities: utilizing Cohere Binary Embeddings in Azure AI Search for optimized storage and search, and employing the Cohere Command R+ model as a Large Language Model (LLM) for Retrieval-Augmented Generation (RAG).
Cohere Binary Embeddings via Azure AI Studio
Binary vector embeddings use a single bit per dimension, making them much more compact than vectors using floats or int8, while still yielding surprisingly good quality given the size reduction. Cohere's binary embeddings offer substantial efficiency, enabling you to store and search vast datasets more cost-effectively. This capability can achieve significant memory reduction, allowing more vectors to fit within Azure AI Search units or enabling the use of lower SKUs, thus improving cost efficiency and supporting larger indexes.
"Cohere's binary embeddings available in Azure AI Search provide a powerful combination of memory efficiency and search quality, ideal for advanced AI applications." - Nils Reimers, Cohere's Director of Machine Learning.
With int8 and binary embeddings, customers can achieve up to a 32x reduction in vector size under optimal conditions, translating to improved cost efficiency and the ability to handle larger datasets. Read the full announcement from Cohere here: Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets
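To make the 32x concrete: a 1,024-dimension Cohere embed-english-v3.0 vector stored as float32 takes 4,096 bytes, while the same vector packed one bit per dimension takes 128 bytes. A quick back-of-the-envelope check:
Code:
# Back-of-the-envelope storage math for a 1,024-dimension embedding
dimensions = 1024
float32_bytes = dimensions * 4   # 4 bytes per float32 component
binary_bytes = dimensions // 8   # 1 bit per component, packed into bytes
print(float32_bytes, binary_bytes, float32_bytes // binary_bytes)  # 4096 128 32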
Cohere Command R+ Model for RAG
The Cohere Command R+ model is a state-of-the-art language model that can be used for Retrieval-Augmented Generation (RAG). This approach combines retrieval of relevant documents with the generation capabilities of the model, resulting in more accurate and contextually relevant responses.
Step-by-Step Guide
Here's how you can use Cohere Binary Embeddings and the Command R+ model via Azure AI Studio:
Install Required Libraries
First, install the necessary libraries: the Azure AI Search Python SDK and the Cohere Python SDK. The --pre flag installs the preview build of azure-search-documents, which is required for binary vector support.
Code:
pip install --pre azure-search-documents
pip install azure-identity cohere python-dotenv
Set Up Cohere and Azure AI Search Credentials
Set up your credentials for both Cohere and Azure AI Search. For this walkthrough, we'll use Cohere Deployed Models in Azure AI Studio. However, you can also use the Cohere API directly.
Code:
import os
import cohere
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorEncodingFormat,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
)
from azure.search.documents.models import VectorizedQuery
from dotenv import load_dotenv

load_dotenv()

# Azure AI Studio Cohere Configuration
AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")
AZURE_AI_STUDIO_COHERE_COMMAND_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_KEY")
AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT")

# Index Names
INT8_INDEX_NAME = "cohere-embed-v3-int8"
BINARY_INDEX_NAME = "cohere-embed-v3-binary"

# Azure Search Service Configuration
SEARCH_SERVICE_API_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
credential = AzureKeyCredential(SEARCH_SERVICE_API_KEY)

# Create a Cohere client using the embed key and endpoint from Azure AI Studio
cohere_azure_client = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1",
    api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
)
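As noted above, you can also call the Cohere platform API directly instead of an Azure AI Studio deployment. A minimal sketch, assuming a COHERE_API_KEY environment variable:
Code:
# Alternative: call the Cohere platform API directly (assumes COHERE_API_KEY is set)
co_direct = cohere.Client(api_key=os.getenv("COHERE_API_KEY"))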
Generate Embeddings using Azure AI Studio
Use the Cohere Embed API via Azure AI Studio to generate binary and int8 embeddings for your documents.
Code:
def generate_embeddings(texts, input_type="search_document", embedding_type="ubinary"):
    model = "embed-english-v3.0"
    texts = [texts] if isinstance(texts, str) else texts
    response = cohere_azure_client.embed(
        texts=texts,
        model=model,
        input_type=input_type,
        embedding_types=[embedding_type],
    )
    return [embedding for embedding in getattr(response.embeddings, embedding_type)]

# Example usage
documents = ["Alan Turing was a pioneering computer scientist.", "Marie Curie was a groundbreaking physicist and chemist."]
binary_embeddings = generate_embeddings(documents, embedding_type="ubinary")
int8_embeddings = generate_embeddings(documents, embedding_type="int8")
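Note the shape of what comes back: with embedding_type="ubinary", each embedding is returned as 128 unsigned bytes (1,024 bits packed 8 per byte), which is exactly what the packed-binary field in the next step expects; with "int8", each embedding is 1,024 signed byte values. A quick check, assuming the documents list above:
Code:
# ubinary: 1,024 bits packed into 128 unsigned bytes; int8: 1,024 signed bytes
print(len(binary_embeddings[0]))  # 128
print(len(int8_embeddings[0]))    # 1024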
Create an Azure AI Search Index
Create an Azure AI Search index to store the embeddings. Note that Azure AI Search only supports unsigned binary at this time.
Code:
def create_or_update_index(client, index_name, vector_field_type):
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="text", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="embedding",
            type=vector_field_type,
            vector_search_dimensions=1024,
            vector_search_profile_name="my-vector-config",
            hidden=False,
            stored=True,
            vector_encoding_format=(
                VectorEncodingFormat.PACKED_BIT if vector_field_type == "Collection(Edm.Byte)" else None
            ),
        ),
    ]
    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="my-vector-config", algorithm_configuration_name="my-hnsw")],
        algorithms=[
            HnswAlgorithmConfiguration(
                name="my-hnsw",
                kind=VectorSearchAlgorithmKind.HNSW,
                parameters=HnswParameters(
                    # Hamming distance is required for packed binary vectors; cosine is used for int8
                    metric=VectorSearchAlgorithmMetric.COSINE
                    if vector_field_type == "Collection(Edm.SByte)"
                    else VectorSearchAlgorithmMetric.HAMMING
                ),
            )
        ],
    )
    index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
    client.create_or_update_index(index=index)

# Example usage
search_index_client = SearchIndexClient(endpoint=SEARCH_SERVICE_ENDPOINT, credential=credential)
create_or_update_index(search_index_client, BINARY_INDEX_NAME, "Collection(Edm.Byte)")
create_or_update_index(search_index_client, INT8_INDEX_NAME, "Collection(Edm.SByte)")
Index Documents and Embeddings
Index the documents along with their embeddings into Azure AI Search.
Code:
def index_documents(search_client, documents, embeddings):
    documents_to_index = [
        {"id": str(idx), "text": doc, "embedding": emb}
        for idx, (doc, emb) in enumerate(zip(documents, embeddings))
    ]
    search_client.upload_documents(documents=documents_to_index)

# Example usage
search_client_binary = SearchClient(endpoint=SEARCH_SERVICE_ENDPOINT, index_name=BINARY_INDEX_NAME, credential=credential)
search_client_int8 = SearchClient(endpoint=SEARCH_SERVICE_ENDPOINT, index_name=INT8_INDEX_NAME, credential=credential)
index_documents(search_client_binary, documents, binary_embeddings)
index_documents(search_client_int8, documents, int8_embeddings)
Perform a Vector Search
Use the Azure AI Search client to perform a vector search using the generated embeddings.
Code:
def perform_vector_search(search_client, query, embedding_type="ubinary"):
    # Embed the query client-side, then search with the raw vector
    query_embedding = generate_embeddings(query, input_type="search_query", embedding_type=embedding_type)[0]
    vector_query = VectorizedQuery(vector=query_embedding, k_nearest_neighbors=3, fields="embedding")
    results = list(search_client.search(search_text=None, vector_queries=[vector_query]))
    for result in results:
        print(f"Text: {result['text']}")
        print(f"Score: {result['@search.score']}\n")
    return results

# Example usage
query = "pioneers in computer science"
results_binary = perform_vector_search(search_client_binary, query, embedding_type="ubinary")
results_int8 = perform_vector_search(search_client_int8, query, embedding_type="int8")
Output:
Int8 Results:
Text: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.6225287
Text: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.5917698
Text: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.5746157
Binary Results:
Text: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.002610966
Text: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.0024509805
Text: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.0023980816
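The two score scales differ because the int8 index ranks by cosine similarity while the binary index ranks by Hamming distance, and Azure AI Search converts distance-based metrics into scores of roughly 1/(1 + distance). Hamming distances over 1,024 bits are often in the hundreds, so binary scores sit close to zero; ranking is unaffected, since a smaller distance still scores highest. As a rough sanity check, assuming that conversion:
Code:
# Back out the Hamming distance implied by a score, assuming score = 1 / (1 + distance)
score = 0.002610966
implied_hamming_distance = (1 / score) - 1
print(round(implied_hamming_distance))  # ~382 of 1,024 bits differ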
Ground the Results to Cohere Command R+ for RAG
Use the Cohere Command R+ model to generate a response based on the retrieved documents.
Code:
# Create a Cohere client for Command R+
co_chat = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT}/v1",
    api_key=AZURE_AI_STUDIO_COHERE_COMMAND_KEY,
)

# Extract the documents from the binary search results
documents_binary = [{"text": result["text"]} for result in results_binary]

# Ground the response in the documents from the binary index
chat_response_binary = co_chat.chat(
    message=query, documents=documents_binary, max_tokens=100
)
print(chat_response_binary.text)
Output:
Binary Results:
There are many foundational figures who have made significant contributions to the field of computer science. Here are some of the most notable individuals:
1. Alan Turing: Often considered the "father of computer science," Alan Turing was a British mathematician and computer scientist who made groundbreaking contributions to computing, cryptography, and artificial intelligence. He is widely known for his work on the Turing machine, a theoretical device that served as a model for modern computers, and for his crucial role in breaking German Enigma codes during World War II.
2. Albert Einstein: Known for his theory of relativity and contributions to quantum mechanics, Albert Einstein was a German-born physicist whose work had a profound impact on the development of modern physics. His famous equation, E=mc^2, has become one of the most well-known scientific formulas in history.
3. Isaac Newton: An English mathematician, physicist, and astronomer, Isaac Newton is widely recognized for his laws of motion and universal gravitation. His work laid the foundation for classical mechanics and significantly advanced the study of optics and calculus.
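Because documents are passed to the chat call, the Cohere response can also include citation spans that tie the generated text back to the grounding passages. A minimal sketch of reading them (the citations attribute may be empty depending on the response):
Code:
# Inspect citations returned with the grounded response, if any
for citation in chat_response_binary.citations or []:
    print(citation.start, citation.end, citation.text, citation.document_ids)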
Full Notebook
Find the full notebook with all the code and examples here.
Getting Started
Azure AI Search Documentation:
- Learn more about setting up and using Azure AI Search.
- Dive into the specifics of Binary Vectors in Azure AI Search.
Cohere Documentation:
- Explore how to integrate Cohere models via Cohere’s API.
- Learn how to install and use the Cohere Python SDK and how to deploy the Cohere Embed Model-As-A-Service with Azure AI Studio.
Additional Resources:
- Learn more about indexing binary vector types.
- Explore the latest features of Azure AI Search.
- Start creating a search service using the Azure portal, the Azure CLI, the Management REST API, an ARM template, or a Bicep file.
By integrating Cohere Binary Embeddings and the Command R/R+ model into your Azure AI workflow, you can significantly enhance the performance and scalability of your AI applications, providing faster, more efficient, and contextually relevant results.