Building high scale RAG applications with Microsoft Fabric Eventhouse

  • Thread starter: Denise_Schlesinger

Introduction


This guide walks you through building a Generative AI application in Microsoft Fabric: a RAG (Retrieval Augmented Generation) system that uses Azure OpenAI for embeddings and text generation, and Microsoft Fabric Eventhouse as the vector store.



Why MS Fabric Eventhouse?


Fabric Eventhouse is built on the Kusto engine, which delivers fast similarity search at high scale.

If you are building a RAG application with a large number of embedding vectors, MS Fabric gives you both the processing power to build the vector database and the high-performance engine behind Fabric Eventhouse to query it.



If you want to know more about using Fabric Eventhouse as a vector store, here are some links:

Azure Data Explorer for Vector Similarity Search

Optimizing Vector Similarity Search on Azure Data Explorer – Performance Update

Optimizing Vector Similarity Searches at Scale



What is RAG - Retrieval Augmented Generation?


Large Language Models (LLMs) excel in creating text that resembles human writing.
Initially, LLMs are equipped with a broad spectrum of knowledge from extensive datasets used for their training. This grants them flexibility but may not provide the specialized focus or knowledge necessary in certain topics.

Retrieval Augmented Generation (RAG) is a technique that improves the pertinence and precision of LLMs by incorporating real-time, relevant information into their responses. With RAG, an LLM is boosted by a search system that sifts through unstructured text to find information, which then refines the LLM's replies.



What is a Vector Database?


The Vector Database is a vital component in the retrieval process in RAG, facilitating the quick and effective identification of relevant text sections in response to a query, based on how closely they match the search terms.

Vector DBs are data stores optimized for storing and processing vector data. Vector data can refer to data types such as geometric shapes, spatial data, or more abstract high-dimensional data used in machine learning applications, such as embeddings.

These databases are designed to efficiently handle operations such as similarity search, nearest neighbour search, and other operations that are common when dealing with high-dimensional vector spaces.

For example, in machine learning, it's common to convert text, images, or other complex data into high-dimensional vectors using models like word embeddings, image embeddings, etc. To efficiently search and compare these vectors, a vector database or vector store with specialized indexing and search algorithms would be used.
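To make the idea of similarity search concrete, here is a minimal sketch in plain Python. It is not how Eventhouse implements the search; the function names and the three-dimensional toy vectors are made up for illustration (real Ada embeddings have 1536 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, store, k=1):
    """Return the k (similarity, text) pairs closest to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical toy "embeddings" standing in for real model output.
store = [
    ("whales", [0.9, 0.1, 0.0]),
    ("ships", [0.1, 0.9, 0.0]),
    ("coffins", [0.0, 0.2, 0.9]),
]
best = top_k([0.8, 0.2, 0.1], store, k=2)
print([text for _, text in best])
```

A brute-force scan like this compares the query against every stored vector; a vector store does the same ranking, but with an engine built to do it over very large tables.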

In our case we will use the Azure OpenAI Ada embeddings model (text-embedding-ada-002) to create embeddings, which are vector representations of the text we index and store in Microsoft Fabric Eventhouse.



The code


The code can be found here.

We will use the Moby Dick book from the Gutenberg project in PDF format as our knowledge base.

We will read the PDF file, cut the text into chunks of 1000 characters and calculate the embeddings for each chunk. Then we will store the text and the embeddings in our vector database (Fabric Eventhouse).

We will then ask questions, retrieve the most similar chunks from our vector database, and send the question together with those chunks to Azure OpenAI GPT-4 to get a response in natural language.



Processing the files and indexing the embeddings




We do this only once: create the embeddings and save them in our vector database, Fabric Eventhouse.



  • Read files from Fabric Lakehouse
  • Create embeddings from the text using the Azure OpenAI Ada embeddings model
  • Save the text and embeddings in our Fabric Eventhouse DB

RAG - Getting answers




Every time we want to search for answers from our knowledge base, we will:

  • Create the embedding for the question and search our Fabric Eventhouse for the most similar text chunks, using similarity search
  • Combine the question with the chunks retrieved from our vector database and call the Azure OpenAI GPT-4 model to get a natural-language answer
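The two steps above can be sketched as a single function. This is only an outline of the flow, not the notebook's code: `answer_question` and the three stub functions are hypothetical names, and the stubs stand in for the Azure OpenAI and Eventhouse calls shown later in this guide.

```python
def answer_question(question, embed, search, generate, nr_of_answers=2):
    """Retrieve-then-generate: embed the question, fetch similar chunks, ask the LLM."""
    query_vector = embed(question)                   # step 1: question -> embedding
    passages = search(query_vector, nr_of_answers)   # step 1: similarity search
    prompt = f"Question: {question}\nInformation: {' '.join(passages)}"
    return generate(prompt)                          # step 2: natural-language answer

# Stubs so the sketch runs without any service; replace them with real calls.
def fake_embed(text):
    return [float(len(text))]

def fake_search(vector, k):
    return ["passage one", "passage two"][:k]

def fake_generate(prompt):
    return prompt.upper()

print(answer_question("q?", fake_embed, fake_search, fake_generate, nr_of_answers=1))
```

Passing the three steps in as functions keeps the flow readable and makes it easy to swap the embedding model, the vector store or the LLM independently.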





Prerequisites


To follow this guide, you will need to ensure that you have access to the following services and have the necessary credentials and keys set up.

  • Microsoft Fabric.
  • Azure OpenAI Studio to manage and deploy OpenAI models.



Setup


Create a Fabric Workspace








Create a Lakehouse




Upload the Moby Dick PDF file




Create an Eventhouse DB called “GenAI_eventhouse”




Click on the DB name and then “Explore your data” on the top-right side




Create the “bookEmbeddings” table

Paste the following command and run it







.create table bookEmbeddings (document_name:string, content:string, embedding:dynamic)








Import our notebook




Grab your Azure OpenAI endpoint and secret key and paste them into the notebook. Replace the model deployment names if yours are different.




Get the Eventhouse URI and paste it as “KUSTO_URI” in the notebook




Connect the notebook to the Lakehouse




Let’s run our notebook




The first cell installs all the Python libraries we need:





%pip install openai==1.12.0 azure-kusto-data langchain tenacity langchain-openai pypdf





Run cell 2 after filling in the configuration variables:





Code:
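OPENAI_GPT4_DEPLOYMENT_NAME = "gpt-4"
OPENAI_DEPLOYMENT_ENDPOINT = "<your-azure openai endpoint>"
OPENAI_API_KEY = "<your-azure openai api key>"
OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = "text-embedding-ada-002"
KUSTO_URI = "<your-eventhouse cluster-uri>"
# The later cells also use the three names below. The values are assumptions based on
# the setup steps above; adjust them to your environment.
KUSTO_DATABASE = "GenAI_eventhouse"
KUSTO_TABLE = "bookEmbeddings"
accessToken = mssparkutils.credentials.getToken(KUSTO_URI)  # Fabric notebook utility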






Run cell 3

Here we create an Azure OpenAI client and define a function to calculate embeddings





Code:
from openai import AzureOpenAI
from tenacity import retry, stop_after_attempt, wait_random_exponential

client = AzureOpenAI(
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_key=OPENAI_API_KEY,
    api_version="2023-09-01-preview",
)

# tenacity adds randomized exponential waits and retries around the embeddings call,
# so we back off instead of failing when we hit throttling limits
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def generate_embeddings(text):
    # replace newlines, which can negatively affect performance
    txt = text.replace("\n", " ")
    response = client.embeddings.create(input=[txt], model=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME)
    return response.data[0].embedding
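
To show what the retry decorator is doing, here is a rough standard-library sketch of retrying with a random, exponentially growing wait. It only approximates the behaviour and is not tenacity's implementation; `retry_with_backoff` and `flaky` are hypothetical names, and the `sleep` parameter exists only so the example runs without real delays.

```python
import random
import time

def retry_with_backoff(func, max_attempts=6, min_wait=1.0, max_wait=20.0, sleep=time.sleep):
    """Call func(); on an exception, wait a random, growing time and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The upper bound doubles with each attempt, capped at max_wait.
            upper = min(max_wait, min_wait * 2 ** attempt)
            sleep(random.uniform(min_wait, upper))

# A stand-in for a throttled API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("throttled")
    return "ok"

waits = []  # record the waits instead of sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))
```

The randomness matters when many chunks are embedded in a row: it spreads the retries out so that they do not all hit the service at the same moment.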





Run cell 4

Read the file and divide it into chunks of 1000 characters





Code:
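# Import paths depend on the installed LangChain version; newer releases expose
# PyPDFLoader from langchain_community.document_loaders instead.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

# split into chunks of up to 1000 characters with a 30-character overlap
# default separators, tried in order: ["\n\n", "\n", " ", ""]
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=30,
)

documentName = "moby dick book"
# path obtained with "Copy File API path" in the Lakehouse
fileName = "/lakehouse/default/Files/moby dick.pdf"
loader = PyPDFLoader(fileName)
# load_and_split returns one Document per chunk, not per PDF page
pages = loader.load_and_split(text_splitter=splitter)
print("Number of chunks: ", len(pages))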
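
To illustrate what chunk_size and chunk_overlap mean, here is a simplified fixed-width chunker. It is not what RecursiveCharacterTextSplitter does internally (the splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than the limit); `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=30):
    """Split text into fixed-width chunks; each chunk starts with the
    last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the final chunk is not just a repeat of the overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears, at least in part, in both neighbouring chunks, which gives the similarity search a better chance of finding it.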





Run cell 5

Save the text chunks to a pandas dataframe





Code:
# save all the chunks into a pandas dataframe; the embedding column is filled in later
import pandas as pd
df = pd.DataFrame(
    [{"document_name": documentName, "content": page.page_content, "embedding": ""} for page in pages],
    columns=["document_name", "content", "embedding"],
)
df.head()





Run cell 6

Calculate embeddings





Code:
# calculate the embeddings using Azure OpenAI Ada (one API call per chunk)
df["embedding"] = df["content"].apply(generate_embeddings)
print(df.head(2))





Run cell 7

Write the data to MS Fabric Eventhouse





Code:
df_sp = spark.createDataFrame(df)
df_sp.write.\
format("com.microsoft.kusto.spark.synapse.datasource").\
option("kustoCluster",KUSTO_URI).\
option("kustoDatabase",KUSTO_DATABASE).\
option("kustoTable", KUSTO_TABLE).\
option("accessToken", accessToken ).\
mode("Append").save()





Let’s check the data was saved to our Vector Database

Go to the Eventhouse and run this query





Code:
bookEmbeddings
| take 10






Go back to the notebook and run the rest of the cells

Creates a function that calls GPT-4 to get a natural-language answer





Code:
def call_openAI(text):
    response = client.chat.completions.create(
        model=OPENAI_GPT4_DEPLOYMENT_NAME,
        messages = text,
        temperature=0
    )
    return response.choices[0].message.content





Creates a function to retrieve answers using embeddings with similarity search





Code:
def get_answer_from_eventhouse(question, nr_of_answers=1):
    # embed the question, then rank the stored chunks by cosine similarity to it
    searchedEmbedding = generate_embeddings(question)
    kusto_query = (
        KUSTO_TABLE
        + " | extend similarity = series_cosine_similarity(dynamic(" + str(searchedEmbedding) + "), embedding)"
        + " | top " + str(nr_of_answers) + " by similarity desc"
    )
    kustoDf = spark.read\
        .format("com.microsoft.kusto.spark.synapse.datasource")\
        .option("kustoCluster", KUSTO_URI)\
        .option("kustoDatabase", KUSTO_DATABASE)\
        .option("accessToken", accessToken)\
        .option("kustoQuery", kusto_query).load()
    return kustoDf
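
The query string can also be built in a small helper of its own, which makes it easy to inspect before sending it to Eventhouse. This is a sketch rather than the notebook's code: `build_similarity_query` is a hypothetical name, and using json.dumps for the vector literal is one possible choice that keeps the output a plain list of numbers even if the embedding arrives as NumPy floats.

```python
import json

def build_similarity_query(table, embedding, top_n=1):
    """Build the KQL query that ranks rows by cosine similarity to the embedding."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Serialize the vector as a JSON array for the dynamic() literal.
    vector_literal = json.dumps([float(x) for x in embedding])
    return (
        f"{table}"
        f" | extend similarity = series_cosine_similarity(dynamic({vector_literal}), embedding)"
        f" | top {int(top_n)} by similarity desc"
    )

query = build_similarity_query("bookEmbeddings", [0.1, 0.2], top_n=2)
print(query)
```

Printing the query for a short dummy vector is a quick way to check its shape, and the resulting text can be pasted into the Eventhouse query editor when troubleshooting.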





Retrieves 2 answers from Eventhouse





Code:
nr_of_answers = 2
question = "Why does the coffin prepared for Queequeg become Ishmael's life buoy once the Pequod sinks?"
answers_df = get_answer_from_eventhouse(question, nr_of_answers)





Concatenates the answers





Code:
answer = ""
for row in answers_df.rdd.toLocalIterator():
    answer = answer + " " + row['content']





Creates a prompt for GPT-4 with the question and the 2 answers, then calls the model





Code:
prompt = 'Question: {}'.format(question) + '\n' + 'Information: {}'.format(answer)
# prepare prompt
messages = [{"role": "system", "content": "You are a HELPFUL assistant answering users questions. Answer the question using the provided information and do not add anything else."},
            {"role": "user", "content": prompt}]
result = call_openAI(messages)
display(result)







That’s it: you have built your very first RAG app using MS Fabric.

All the code can be found here.






Thanks

Denise
