In the rapidly evolving landscape of artificial intelligence, generative AI (GenAI) has emerged as a game-changer for enterprises. However, building end-to-end GenAI applications that are robust, observable, and scalable can be challenging. This blog post will guide you through the process of creating enterprise-grade GenAI solutions using PromptFlow and LangChain, with a focus on observability, trackability, model monitoring, debugging, and autoscaling. The purpose of this blog is to show that even if you build with LangChain, the OpenAI SDK, or LlamaIndex, you can still use PromptFlow and AI Studio for enterprise-grade GenAI applications.
Understanding Enterprise GenAI Applications
Enterprise GenAI applications are AI-powered solutions that can generate human-like text, images, or other content based on input prompts. These applications need to be:
- Reliable
- Secure
- Scalable
Key considerations include:
- Data privacy
- Performance at scale
- Integration with existing enterprise systems
PromptFlow and LangChain: A Powerful Combination
PromptFlow
- A toolkit for building AI applications with large language models (LLMs)
- Offers features like prompt management and flow orchestration
LangChain
- A framework for developing applications powered by language models
- Provides tools for prompt optimization and chaining multiple AI operations
Together, these frameworks offer a robust foundation for enterprise GenAI applications:
- PromptFlow excels in managing complex prompt workflows
- LangChain provides powerful tools for interacting with LLMs and structuring applications
Building the Application: A Step-by-Step Approach
Define your application requirements and use cases: Defining your application requirements and use cases is a pivotal step in developing a successful Retrieval-Augmented Generation (RAG) system for document processing. Begin by identifying the core objectives of your application, such as the types of documents it will handle, the specific data it needs to extract, and the desired output format. Clearly outline the use cases, such as automated report generation, data extraction for business intelligence, or enhancing customer support through better information retrieval. Detail the functional requirements, including the ability to parse various document formats, the accuracy and speed of the retrieval process, and the integration capabilities with existing systems. Additionally, consider non-functional requirements like scalability, security, and user accessibility. By thoroughly defining these aspects, you create a roadmap that guides the development process, ensuring the final application meets user expectations and delivers tangible value.
Set up your development environment with PromptFlow and LangChain: Setting up your development environment with PromptFlow and LangChain is essential for building an efficient Retrieval-Augmented Generation (RAG) application. Start by ensuring you have a robust development setup, including a compatible operating system, necessary software dependencies, and a version control system like Git. Install PromptFlow, a powerful tool for designing, testing, and deploying prompt-based applications. This tool will streamline your workflow, allowing you to create, test, and optimize prompts with ease. Next, integrate LangChain, a versatile framework designed to facilitate the use of language models in your applications. LangChain provides modules for chaining together various components, such as prompts, retrieval mechanisms, and post-processing steps, enabling you to build complex RAG systems efficiently. Configure your environment to support these tools, ensuring you have the necessary libraries and frameworks installed, and set up a virtual environment to manage dependencies. By meticulously setting up your development environment with PromptFlow and LangChain, you lay a solid foundation for creating a robust, scalable, and efficient RAG application.
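For example, a typical setup creates a virtual environment and installs both frameworks. The package list below is a minimal starting point, not an exhaustive one; pin versions to match your project:
Code:
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install promptflow promptflow-tools
pip install langchain==0.2.6 langchain-openai langchain-community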
Start with a Prompt Flow project.
pf flow init --flow rag-langchain-pf --type chat
As soon as you run this, you will see a new rag-langchain-pf folder containing the scaffolded flow files, including flow.dag.yaml and requirements.txt.
Design your prompt flow using PromptFlow's visual interface: Designing your prompt flow using PromptFlow's visual interface is a crucial step in developing an intuitive and effective Retrieval-Augmented Generation (RAG) application. Begin by familiarizing yourself with PromptFlow's drag-and-drop interface, which allows you to visually map out the sequence of prompts and actions your application will execute. Start by defining the initial input prompts that will trigger the retrieval of relevant documents. Use the visual interface to connect these prompts to subsequent actions, such as querying your document database or calling external APIs for additional data.
Next, incorporate conditional logic to handle various user inputs and scenarios, ensuring that your prompt flow can adapt dynamically to different contexts. Leverage PromptFlow's built-in modules to integrate language model responses, enabling seamless transitions between retrieving information and generating human-like text. As you design the flow, make use of visual debugging tools to test each step, ensuring that the prompts and actions work together harmoniously. This iterative process allows you to refine and optimize the prompt flow, making it more efficient and responsive to user needs. By taking advantage of PromptFlow's visual interface, you can create a clear, logical, and efficient prompt flow that enhances the overall performance and user experience of your RAG application.
First, install the Visual Studio Code extension for Prompt Flow.
Once you have installed the Prompt Flow extension, you will be able to see the flow you just created with pf flow init and open it in the visual editor.
Next, create a custom connection that will store your Azure OpenAI/ACS keys and endpoints. Create a file called langchain_pf_connection.yaml and paste the details below into it.
Code:
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/CustomConnection.schema.json
name: langchain_pf_connection
type: custom
configs:
  test_key: test_value
secrets:  # required
  AZURE_OPENAI_ENDPOINT: https://XXXX.openai.azure.com/
  AZURE_OPENAI_GPT_DEPLOYMENT: gpt-4o
  AZURE_OPENAI_API_KEY: XXXX
  ACS_ENDPOINT: https://search-XXXX.search.windows.net
  ACS_KEY: XXXX
Now run the command below in the terminal to create the custom connection.
pf connection create -f langchain_pf_connection.yaml
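You can verify that the connection was registered with pf connection show --name langchain_pf_connection; secret values are scrubbed in the output.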
Once you have created the connection, the next step is to create the flow. Edit flow.dag.yaml and paste in the code below.
Code:
id: template_chat_flow
name: Template Chat Flow
inputs:
  chat_history:
    type: list
    is_chat_input: false
    is_chat_history: true
  question:
    type: string
    is_chat_input: true
outputs:
  answer:
    type: string
    reference: ${code.output}
    is_chat_output: true
nodes:
- name: code
  type: python
  source:
    type: code
    path: code.py
  inputs:
    chat_history: ${inputs.chat_history}
    input1: ${inputs.question}
    my_conn: langchain_pf_connection
  use_variants: false
node_variants: {}
environment:
  python_requirements_txt: requirements.txt
Implement LangChain components for enhanced LLM interactions:
Implementing LangChain components for enhanced LLM interactions is a key aspect of building a sophisticated Retrieval-Augmented Generation (RAG) application. LangChain offers a modular approach to integrating language models, enabling you to construct complex workflows that leverage the power of large language models (LLMs). Start by identifying the core components you need, such as input processing, retrieval mechanisms, and output generation.
Begin with the input processing component to handle and preprocess user queries. This might involve tokenization, normalization, and contextual understanding to ensure the query is suitable for retrieval. Next, implement the retrieval component, which connects to your document database or API endpoints to fetch relevant information. LangChain provides tools to streamline this process, such as vector stores for efficient similarity searches and retrievers that can interface with various data sources.
Once the relevant documents are retrieved, integrate the LLM component to generate responses. Use LangChain’s chaining capabilities to combine the retrieved information with prompts that guide the LLM in generating coherent and contextually appropriate outputs. You can also implement post-processing steps to refine the output, ensuring it meets the desired accuracy and relevance criteria.
Additionally, consider incorporating LangChain’s memory components to maintain context across interactions, enhancing the continuity and relevance of the responses. By carefully implementing these components, you can create a robust system that leverages the strengths of LLMs to deliver accurate, context-aware, and high-quality interactions within your RAG application.
Create a file called code.py. Here is its code.
Code:
#from dotenv import load_dotenv
#load_dotenv('azure.env')
import os

from langchain_core.messages import AIMessage, HumanMessage
from promptflow.connections import CustomConnection
from promptflow.core import tool


@tool
def my_python_tool(input1: str, chat_history: list, my_conn: CustomConnection) -> str:
    # Export the connection secrets as environment variables so chain.py can read them
    for key, value in dict(my_conn.secrets).items():
        os.environ[key] = value

    # Import here so the chain is only built after the environment variables are set
    from chain import rag_chain

    # Convert Prompt Flow chat history into LangChain message objects
    chat_history_revised = []
    for item in chat_history:
        chat_history_revised.append(HumanMessage(item['inputs']['question']))
        chat_history_revised.append(AIMessage(item['outputs']['answer']))

    return rag_chain.invoke({"input": input1, "chat_history": chat_history_revised})['answer']
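Before running the whole flow, you can sanity-check the tool in isolation. Here is a minimal smoke-test sketch; it assumes a CustomConnection constructed directly in code, with your real values in place of the placeholders, and the question is arbitrary:
Code:
# smoke_test.py - quick local check of the tool (placeholder values)
from promptflow.connections import CustomConnection

from code import my_python_tool  # code.py from above

conn = CustomConnection(
    configs={"test_key": "test_value"},
    secrets={
        "AZURE_OPENAI_ENDPOINT": "https://XXXX.openai.azure.com/",
        "AZURE_OPENAI_GPT_DEPLOYMENT": "gpt-4o",
        "AZURE_OPENAI_API_KEY": "XXXX",
        "ACS_ENDPOINT": "https://search-XXXX.search.windows.net",
        "ACS_KEY": "XXXX",
    },
)

# An empty chat history simulates the first turn of a conversation
print(my_python_tool("What is task decomposition?", chat_history=[], my_conn=conn))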
Develop the application logic and user interface
First, create a requirements.txt file. Create a separate virtual environment and run pip install -r requirements.txt.
Code:
langchain==0.2.6
langchain-community      # document loaders and the AzureSearch vector store
langchain-openai
azure-search-documents   # required by the AzureSearch vector store
promptflow
python-dotenv
The next step is to create the LangChain LLM chain using the LangChain Expression Language (LCEL). For this, create a file called chain.py.
Code:
import os

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Embedding model used to vectorize queries for Azure AI Search
embeddings = AzureOpenAIEmbeddings(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    openai_api_version="2024-03-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_deployment="text-embedding-ada-002",
)

# Chat model that generates the final answer
llm = AzureChatOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment="gpt-4o",
    streaming=False,
)

index_name: str = "llm-powered-auto-agent"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.environ["ACS_ENDPOINT"],
    azure_search_key=os.environ["ACS_KEY"],
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

# Retrieve the relevant snippets from the index
retriever = vector_store.as_retriever()

# Rewrite the latest question into a standalone query using the chat history
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

# Answer the reformulated question from the retrieved context
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\
{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
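Note that chain.py only reads from the index; it assumes the llm-powered-auto-agent index has already been populated. A one-time ingestion script along the lines below could fill it; the source URL, the bs4 filter, and the chunking parameters are illustrative (beautifulsoup4 must be installed):
Code:
# ingest.py - one-time indexing sketch; reuses the vector_store from chain.py,
# so the same environment variables must be set
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from chain import vector_store

# Load a blog post, keeping only the main article content
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
    ),
)
docs = loader.load()

# Split into overlapping chunks sized for the embedding model
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed the chunks and upload them to the Azure AI Search index
vector_store.add_documents(documents=splits)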
Test the flow: Once you are done with the steps above, the next step is to test the flow. You can do this in two ways. One way is from the VS Code Prompt Flow extension: open the flow in the visual editor and click the Test button.
Alternatively, you can use the command line.
pf flow test --flow ..\rag-langchain-pf --interactive
Implementing Observability and Trackability
Observability and trackability are crucial for maintaining and improving GenAI applications:
- Implement logging throughout your application, capturing:
- Inputs
- Outputs
- Intermediate steps
Azure Machine Learning provides a tracing capability for logging and managing your LLM application tests and evaluations, and for debugging and observing behavior by drilling down into the trace view.
Tracing is implemented in the prompt flow open-source package following the OpenTelemetry specification, so you can trace LLM calls and functions, as well as LLM frameworks such as LangChain and AutoGen, regardless of which framework you use. When you run Prompt Flow locally, it automatically starts the pf service and serves traces under
http://127.0.0.1:23333/v1.0/ui/traces/?#collection=rag-langchain-pf
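The same tracing works outside a flow run: call start_trace from your own script and decorate the functions you want to appear in the trace view. A minimal sketch, assuming chain.py from above (the answer function itself is hypothetical):
Code:
from promptflow.tracing import start_trace, trace

# Starts the local pf service if needed and groups traces under this collection
start_trace(collection="rag-langchain-pf")

@trace
def answer(question: str) -> str:
    # LLM and LangChain calls made in here show up as child spans
    from chain import rag_chain
    return rag_chain.invoke({"input": question, "chat_history": []})["answer"]

print(answer("What are LLM-powered autonomous agents?"))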
- Use distributed tracing to track requests across different components of your system
- Set up metrics collection for key performance indicators (KPIs), as sketched below
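Because the tracing stack follows OpenTelemetry, one lightweight option for KPI metrics is the OpenTelemetry metrics SDK. This is a minimal sketch; the metric names and attributes are illustrative, and in production you would swap the console exporter for an OTLP or Azure Monitor exporter:
Code:
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# Export to the console here; use an OTLP/Azure Monitor exporter in production
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("rag-langchain-pf")
request_counter = meter.create_counter("genai.requests", description="Number of chat requests")
latency_hist = meter.create_histogram("genai.latency_ms", unit="ms", description="End-to-end latency")

# Record one request and its latency, tagged by deployment
request_counter.add(1, {"deployment": "blue"})
latency_hist.record(152.0, {"deployment": "blue"})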
Deployment as Online Managed Endpoint:
A flow can be deployed to multiple platforms, such as a local development service, Docker container, Kubernetes cluster, etc.
Deploy into Azure App Service: If you want to deploy into Azure App Service, the steps to perform are explained in the official documentation (see reference 3 below).
If you want to deploy into Azure Machine Learning as an online managed endpoint, here are the steps. You need to create the files below.
First, you need to register the flow as a model in the AML model registry. For that, let's create a model.yaml.
Code:
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: langchain-pf-model
path: .
description: register langchain pf folder as a custom model
properties:
  is-promptflow: true
  azureml.promptflow.mode: chat
  azureml.promptflow.chat_history: chat_history
  azureml.promptflow.chat_output: answer
  azureml.promptflow.chat_input: question
  azureml.promptflow.dag_file: flow.dag.yaml
  azureml.promptflow.source_flow_id: langchain-pf
Next, we register the model using the file above. Before that, make sure you are logged into Azure and have set the default workspace.
Code:
az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>
Use
az ml model create --file model.yaml
to register the model to your workspace. Next, we create the endpoint with the endpoint.yaml file shown below. Use
az ml online-endpoint create --file endpoint.yaml
to create the endpoint.
Code:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: langchain-pf-endpoint
description: basic chat endpoint deployed using CLI
auth_mode: key
properties:
  # this property only works for system-assigned identity.
  # if the deploy user has access to connection secrets,
  # the endpoint system-assigned identity will be auto-assigned the connection secrets reader role as well
  enforce_access_to_default_secret_stores: enabled
Once our model is registered and the endpoint exists, we can go ahead and create the online deployment. First, let's create the blue-deployment.yml file.
Code:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: langchain-pf-endpoint
model: azureml:langchain-pf-model:1
# You can also specify model files path inline
# path: examples/flows/chat/basic-chat
environment:
  build:
    path: image_build_with_requirements
    dockerfile_path: Dockerfile
  inference_config:
    liveness_route:
      path: /health
      port: 8080
    readiness_route:
      path: /health
      port: 8080
    scoring_route:
      path: /score
      port: 8080
instance_type: Standard_E16s_v3
instance_count: 1
request_settings:
  request_timeout_ms: 300000
environment_variables:
  PROMPTFLOW_CONNECTION_PROVIDER: azureml://subscriptions/<subscription_id>/resourceGroups/<resource-group-name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>
  APPLICATIONINSIGHTS_CONNECTION_STRING: <connection_string>
Use
az ml online-deployment create --file blue-deployment.yml --all-traffic
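Once the deployment finishes, you can smoke-test the endpoint from the CLI. The request file name is arbitrary; its JSON body carries the flow inputs:
az ml online-endpoint invoke --name langchain-pf-endpoint --request-file sample-request.json
where sample-request.json looks like:
Code:
{
  "question": "How do agents plan tasks?",
  "chat_history": []
}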
Model Monitoring and Debugging Strategies
Effective monitoring and debugging are essential for maintaining the quality of your GenAI application:
- Implement model performance monitoring to track:
- Accuracy
- Latency
- Other relevant metrics
You will be able to track these metrics with the AML endpoint.
- Set up alerts for anomalies or performance degradation
- Use PromptFlow's built-in debugging tools to inspect and troubleshoot prompt executions. You can inspect individual prompts to check quality and debug issues.
- Implement A/B testing capabilities to compare:
- Different prompt strategies
- Model versions
You can run two deployments (blue and green) on the same endpoint and split traffic between them to run A/B tests with the same approach.
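Assuming a second deployment named green has been created on the endpoint, a 50/50 split would look like:
az ml online-endpoint update --name langchain-pf-endpoint --traffic "blue=50 green=50"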
Ensuring Scalability in Enterprise Environments
To meet the demands of enterprise users, your GenAI application must be scalable:
- Design your application with a microservices architecture for better scalability
- Implement autoscaling using container orchestration platforms like Kubernetes; for managed online endpoints, Azure Monitor autoscale can do the same (see the sketch after this list)
- Optimize database and caching strategies for high-volume data processing
- Consider using serverless technologies for cost-effective scaling of certain components
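For the managed online endpoint deployed above, a CPU-based autoscale rule could be attached with Azure Monitor autoscale. This is a hedged sketch; the setting name is arbitrary, and the resource ID placeholder must point at the blue deployment:
Code:
az monitor autoscale create \
  --name autoscale-langchain-pf \
  --resource "/subscriptions/<subscription_id>/resourceGroups/<resource-group-name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>/onlineEndpoints/langchain-pf-endpoint/deployments/blue" \
  --min-count 1 --max-count 4 --count 1

az monitor autoscale rule create \
  --autoscale-name autoscale-langchain-pf \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 1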
Conclusion
Building end-to-end enterprise GenAI applications with PromptFlow and LangChain offers a powerful approach to creating robust, observable, and scalable AI solutions. By focusing on observability, trackability, model monitoring, debugging, and autoscaling, you can create applications that meet the demanding requirements of enterprise environments.
As you embark on your GenAI development journey, remember that the field is rapidly evolving. Stay updated with the latest developments in PromptFlow, LangChain, and the broader AI landscape to ensure your applications remain at the cutting edge of technology.
References:
1. promptflow/examples at main · microsoft/promptflow
2. Deploy a flow in prompt flow to online endpoint for real-time inference with CLI - Azure Machine Learning
3. Deploy to Azure App Service — Prompt flow documentation