In the rapidly evolving landscape of artificial intelligence, generative AI (GenAI) has emerged as a game-changer for enterprises. However, building end-to-end GenAI applications that are robust, observable, and scalable can be challenging. This blog post will guide you through the process of creating enterprise-grade GenAI solutions using PromptFlow and LangChain, with a focus on observability, trackability, model monitoring, debugging, and autoscaling. The purpose of this blog is to show that even if you use LangChain, the OpenAI SDK, or LlamaIndex, you can still use PromptFlow and AI Studio to build enterprise-grade GenAI applications.

[HEADING=1]Understanding Enterprise GenAI Applications[/HEADING]

Enterprise GenAI applications are AI-powered solutions that can generate human-like text, images, or other content based on input prompts. These applications need to be:

- Reliable
- Secure
- Scalable

Key considerations include:

- Data privacy
- Performance at scale
- Integration with existing enterprise systems

[HEADING=1]PromptFlow and LangChain: A Powerful Combination[/HEADING]

[HEADING=2]PromptFlow[/HEADING]

- A toolkit for building AI applications with large language models (LLMs)
- Offers features like prompt management and flow orchestration

[HEADING=2]LangChain[/HEADING]

- A framework for developing applications powered by language models
- Provides tools for prompt optimization and for chaining multiple AI operations

Together, these frameworks offer a robust foundation for enterprise GenAI applications:

- PromptFlow excels at managing complex prompt workflows
- LangChain provides powerful tools for interacting with LLMs and structuring applications

[HEADING=1]Building the Application: A Step-by-Step Approach[/HEADING]

Define your application requirements and use cases: Defining your application requirements and use cases is a pivotal step in developing a successful Retrieval-Augmented Generation (RAG) system for document processing. Begin by identifying the core objectives of your application, such as the types of documents it will handle, the specific data it needs to extract, and the desired output format. Clearly outline the use cases, such as automated report generation, data extraction for business intelligence, or enhancing customer support through better information retrieval. Detail the functional requirements, including the ability to parse various document formats, the accuracy and speed of the retrieval process, and the integration capabilities with existing systems. Additionally, consider non-functional requirements like scalability, security, and user accessibility. By thoroughly defining these aspects, you create a roadmap that guides the development process, ensuring the final application meets user expectations and delivers tangible value.

Set up your development environment with PromptFlow and LangChain: Setting up your development environment with PromptFlow and LangChain is essential for building an efficient RAG application. Start by ensuring you have a robust development setup, including a compatible operating system, the necessary software dependencies, and a version control system like Git. Install PromptFlow, a powerful tool for designing, testing, and deploying prompt-based applications; it will streamline your workflow, allowing you to create, test, and optimize prompts with ease. Next, integrate LangChain, a versatile framework designed to facilitate the use of language models in your applications. LangChain provides modules for chaining together various components, such as prompts, retrieval mechanisms, and post-processing steps, enabling you to build complex RAG systems efficiently. Configure your environment to support these tools, ensure the necessary libraries and frameworks are installed, and set up a virtual environment to manage dependencies. By setting up your development environment carefully, you lay a solid foundation for a robust, scalable, and efficient RAG application.
Start with a Prompt Flow project:

[iCODE]pf flow init --flow rag-langchain-pf --type chat[/iCODE]

As soon as you run this, you will see a new folder containing the flow scaffolding, including the flow.dag.yaml flow definition and a requirements.txt.

Design your prompt flow using PromptFlow's visual interface: Designing your prompt flow in PromptFlow's visual interface is a crucial step in developing an intuitive and effective RAG application. Begin by familiarizing yourself with the drag-and-drop interface, which lets you visually map out the sequence of prompts and actions your application will execute. Start by defining the initial input prompts that will trigger the retrieval of relevant documents, then use the visual interface to connect these prompts to subsequent actions, such as querying your document database or calling external APIs for additional data. Next, incorporate conditional logic to handle different user inputs and scenarios, so that your prompt flow can adapt dynamically to different contexts. Leverage PromptFlow's built-in modules to integrate language model responses, enabling seamless transitions between retrieving information and generating human-like text. As you design the flow, use the visual debugging tools to test each step, ensuring that the prompts and actions work together harmoniously. This iterative process lets you refine and optimize the prompt flow, making it more efficient and responsive to user needs.

First, install the Visual Studio Code extension for Prompt Flow. Once the extension is installed, you can open the flow you just created with pf flow init in the visual editor.

Next, create a custom connection that will store the Azure OpenAI and Azure AI Search (ACS) keys and endpoints. Create a file called langchain_pf_connection.yaml and paste the details below:

[CODE]
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/CustomConnection.schema.json
name: langchain_pf_connection
type: custom
configs:
  test_key: test_value
secrets:  # required
  AZURE_OPENAI_ENDPOINT: https://XXXX.openai.azure.com/
  AZURE_OPENAI_GPT_DEPLOYMENT: gpt-4o
  AZURE_OPENAI_API_KEY: XXXX
  ACS_ENDPOINT: https://search-XXXX.search.windows.net
  ACS_KEY: XXXX
[/CODE]

Now run the command below in the terminal to create the custom connection:

[iCODE]pf connection create -f langchain_pf_connection.yaml[/iCODE]
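To double-check that the connection was registered, you can also query it from Python. Below is a minimal sketch using PromptFlow's local client; the import path assumes a recent version of the promptflow package, and secret values are scrubbed rather than returned in plain text.

[CODE]
# Sketch: verify the custom connection with PromptFlow's local Python client.
# Assumes a recent `promptflow` package; adjust the import for older versions.
from promptflow.client import PFClient

pf = PFClient()

# Fetch the connection created above; secret values are scrubbed in the output.
conn = pf.connections.get(name="langchain_pf_connection")
print(conn.name, conn.type)

# List all locally registered connections to confirm it is there.
for c in pf.connections.list():
    print(c.name)
[/CODE]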
Once the connection is created, the next step is to define the flow. Edit flow.dag.yaml and paste the code below:

[CODE]
id: template_chat_flow
name: Template Chat Flow
inputs:
  chat_history:
    type: list
    is_chat_input: false
    is_chat_history: true
  question:
    type: string
    is_chat_input: true
outputs:
  answer:
    type: string
    reference: ${code.output}
    is_chat_output: true
nodes:
- name: code
  type: python
  source:
    type: code
    path: code.py
  inputs:
    chat_history: ${inputs.chat_history}
    input1: ${inputs.question}
    my_conn: langchain_pf_connection
  use_variants: false
node_variants: {}
environment:
  python_requirements_txt: requirements.txt
[/CODE]

Implement LangChain components for enhanced LLM interactions: Implementing LangChain components is a key aspect of building a sophisticated RAG application. LangChain offers a modular approach to integrating language models, enabling you to construct complex workflows that leverage the power of LLMs. Start by identifying the core components you need, such as input processing, retrieval mechanisms, and output generation. Begin with the input processing component to handle and preprocess user queries; this might involve tokenization, normalization, and contextual understanding to ensure the query is suitable for retrieval. Next, implement the retrieval component, which connects to your document database or API endpoints to fetch relevant information. LangChain provides tools to streamline this process, such as vector stores for efficient similarity searches and retrievers that can interface with various data sources.

Once the relevant documents are retrieved, integrate the LLM component to generate responses. Use LangChain's chaining capabilities to combine the retrieved information with prompts that guide the LLM in generating coherent and contextually appropriate outputs. You can also implement post-processing steps to refine the output, ensuring it meets the desired accuracy and relevance criteria. Additionally, consider incorporating LangChain's memory components to maintain context across interactions, enhancing the continuity and relevance of the responses. By carefully implementing these components, you can create a robust system that delivers accurate, context-aware, and high-quality interactions within your RAG application.

Create a file called code.py with the following code:

[CODE]
# from dotenv import load_dotenv
# load_dotenv('azure.env')
import os

from langchain_core.messages import AIMessage, HumanMessage
from promptflow.connections import CustomConnection
from promptflow.core import tool


@tool
def my_python_tool(input1: str, chat_history: list, my_conn: CustomConnection) -> str:
    # Export the connection's secrets as environment variables so that
    # chain.py can read the Azure OpenAI and Azure AI Search settings.
    connection_dict = dict(my_conn.secrets)
    for key, value in connection_dict.items():
        os.environ[key] = value

    # Import here so chain.py is initialized after the variables are set.
    from chain import rag_chain

    # Convert prompt flow chat history into LangChain message objects.
    chat_history_revised = []
    for item in chat_history:
        chat_history_revised.append(HumanMessage(item["inputs"]["question"]))
        chat_history_revised.append(AIMessage(item["outputs"]["answer"]))

    return rag_chain.invoke(
        {"input": input1, "chat_history": chat_history_revised}
    )["answer"]  # type: ignore
[/CODE]

Develop the application logic and user interface: First, create a requirements.txt file. Create a separate virtual environment and run pip install -r requirements.txt:

[CODE]
langchain==0.2.6
langchain_openai
python-dotenv
[/CODE]
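It helps to know the exact shape of the chat history PromptFlow passes to this tool: a list of dictionaries, each carrying the "inputs" and "outputs" of one earlier turn, which the loop in code.py converts into HumanMessage/AIMessage pairs. The sketch below calls the tool directly with that shape; the values are hypothetical, and it assumes chain.py (created in the next step) exists and the secrets are valid.

[CODE]
# Sketch: invoke the tool directly with the chat-history shape PromptFlow uses.
# Hypothetical values; requires chain.py (next step) and working credentials.
from promptflow.connections import CustomConnection

from code import my_python_tool

conn = CustomConnection(
    configs={"test_key": "test_value"},
    secrets={
        "AZURE_OPENAI_ENDPOINT": "https://XXXX.openai.azure.com/",
        "AZURE_OPENAI_GPT_DEPLOYMENT": "gpt-4o",
        "AZURE_OPENAI_API_KEY": "XXXX",
        "ACS_ENDPOINT": "https://search-XXXX.search.windows.net",
        "ACS_KEY": "XXXX",
    },
)

# Each item mirrors one earlier turn of the flow: its inputs and its outputs.
history = [
    {
        "inputs": {"question": "What is an LLM-powered agent?"},
        "outputs": {"answer": "An agent that uses an LLM for planning and tool use."},
    }
]

print(my_python_tool("How does it decompose tasks?", history, conn))
[/CODE]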
The next step is to create the LangChain LLM chain using the new LangChain Expression Language (LCEL). For this, create a file called chain.py:

[CODE]
import os

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Embedding model used both for indexing and for query-time similarity search.
embeddings = AzureOpenAIEmbeddings(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    openai_api_version="2024-03-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_deployment="text-embedding-ada-002",
)

# Chat model that answers the (contextualized) question.
llm = AzureChatOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment="gpt-4o",
    streaming=False,
)

index_name: str = "llm-powered-auto-agent"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.environ["ACS_ENDPOINT"],
    azure_search_key=os.environ["ACS_KEY"],
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

# Retrieve the relevant snippets from the index.
retriever = vector_store.as_retriever()


def format_docs(docs):
    """Optional helper: join retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Rewrite a follow-up question into a standalone question using the chat history.
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

# Answer the standalone question from the retrieved context, concisely.
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

# Full RAG chain: contextualize the question, retrieve, then answer.
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
[/CODE]
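Note that chain.py assumes the llm-powered-auto-agent index already contains documents. If you still need to populate it, below is a hypothetical one-time ingestion sketch in the spirit of the standard LangChain RAG tutorial; the source URL, CSS classes, and chunking parameters are illustrative assumptions, not part of the original flow.

[CODE]
# Sketch: one-time ingestion into the Azure AI Search index used by chain.py.
# The web page and chunk sizes below are illustrative assumptions.
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from chain import vector_store

# Load one blog post, keeping only the main content containers.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))},
)
docs = loader.load()

# Split into overlapping chunks sized for embedding and retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# Embed each chunk and upload it to the index.
vector_store.add_documents(documents=splits)
[/CODE]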
Test the flow: Once you are done with the above steps, the next step is to test the flow. You can do this in two ways. One way is from the VS Code PromptFlow extension: open the flow and click the Test button. Alternatively, you can use the command line:

[iCODE]pf flow test --flow ..\rag-langchain-pf --interactive[/iCODE]

[HEADING=1]Implementing Observability and Trackability[/HEADING]

Observability and trackability are crucial for maintaining and improving GenAI applications:

1. Implement logging throughout your application, capturing:
- Inputs
- Outputs
- Intermediate steps

Azure Machine Learning provides a tracing capability for logging and managing your LLM application tests and evaluations, and for debugging and observing by drilling down into the trace view. Tracing is implemented in the prompt flow open-source package and follows the OpenTelemetry specification, so you can trace any LLM call or function, as well as LLM frameworks such as LangChain and AutoGen, regardless of which framework you use. When you run PromptFlow locally, it automatically starts the pf service, and traces are available at http://127.0.0.1:23333/v1.0/ui/traces/?#collection=rag-langchain-pf

2. Use distributed tracing to track requests across the different components of your system.

3. Set up metrics collection for key performance indicators (KPIs).

[HEADING=1]Deployment as Online Managed Endpoint[/HEADING]

A flow can be deployed to multiple platforms, such as a local development service, a Docker container, or a Kubernetes cluster.

Deploy to Azure App Service: if you want to deploy to Azure App Service, the steps to perform are explained in the official documentation (see reference 3 below).

If you want to deploy to Azure Machine Learning as an online managed endpoint, here are the steps. First, register the flow as a model in the AML model registry. For that, create a model.yaml:

[CODE]
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: langchain-pf-model
path: .
description: register langchain pf folder as a custom model
properties:
  is-promptflow: true
  azureml.promptflow.mode: chat
  azureml.promptflow.chat_history: chat_history
  azureml.promptflow.chat_output: answer
  azureml.promptflow.chat_input: question
  azureml.promptflow.dag_file: flow.dag.yaml
  azureml.promptflow.source_flow_id: langchain-pf
[/CODE]

Next, we register the model using the file above. Before that, make sure you are logged in to Azure and have set the default workspace:

[CODE]
az account set --subscription <subscription ID>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>
[/CODE]

Use [iCODE]az ml model create --file model.yaml[/iCODE] to register the model to your workspace.

Next, we create the endpoint with an endpoint.yaml file:

[CODE]
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: langchain-pf-endpoint
description: basic chat endpoint deployed using CLI
auth_mode: key
properties:
  # this property only works for system-assigned identity.
  # if the deploy user has access to connection secrets,
  # the endpoint system-assigned identity will be auto-assigned the connection secrets reader role as well
  enforce_access_to_default_secret_stores: enabled
[/CODE]

Use [iCODE]az ml online-endpoint create --file endpoint.yaml[/iCODE] to create the endpoint.
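Before deploying, you can confirm that the endpoint provisioned successfully. Here is a minimal sketch using the Azure ML v2 Python SDK (azure-ai-ml); the subscription, resource group, and workspace values are placeholders.

[CODE]
# Sketch: check the endpoint's provisioning state with the Azure ML v2 SDK.
# Placeholders below must be replaced with your own values.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

endpoint = ml_client.online_endpoints.get(name="langchain-pf-endpoint")
print(endpoint.provisioning_state, endpoint.scoring_uri)
[/CODE]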
Once the model is registered and the endpoint exists, we can create the online deployment. First, create the blue-deployment.yml file:

[CODE]
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: langchain-pf-endpoint
model: azureml:langchain-pf-model:1
# You can also specify model files path inline
# path: examples/flows/chat/basic-chat
environment:
  build:
    path: image_build_with_reqirements
    dockerfile_path: Dockerfile
  inference_config:
    liveness_route:
      path: /health
      port: 8080
    readiness_route:
      path: /health
      port: 8080
    scoring_route:
      path: /score
      port: 8080
instance_type: Standard_E16s_v3
instance_count: 1
request_settings:
  request_timeout_ms: 300000
environment_variables:
  PROMPTFLOW_CONNECTION_PROVIDER: azureml://subscriptions/<subscription_id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>
  APPLICATIONINSIGHTS_CONNECTION_STRING: <connection_string>
[/CODE]

Use [iCODE]az ml online-deployment create --file blue-deployment.yml --all-traffic[/iCODE] to create the deployment and route all traffic to it.
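Once traffic is assigned, you can smoke-test the deployment over REST. The sketch below is a hypothetical scoring call: the scoring URL and key are placeholders you can obtain from the endpoint's Consume tab or via [iCODE]az ml online-endpoint get-credentials[/iCODE], and the payload keys match the question and chat_history inputs defined in flow.dag.yaml.

[CODE]
# Sketch: call the managed online endpoint; URL and key are placeholders.
import requests

scoring_url = "https://langchain-pf-endpoint.<region>.inference.ml.azure.com/score"
api_key = "<endpoint-key>"  # e.g. az ml online-endpoint get-credentials -n langchain-pf-endpoint

payload = {"question": "What is task decomposition?", "chat_history": []}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

response = requests.post(scoring_url, json=payload, headers=headers, timeout=300)
response.raise_for_status()
print(response.json()["answer"])
[/CODE]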
[HEADING=1]Model Monitoring and Debugging Strategies[/HEADING]

Effective monitoring and debugging are essential for maintaining the quality of your GenAI application:

- Implement model performance monitoring to track:
  - Accuracy
  - Latency
  - Other relevant metrics
  You can track these metrics with the AML endpoint.
- Set up alerts for anomalies or performance degradation.
- Use PromptFlow's built-in debugging tools to inspect and troubleshoot prompt executions. You can inspect individual prompts to check their quality and debug issues.
- Implement A/B testing capabilities to compare:
  - Different prompt strategies
  - Model versions
  You can run two different blue/green deployments and run the A/B test with the same approach.

[HEADING=1]Ensuring Scalability in Enterprise Environments[/HEADING]

To meet the demands of enterprise users, your GenAI application must be scalable:

- Design your application with a microservices architecture for better scalability
- Implement autoscaling using container orchestration platforms like Kubernetes
- Optimize database and caching strategies for high-volume data processing
- Consider using serverless technologies for cost-effective scaling of certain components

[HEADING=1]Conclusion[/HEADING]

Building end-to-end enterprise GenAI applications with PromptFlow and LangChain offers a powerful approach to creating robust, observable, and scalable AI solutions. By focusing on observability, trackability, model monitoring, debugging, and autoscaling, you can create applications that meet the demanding requirements of enterprise environments. As you embark on your GenAI development journey, remember that the field is rapidly evolving. Stay updated with the latest developments in PromptFlow, LangChain, and the broader AI landscape to ensure your applications remain at the cutting edge.

References:
1. promptflow/examples at main · microsoft/promptflow
2. Deploy a flow in prompt flow to online endpoint for real-time inference with CLI - Azure Machine Learning
3. Deploy to Azure App Service — Prompt flow documentation