Guest Abdulhamid_Onawole
Posted June 13

Hello! Abdulhamid here! I am a Microsoft Student Ambassador writing from Aston University, where I am studying Applied Artificial Intelligence. I'm excited to share insights on using Azure AI and OpenAI to build cutting-edge applications. Whether you're a fellow student, a developer, or an AI enthusiast, this step-by-step guide will help you use OpenAI’s models to enhance existing apps or create new ones.

Azure AI Studio offers a wide array of AI services that can be deployed easily, and the new Azure OpenAI service gives access to advanced generative models, so integrating these tools has become more straightforward than ever. In this article, I will guide you through building a smart nutritionist app that uses the Azure Document Intelligence service for text extraction and Azure OpenAI’s GPT-4o for human-like, accurate responses.

Prerequisites:
- An active Azure subscription
- Registered access to Azure OpenAI service
- Basic knowledge of the Python programming language

[HEADING=1]Text Extraction with Azure’s pre-built model[/HEADING]

For our application, we want to extract the ingredients and nutrition information from the food products we consume. Document Intelligence simplifies this by accurately extracting text, tables, key-value pairs, and other structures from images and documents. While you can train and build custom models with this service, Azure speeds up app development by providing highly accurate pre-built models. The one we will use is prebuilt-layout, which extracts structures such as tables, lines, and words. That makes it well suited to nutritional information, which is mostly printed in tables.
To deploy a Document Intelligence resource:

1. Sign in to Microsoft Azure and search for Document Intelligence.
2. Click the search result, then click "Create" on the top pane.
3. Fill in the following to create and deploy the service:
   - Subscription: select your active subscription
   - Resource group: select an existing resource group or create a new one
   - Name: name your resource
   - Region: select any region you wish; East US is the default
   - Pricing: select the free tier (F0)
4. Click "Review + create" and then "Create" to deploy the resource.

Once the resource has been deployed, click "Go to resource". Scroll to the bottom of the page and copy one of the access keys and the endpoint. You need these to connect your app to your deployed service.

[HEADING=2]Setting up your development environment[/HEADING]

Next, set up the environment for your app development:

1. Create a .env file to hold your credentials:
   a. Open Notepad and paste the following:
      AZURE_FORM_RECOGNIZER_ENDPOINT="YOUR_AZURE_FORM_RECOGNIZER_ENDPOINT"
      AZURE_FORM_RECOGNIZER_KEY="YOUR_AZURE_FORM_RECOGNIZER_KEY"
   b. Copy your endpoint and either one of the keys and paste them over the placeholders.
   c. Save the file in a new folder named “nutrition_app” with the name .env, choosing "All files" as the file type.
2. Open VS Code and open your newly created folder.
3. Press Ctrl + ` to open the integrated terminal and run the following command (python-dotenv is needed to read the .env file): [iCODE]pip install azure-ai-formrecognizer==3.3.0 python-dotenv[/iCODE]
4. Press Ctrl + Alt + Windows + N and select "Python File" to create a new Python file.

We need to import the necessary libraries and set up the configuration required to connect to the pre-built model. Copy and paste the code below to do that. I have also loaded sample images of a nutrition label and an ingredients list.
# import modules
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

load_dotenv()

# Azure Form Recognizer Configuration
azure_form_recognizer_endpoint = os.getenv("AZURE_FORM_RECOGNIZER_ENDPOINT")
azure_form_recognizer_key = os.getenv("AZURE_FORM_RECOGNIZER_KEY")
ingredients_file_uri = "https://github.com/HamidOna/azurelearn/blob/main/20240609_002105.jpg?raw=true"
nutrition_table_file_uri = "https://github.com/HamidOna/azurelearn/blob/main/20240608_192914.jpg?raw=true"
fileModelId = "prebuilt-layout"

This pre-built model can recognize tables, lines, and words. We will extract the ingredients as lines and the nutrition facts as a table, so the structure is parsed properly and none of the content gets jumbled. A few more lines of code are needed for this before we finally print the extracted text. Copy the code below into the existing script.

document_analysis_client = DocumentAnalysisClient(
    endpoint=azure_form_recognizer_endpoint,
    credential=AzureKeyCredential(azure_form_recognizer_key)
)

poller_ingredients = document_analysis_client.begin_analyze_document_from_url(
    model_id=fileModelId, document_url=ingredients_file_uri
)
result_ingredients = poller_ingredients.result()

# Extract text labels from the ingredients image
ingredients_content = ""
if result_ingredients.pages:
    for idx, page in enumerate(result_ingredients.pages):
        for line in page.lines:
            ingredients_content += f"{line.content}\n"

# Connect to Azure Form Recognizer for nutrition table image
print(f"\nConnecting to Forms Recognizer at: {azure_form_recognizer_endpoint}")
print(f"Analyzing nutrition table at: {nutrition_table_file_uri}")
poller_nutrition_table = document_analysis_client.begin_analyze_document_from_url(
    model_id=fileModelId, document_url=nutrition_table_file_uri
)
result_nutrition_table = poller_nutrition_table.result()

# Extract table content from the nutrition table image
nutrition_table_content = ""
if result_nutrition_table.tables:
    for table_idx, table in enumerate(result_nutrition_table.tables):
        # Build an empty grid, then place each cell by its row and column index
        table_content = []
        for row_idx in range(table.row_count):
            row_content = [""] * table.column_count
            table_content.append(row_content)
        for cell in table.cells:
            table_content[cell.row_index][cell.column_index] = cell.content
        nutrition_table_content += f"\nTable #{table_idx + 1}:\n"
        for row in table_content:
            nutrition_table_content += "\t".join(row) + "\n"

combined_content = f"Ingredients:\n{ingredients_content}\nNutrition Table:\n{nutrition_table_content}"
print(combined_content)

Now save the file as app.py and run it in your terminal: [iCODE]python app.py[/iCODE]

You should see the extracted ingredients printed line by line, followed by the nutrition table printed as tab-separated rows.

[HEADING=1]Connecting to OpenAI GPT-4o and passing the extracted text[/HEADING]

We have completed the first half of this project. Next, we set up the GPT-4o model and then pass our data to it to generate results.

First, create an Azure OpenAI resource (fill out the registration form if you don't already have access):
- Subscription: select your active subscription
- Resource group: select an existing resource group or create a new one
- Name: name your resource
- Region: select any region from the list of available regions
- Pricing: select Standard S0

Then deploy the model:
1. After deploying the resource, click “Go to Azure OpenAI Studio” on the top pane.
2. Scroll down the left pane and click the “Deployments” page.
3. Click “Create new deployment”.
4. Select GPT-4o in the list of models.
5. Assign a name to the deployment (note down the name so you can connect to it).
6. Reduce the token rate limit to 7K and then click “Create”.
7. Once the model has deployed successfully, return to the OpenAI resource in portal.azure.com and copy the endpoint and access key.

Next, go back to the app.py script.
Open the terminal and run: [iCODE]pip install openai[/iCODE]

Navigate to the .env file and paste the following into it, replacing the placeholders with the endpoint and access key from your OpenAI resource and the name of your deployment:

AZURE_OAI_ENDPOINT="YOUR_AZURE_OPENAI_ENDPOINT"
AZURE_OAI_KEY="YOUR_AZURE_OPENAI_KEY"
AZURE_OAI_DEPLOYMENT="YOUR_DEPLOYED_MODEL_NAME"

Replace the imports at the top of the current script with the ones below:

# import modules
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
from openai import AzureOpenAI

Set up the configuration right after the print statement for the extracted text:

# Azure OpenAI configuration, read from the .env file
azure_oai_endpoint = os.getenv("AZURE_OAI_ENDPOINT")
azure_oai_key = os.getenv("AZURE_OAI_KEY")
azure_oai_deployment = os.getenv("AZURE_OAI_DEPLOYMENT")

client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    # This is the Azure OpenAI API version, not the GPT-4o model version
    api_version="2024-02-01"
)

[HEADING=2]Prompt Engineering[/HEADING]

An important part of using LLMs and getting accurate results is prompting. Prompt engineering lets you give the model precise instructions on how to behave, which largely determines how well it carries out its task. It's good practice to spend a few minutes crafting a prompt tailored to your use case. For this project, we want the model to tell us about the ingredients and provide helpful advice about them. We also want it to print a summary of its report before going into detail about the ingredients. Another useful tip is to include an example query and an example output in the prompt. See the implementation below:

# Create a system message
system_message = """
You are a smug, funny nutritionist who provides health advice based on ingredients and nutrition tables.
Provide advice on what is safe to consume based on the ingredients and nutrition table.
Discuss the ingredients as a whole but single out scientifically named ingredients so the user can understand them better.
Mention the adequate consumption or potential harm based on excessive amounts of substances.
Identify any potential allergies.
Output a general summary first before giving further details.
Here are a few examples:
- "Example:
{User query}:
Please analyze the following ingredients and nutrition label content:
Ingredients: Potatoes, Vegetable Oils, Salt, Potassium phosphates
Nutrition Table:
- Energy: 532 kcal per 100g
- Fat: 31.5g per 100g
- Sodium: 1.28g per 100g
{System}:
Summary: The ingredients are pretty standard for potato crisps. Potatoes and vegetable oils provide the base, while salt adds flavor. Watch out for the high fat and sodium content if you're trying to watch your heart health or blood pressure. As for allergies, you're mostly safe unless you're allergic to potatoes or sunflower/rapeseed oil. Potassium phosphates? Just some friendly muscle helpers, but keep it moderate!
Potassium phosphates: Ah, the magical salts that help keep your muscles happy. Just don't overdo it!"
"""

Copy and paste the message above into the existing Python script. Next, we pass the extracted text to the model and make a request.

messages_array = [{"role": "system", "content": system_message}]

# Add the extracted nutrition label content to the user messages
messages_array.append({"role": "user", "content": f"Please analyze the following nutrition label content:\n{combined_content}"})

# Send request to Azure OpenAI model
response = client.chat.completions.create(
    model=azure_oai_deployment,
    temperature=0.6,
    max_tokens=1200,
    messages=messages_array
)
generated_text = response.choices[0].message.content

# Print the summary generated by OpenAI
print("Summary: " + generated_text + "\n")

Copy and paste the code above, save your changes, and run the script in your terminal: [iCODE]python app.py[/iCODE]

Voila! You have your own personal food nutritionist. You can further refine the system message to fit your diet, such as watching out for food with high sugar content, specifying allergies, helping you find halal food, and so on.
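As a minimal sketch of that kind of refinement, the snippet below appends a list of personal preferences to a base system message. The helper name build_system_message, the wording of the added instruction, and the sample preferences are all my own illustrative assumptions, not part of the original app; in the script you would pass the existing system_message as the base.

```python
def build_system_message(base_message, preferences):
    """Return base_message with the user's dietary preferences appended.

    Hypothetical helper for illustration; it only builds a string and does
    not call Azure OpenAI.
    """
    # Drop empty entries and trim surrounding whitespace
    cleaned = [p.strip() for p in preferences if p and p.strip()]
    if not cleaned:
        return base_message.strip()
    bullet_lines = "\n".join(f"- {p}" for p in cleaned)
    return (
        f"{base_message.strip()}\n\n"
        "Also take these user preferences into account:\n"
        f"{bullet_lines}"
    )


# Made-up preferences; replace them with your own
my_preferences = [
    "Flag products with high sugar content.",
    "I am allergic to peanuts.",
    "Tell me whether the product is likely to be halal.",
]

# Shortened stand-in for the full system message used in the app
base = "You are a nutritionist who gives advice based on ingredients and nutrition tables."
personal_system_message = build_system_message(base, my_preferences)
print(personal_system_message)
```

The result can then be used as the "content" of the system entry in messages_array. Keeping the preferences in a separate list means you can change your diet rules without touching the main prompt.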
Check out the following resources to improve the app and build your own specific use cases:
- Document Intelligence pre-built models
- Prompt engineering
- Project on GitHub with Streamlit interface