Guest Denise_Schlesinger Posted June 3

[HEADING=1]Introduction[/HEADING]
In this article we demonstrate how to combine GPT-4o's image understanding with function calling to unlock multimodal use cases. We simulate a package routing service that routes packages based on their shipping labels, using GPT-4o for OCR. Based on its analysis of the image, the model picks the appropriate function to call from a predefined set of actions, each of which routes the package to a continent.

[HEADING=1]Background[/HEADING]
The new GPT-4o (“o” for “omni”) can reason across audio, vision, and text in real time. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on English text and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is notably better at vision and audio understanding than existing models. GPT-4o also supports function calling.

[HEADING=1]The application[/HEADING]
We will run a Jupyter notebook that connects to GPT-4o to sort packages based on the printed labels carrying the shipping address. Here are some sample labels. We use GPT-4o for OCR to read the country each package is being shipped to, and GPT-4o function calling to route the packages.

[HEADING=1]The environment[/HEADING]
The code can be found here - Azure OpenAI code examples
Make sure you create your Python virtual environment and fill in the environment variables as described in the README.md file.

[HEADING=1]The code[/HEADING]
Connecting to the Azure OpenAI GPT-4o deployment.
from dotenv import load_dotenv
from IPython.display import display, HTML, Image
import os
from openai import AzureOpenAI
import json

# Read the key, endpoint and deployment name from the .env file
load_dotenv()
GPT4o_API_KEY = os.getenv("GPT4o_API_KEY")
GPT4o_DEPLOYMENT_ENDPOINT = os.getenv("GPT4o_DEPLOYMENT_ENDPOINT")
GPT4o_DEPLOYMENT_NAME = os.getenv("GPT4o_DEPLOYMENT_NAME")

client = AzureOpenAI(
    azure_endpoint=GPT4o_DEPLOYMENT_ENDPOINT,
    api_key=GPT4o_API_KEY,
    api_version="2024-02-01",
)

Defining the functions to be called after GPT-4o answers.

# Defining the functions - in this case a toy example of a shipping function
def ship_to_Oceania(location):
    return f"Shipping to Oceania based on location {location}"

def ship_to_Europe(location):
    return f"Shipping to Europe based on location {location}"

def ship_to_US(location):
    return f"Shipping to Americas based on location {location}"

Defining the descriptions of the available functions to send to GPT-4o. It is very IMPORTANT to include a description of each function and of its parameters, so that GPT-4o knows which function to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "ship_to_Oceania",
            "description": "Shipping the parcel to any country in Oceania",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ship_to_Europe",
            "description": "Shipping the parcel to any country in Europe",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ship_to_US",
            "description": "Shipping the parcel to any country in the Americas, including the United States",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    },
]

# Map the function names GPT-4o can return to the Python functions defined earlier
available_functions = {
    "ship_to_Oceania": ship_to_Oceania,
    "ship_to_Europe": ship_to_Europe,
    "ship_to_US": ship_to_US,
}

A function to base64-encode our images; this is a format accepted by GPT-4o.

# Encoding the images to send to GPT-4o
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

The function that calls GPT-4o. Notice below that we pass the "tools" parameter with the JSON describing the functions that can be called.
def call_OpenAI(messages, tools, available_functions):
    # Step 1: send the prompt and available functions to GPT
    response = client.chat.completions.create(
        model=GPT4o_DEPLOYMENT_NAME,
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )

    response_message = response.choices[0].message

    # Step 2: check if GPT wanted to call a function
    if response_message.tool_calls:
        print("Recommended Function call:")
        print(response_message.tool_calls[0])
        print()

        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        function_name = response_message.tool_calls[0].function.name

        # verify function exists
        if function_name not in available_functions:
            return "Function " + function_name + " does not exist"
        function_to_call = available_functions[function_name]

        # verify function has correct number of arguments
        function_args = json.loads(response_message.tool_calls[0].function.arguments)
        if check_args(function_to_call, function_args) is False:
            return "Invalid number of arguments for function: " + function_name

        # call the function
        function_response = function_to_call(**function_args)
        print("Output of function call:")
        print(function_response)
        print()

The check_args helper used above validates the arguments returned by the model against the function's signature. A minimal version, assuming a check on argument names is sufficient, could look like this:

import inspect

def check_args(function, args):
    # Minimal sketch (an assumption): compare argument names with the signature
    params = inspect.signature(function).parameters
    # reject arguments the function does not accept
    if any(name not in params for name in args):
        return False
    # reject calls that omit a required parameter
    return all(
        name in args
        for name, param in params.items()
        if param.default is inspect.Parameter.empty
    )

Please note that WE, and not GPT-4o, call the functions in our code, based on the answer returned by GPT-4o.

        # call the function
        function_response = function_to_call(**function_args)

Iterate through all the images in the folder. Notice the system prompt, where we tell GPT-4o what we need it to do: read the package labels and route each package by calling functions.

# iterate through all the images in the data folder
data_folder = "./data"
for image in os.listdir(data_folder):
    if image.endswith(".png"):
        IMAGE_PATH = os.path.join(data_folder, image)
        base64_image = encode_image(IMAGE_PATH)
        display(Image(IMAGE_PATH))
        messages = [
            # the prompt is split across adjacent string literals, which Python joins
            {"role": "system", "content": (
                "You are a customer service assistant for a delivery service, "
                "equipped to analyze images of package labels. "
                "Based on the country to ship the package to, "
                "you must always ship to the corresponding continent. "
                "You must always use tools!"
            )},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"}}
            ]},
        ]
        call_OpenAI(messages, tools, available_functions)

Let’s run our notebook!!!
Running the code on a sample label produces the following output:

Recommended Function call:
ChatCompletionMessageToolCall(id='call_lH2G1bh2j1IfBRzZcw84wg0x', function=Function(arguments='{"location":"United States"}', name='ship_to_US'), type='function')

Output of function call:
Shipping to Americas based on location United States

That’s all folks!
Thanks
Denise
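The comment in call_OpenAI warns that the JSON returned by the model may not always be valid. As a rough sketch of what that error handling might look like, the hypothetical helper below (safe_dispatch is an illustrative name, not part of the original notebook or of the openai library) takes the function name and the raw arguments string from a tool call, and returns an error message instead of raising when the name is unknown, the JSON is malformed, or the arguments do not fit the function's signature. The ship_to_Europe function is repeated only to keep the example self-contained.

```python
import inspect
import json


def safe_dispatch(function_name, arguments_json, available_functions):
    """Run the function named by a tool call; return an error string on bad input.

    Hypothetical helper for illustration only.
    """
    # Unknown function names are reported rather than raised.
    function_to_call = available_functions.get(function_name)
    if function_to_call is None:
        return f"Function {function_name} does not exist"

    # The model returns its arguments as a JSON string, which may be malformed.
    try:
        function_args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"Invalid JSON arguments for {function_name}: {exc.msg}"
    if not isinstance(function_args, dict):
        return f"Arguments for {function_name} must be a JSON object"

    # Signature.bind raises TypeError for missing or unexpected arguments.
    try:
        inspect.signature(function_to_call).bind(**function_args)
    except TypeError as exc:
        return f"Invalid arguments for {function_name}: {exc}"

    return function_to_call(**function_args)


def ship_to_Europe(location):
    return f"Shipping to Europe based on location {location}"


functions = {"ship_to_Europe": ship_to_Europe}
print(safe_dispatch("ship_to_Europe", '{"location": "France"}', functions))
print(safe_dispatch("ship_to_Europe", '{"location": ', functions))
print(safe_dispatch("ship_to_Mars", '{"location": "Mars"}', functions))
```

Inside call_OpenAI, Step 3 could then pass response_message.tool_calls[0].function.name and response_message.tool_calls[0].function.arguments to such a helper, so that a malformed answer from the model yields a readable message rather than an exception.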