Automate Azure OpenAI custom model training data collection and deployment via Logic App

By Drac_Zhang
Background


Recently, I have been planning to train a custom OpenAI model based on real-life Logic App cases. In this situation, the following challenges need to be resolved.

  1. We need a large number of records, more than a single person can provide, so we need a way to collect data from different people.
  2. We need to generate the JSONL files required by the OpenAI custom model.
  3. The training data keeps growing, so we need to automate model training and deployment on a schedule.
Mechanism


After some research and testing, I found that Microsoft Forms + Azure Storage Account + Logic App works as a solution.



Microsoft Forms: a very easy-to-use service for collecting data from different teammates. Here is the sample form:

*(screenshot: sample form)*



Azure Storage Account: I use a Storage Table and a Blob container in this scenario. The Storage Table maintains the raw data collected from Microsoft Forms, and the Blob container stores the JSONL files generated by the Logic App.



Logic App: provides the main data collection and automated deployment flows.



Prerequisites

  1. An Azure OpenAI resource in North Central US or Sweden Central (as per the document Azure OpenAI Service models - Azure OpenAI | Microsoft Learn, currently only these two regions support fine-tuning models).
  2. Three Logic App (Consumption) instances with Managed Identity enabled and assigned the "Storage Table Data Contributor", "Storage Blob Data Contributor" and "Cognitive Services OpenAI Contributor" roles (Logic App Standard is also an option; in that case you need three workflows instead).
  3. A Microsoft Form based on your own template.



Detailed Introduction of the Logic App Implementation


The Logic App (Consumption) sample code can be found at Drac-Zhang/Logic-App-Azure-Open-AI-custom-model-automation (github.com)

(PS: ARM template will be provided later)



Data Collector

This Logic App is triggered whenever anyone submits a Microsoft Forms response; it picks up the response and ingests it into the Storage Table.

You need to change the Form ID in the Forms trigger to your own Form ID.



Sample data in Storage Table:

*(screenshot: sample data in the Storage Table)*





Training Dataset Generator

It is triggered every day, retrieves all the data in the Storage Table, creates training data in the required JSONL format and saves it in the Blob container.

Every time it generates a new dataset, it also updates the Storage Table with the latest generated dataset file name.

Reference: Customize a model with Azure OpenAI Service - Azure OpenAI | Microsoft Learn
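Each line of the generated JSONL file is one chat-format training example built from a Question/Answer row. A minimal Python sketch of that conversion (the Question/Answer column names match the Storage Table above; the system prompt text is an assumption):

```python
import json

def rows_to_jsonl(rows):
    """Convert Question/Answer records into the chat-format JSONL
    expected by Azure OpenAI fine-tuning (one JSON object per line)."""
    lines = []
    for row in rows:
        example = {
            "messages": [
                # The system prompt is a placeholder; use whatever fits your use case.
                {"role": "system", "content": "You are a Logic App support assistant."},
                {"role": "user", "content": row["Question"]},
                {"role": "assistant", "content": row["Answer"]},
            ]
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)

sample = rows_to_jsonl([{"Question": "How do I retry a failed action?",
                         "Answer": "Configure a retry policy on the action."}])
print(sample)
```

The resulting string can be written straight to the Blob container as the day's dataset file.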




Parameters that need to be modified:

| Action Name | Variable Name | Comments |
| --- | --- | --- |
| Initialize variable - Table Base Url | TableBaseUrl | Format: `https://[StorageName].table.core.windows.net/[TableName]?$select=Question,Answer` |
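The TableBaseUrl follows the Azure Table service query endpoint format shown above; a small sketch that builds it (the storage account and table names below are placeholders):

```python
def build_table_base_url(storage_name: str, table_name: str) -> str:
    """Build the Table service query URL used by the workflow's HTTP action.
    $select limits the returned properties to the two columns the
    dataset generator needs."""
    return (f"https://{storage_name}.table.core.windows.net/"
            f"{table_name}?$select=Question,Answer")

print(build_table_base_url("mystorageacct", "TrainingData"))
# → https://mystorageacct.table.core.windows.net/TrainingData?$select=Question,Answer
```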





Custom Model Deploy

This Logic App implements the custom model deployment, which is the most complex workflow in our solution.

The backend logic is as follows:

  1. Get the latest training dataset from the Blob container.
  2. Upload the dataset to the OpenAI "Data Files" via the API (https://[OpenAIName].openai.azure.com/openai/files/import/?api-version=2023-09-15-preview) and wait for the file to be processed.
  3. Create the custom model via the API (https://[OpenAIName].openai.azure.com/openai/fine_tuning/jobs?api-version=2023-09-15-preview) and wait for the model to be generated.
  4. Filter for deployments that have the provided "Deployment Name", delete the existing deployments and re-deploy with the new model via the API ([ManagementUrl]/deployments/[DeploymentName]?api-version=2023-10-01-preview).
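The steps above map onto plain REST calls. A hedged Python sketch of the request URLs and a generic JSON POST helper (the bearer token would come from the Logic App's managed identity; the resource names are placeholders and nothing here is executed against a live resource):

```python
import json
import urllib.request

API_VERSION = "2023-09-15-preview"       # data-plane version used in this article
MGMT_API_VERSION = "2023-10-01-preview"  # management-plane version

def import_file_url(openai_name: str) -> str:
    # Step 2: upload the dataset into the OpenAI "Data Files"
    return (f"https://{openai_name}.openai.azure.com/openai/"
            f"files/import/?api-version={API_VERSION}")

def fine_tuning_jobs_url(openai_name: str) -> str:
    # Step 3: create the fine-tuning job for the custom model
    return (f"https://{openai_name}.openai.azure.com/openai/"
            f"fine_tuning/jobs?api-version={API_VERSION}")

def deployment_url(management_url: str, deployment_name: str) -> str:
    # Step 4: delete / re-create the deployment through the management plane
    return f"{management_url}/deployments/{deployment_name}?api-version={MGMT_API_VERSION}"

def post_json(url: str, token: str, payload: dict) -> dict:
    """Helper: POST a JSON body with a bearer token (e.g. from Managed Identity)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(fine_tuning_jobs_url("contoso-openai"))
```

In the Logic App itself these calls are HTTP actions inside "Until" loops that poll until the file is processed and the model is generated.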



Parameters that need to be modified:

| Action Name | Variable Name | Comments |
| --- | --- | --- |
| Initialize variable - OpenAI name | OpenAIName | Your OpenAI resource name. |
| Initialize variable - Management Url | ManagementUrl | Format: `https://management.azure.com/subscriptions/[Sub ID]/resourceGroups/[RG]/providers/Microsoft.CognitiveServices/accounts/[OpenAI Name]` |
| Initialize variable - Deployment Name | DeploymentName | Your deployment name. |





Additional Information

  • In the Azure Storage Table connector, I couldn't find a place to fill in the "Next page marker" for pagination in the "Get Entities" action, so in the "Training Dataset Generator" workflow I have to use an HTTP action to query the Storage Table directly.
  • The Logic App ARM template is not available yet, so you need to prepare the API connections yourself.
  • Depending on the dataset size and the load of the backend, the custom model might take some time to generate. The default timeout for the "Until" loop that waits for model creation is 12 hours; you might need to increase it.
  • In my scenario, there are no requests during the weekend, so I can safely delete the deprecated deployment. You may need to change this behavior per your requirements.
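The continuation-token handling mentioned in the first bullet can be sketched as follows. The `fetch_page` callable stands in for the HTTP action; the header and query-parameter names are those of the Azure Table service REST API:

```python
def get_all_entities(fetch_page):
    """Follow Azure Table service continuation tokens until every page is read.

    `fetch_page(extra_params)` must return (entities, response_headers).
    When a query result is partial, the response carries
    x-ms-continuation-NextPartitionKey / x-ms-continuation-NextRowKey headers,
    which are passed back as NextPartitionKey / NextRowKey query parameters
    on the next request.
    """
    entities, params = [], {}
    while True:
        page, headers = fetch_page(params)
        entities.extend(page)
        npk = headers.get("x-ms-continuation-NextPartitionKey")
        nrk = headers.get("x-ms-continuation-NextRowKey")
        if not npk:                     # no continuation token: last page reached
            return entities
        params = {"NextPartitionKey": npk, "NextRowKey": nrk}
```

In the Logic App this same loop is expressed as an "Until" loop around the HTTP action, carrying the two token values in variables.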
