Posted November 27, 2024 by FPCH Staff

TOC

- Introduction to TensorFlow
- System Architecture
  - Architecture
  - Focus of This Tutorial
- Setup Azure Resources
  - File and Directory Structure
  - Bicep Template
- Running Locally
  - Training Models and Training Data
  - Predicting with the Model
- Publishing the Project to Azure
  - Code Commit to Azure DevOps
  - Publish to Azure Web App via Pipeline
- Running on Azure Web App
  - Training the Model
  - Using the Model for Prediction
- Troubleshooting
  - docker.log freeze after deployment
  - Others
- Conclusion
- References

1. Introduction to TensorFlow

TensorFlow is an open-source machine learning framework developed by Google. It provides tools for building and deploying machine learning models, with a focus on flexibility and scalability. TensorFlow supports deep learning, classical machine learning, and neural network models, enabling tasks like image recognition, natural language processing, and time series forecasting.

At its core, TensorFlow uses computational graphs to model mathematical operations, allowing efficient computation on CPUs, GPUs, and TPUs. It features a high-level API, Keras, for easy model building, as well as lower-level APIs for advanced customization. TensorFlow also supports distributed training for large-scale datasets and diverse deployment options, including cloud services, mobile devices, and edge computing.

TensorFlow's ecosystem includes TensorFlow Hub (pre-trained models), TensorFlow Lite (for mobile and IoT), and TensorFlow.js (for JavaScript applications). Its integration with visualization tools like TensorBoard simplifies debugging and performance monitoring. TensorFlow excels in production environments, offering features like TensorFlow Extended (TFX) for end-to-end ML pipelines. With its versatile capabilities and large community, TensorFlow is widely used in industries like healthcare, finance, and technology, making it one of the most powerful tools for modern machine learning development.
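To make the Keras workflow mentioned above concrete, here is a minimal, self-contained example (it is not part of this tutorial's project; the synthetic data and layer sizes are purely illustrative). It builds a tiny binary classifier with the high-level Sequential API:

import numpy as np
import tensorflow as tf

# Synthetic data: 100 samples with 4 features; label is 1 when the feature sum > 2
x = np.random.rand(100, 4).astype("float32")
y = (x.sum(axis=1) > 2).astype("int32")

# A two-layer network built with the high-level Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, verbose=0)
print(model.predict(x[:3]))  # predicted probabilities for the first three samples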
2. System Architecture

Architecture

Development Environment
- OS: macOS, Version: Sonoma 14.1.1
- Python Version: 3.9.20

Azure Resources
- App Service Plan: SKU: Premium Plan 0 V3
- App Service: Platform: Linux (Python 3.9, Version 3.9.19)
- Storage Account: SKU: General Purpose V2
- File Share: No backup plan

Focus of This Tutorial

This tutorial walks you through the following stages:
- Setting up Azure resources
- Running the project locally
- Publishing the project to Azure
- Running the application on Azure
- Troubleshooting common issues

Each of these aspects has numerous corresponding tools and solutions. The choices used in this session are marked below.

Local OS: Windows | Linux | Mac (this tutorial: Mac)
How to setup Azure resources: Portal (i.e., REST API) | ARM | Bicep | Terraform (this tutorial: Bicep)
How to deploy project to Azure: VSCode | CLI | Azure DevOps | GitHub Action (this tutorial: Azure DevOps)

3. Setup Azure Resources

File and Directory Structure

Please open a terminal and enter the following commands:

git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
bash ./tensorflow/tools/add-venv.sh

If you are using a Windows platform, use the following alternative PowerShell commands instead:

git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
.\tensorflow\tools\add-venv.cmd

After completing the execution, you should see the following directory structure (file and path, followed by its purpose):

tensorflow/tools/add-venv.*
  The script executed in the previous step (cmd for Windows, sh for Linux/Mac) to create all Python virtual environments required for this tutorial.

.venv/tensorflow-webjob/
  A virtual environment specifically used for training models (i.e., tokenizing text and constructing a neural network for training).

tensorflow/webjob/requirements.txt
  The list of packages (with exact versions) required for the tensorflow-webjob virtual environment.

.venv/tensorflow/
  A virtual environment specifically used for the Flask application, enabling API endpoint access for querying predictions (i.e., MBTI types).

tensorflow/requirements.txt
  The list of packages (with exact versions) required for the tensorflow virtual environment.

tensorflow/
  The main folder for this tutorial.

tensorflow/tools/bicep-template.bicep
  The Bicep template to set up all the Azure resources related to this tutorial, including an App Service Plan, a Web App, and a Storage Account.

tensorflow/tools/create-folder.*
  A script to create all directories required for this tutorial in the File Share, including train, model, and test.

tensorflow/tools/download-sample-training-set.*
  A script to download a sample training set from "MBTI text classifier trained on the Kaggle MBTI dataset", containing MBTI types and post data from social media platforms, into the train directory of the File Share.

tensorflow/webjob/train_mbti_model.py
  A script that tokenizes the posts from each record, trains an LSTM-based model for MBTI classification, and saves the embedding vectors in the model directory of the File Share.

tensorflow/App_Data/jobs/triggered/train-mbti-model/train_mbti_model.sh
  A shell script for Azure App Service WebJobs. It activates the tensorflow-webjob virtual environment and starts the train_mbti_model.py script.

tensorflow/api/app.py
  Code for the Flask application, including routes, port configuration, input parsing, vector loading, predictions, and output generation.

tensorflow/start.sh
  A script executed after deployment (as specified in the Bicep template's startup command, which I will introduce later). It sets up the virtual environment and starts the Flask application to handle web requests.

tensorflow/pipeline.yml
  A process document for an Azure DevOps pipeline, detailing the steps to deploy code to an Azure Web App.

Bicep Template

We need to create the following resources or services:

App Service Plan | manual creation required: No | Resource (plan)
App Service | manual creation required: Yes | Resource (app)
Storage Account | manual creation required: Yes | Resource (storageAccount)
File Share | manual creation required: Yes | Service

Let's take a look at the tensorflow/tools/bicep-template.bicep file. Refer to the configuration section for all the resources. Since most of the configuration values don't require changes, I've placed them in the variables section of the ARM template rather than the parameters section. This helps keep the configuration simpler. However, I'd still like to briefly explain some of the more critical settings.

As you can see, I've adopted a camelCase naming convention, which combines the [Resource Type] with [Setting Name and Hierarchy]. This makes it easier to understand where each setting will be used. The configurations in the template are sorted by resource name, but the following list is categorized by functionality for better clarity.
storageAccountFileShareName = data-and-model
  [Purpose 1: Link File Share to Web App] Use this fixed name for the File Share.

storageAccountFileShareShareQuota = 5120
  [Purpose 1: Link File Share to Web App] The value is in GB.

storageAccountFileShareEnabledProtocols = SMB
  [Purpose 1: Link File Share to Web App]

appSiteConfigAzureStorageAccountsType = AzureFiles
  [Purpose 1: Link File Share to Web App]

appSiteConfigAzureStorageAccountsProtocol = Smb
  [Purpose 1: Link File Share to Web App]

planKind = linux
  [Purpose 2: Specify platform and stack runtime] Select Linux (the default if the Python stack is chosen).

planSkuTier = Premium0V3
  [Purpose 2: Specify platform and stack runtime] Choose at least a Premium plan to ensure enough memory for your AI workloads.

planSkuName = P0v3
  [Purpose 2: Specify platform and stack runtime] Same as above.

appKind = app,linux
  [Purpose 2: Specify platform and stack runtime] Same as above.

appSiteConfigLinuxFxVersion = PYTHON|3.9
  [Purpose 2: Specify platform and stack runtime] Select Python 3.9 to avoid dependency issues.

appSiteConfigAppSettingsWEBSITES_CONTAINER_START_TIME_LIMIT = 1800
  [Purpose 3: Deploying] The value is in seconds, ensuring the Startup Command can continue execution beyond the default timeout of 230 seconds. This tutorial's Startup Command typically takes around 1200 seconds, so setting it to 1800 seconds (the maximum value) provides a safety margin and accommodates future project expansion (e.g., adding more packages).

appSiteConfigAppCommandLine = [ -f /home/site/wwwroot/start.sh ] && bash /home/site/wwwroot/start.sh || GUNICORN_CMD_ARGS=\"--timeout 600 --access-logfile '-' --error-logfile '-' -c /opt/startup/gunicorn.conf.py --chdir=/opt/defaultsite\" gunicorn application:app
  [Purpose 3: Deploying] This is the Startup Command, which can be broken down into three parts:
  First ([ -f /home/site/wwwroot/start.sh ]): checks whether start.sh exists. This is used to determine whether the app is in its initial state (just created) or has already been deployed.
  Second (bash /home/site/wwwroot/start.sh): if the file exists, the app has already been deployed. The start.sh script will be executed, which installs the necessary packages and starts the Flask application.
  Third (GUNICORN_CMD_ARGS=\"--timeout 600 --access-logfile '-' --error-logfile '-' -c /opt/startup/gunicorn.conf.py --chdir=/opt/defaultsite\" gunicorn application:app): if the file does not exist, the command falls back to the default HTTP server (gunicorn) to start the web app.
  Since the command is enclosed in double quotes within the ARM template, replace \" with " during actual execution.

appSiteConfigAppSettingsSCM_DO_BUILD_DURING_DEPLOYMENT = false
  [Purpose 3: Deploying] Since we have already defined the handling of the different virtual environments in start.sh, we do not need to initiate the default build process of the Web App.

appSiteConfigAppSettingsWEBSITES_ENABLE_APP_SERVICE_STORAGE = true
  [Purpose 4: WebJobs] This setting is required to enable the App Service storage feature, which is necessary for using WebJobs (e.g., for model training).

storageAccountPropertiesAllowSharedKeyAccess = true
  [Purpose 5: Troubleshooting] This setting is enabled by default. It is highlighted here because certain enterprise IT policies may enforce changes to this configuration after a period, potentially causing a series of issues. For more details, please refer to the Troubleshooting section below.

Return to the terminal and execute the following commands (their purpose has been described earlier).
# Please change <ResourceGroupName> to your preferred name, for example: azure-appservice-ai
# Please change <RegionName> to your preferred region, for example: eastus2
# Please change <ResourcesPrefixName> to your preferred naming pattern, for example: tensorflow-bicep (this will create tensorflow-bicep-asp as the App Service Plan, tensorflow-bicep-app as the Web App, and tensorflowbicepsa as the Storage Account)
az group create --name <ResourceGroupName> --location <RegionName>
az deployment group create --resource-group <ResourceGroupName> --template-file ./tensorflow/tools/bicep-template.bicep --parameters resourcePrefix=<ResourcesPrefixName>

If you are using a Windows platform, use the following alternative PowerShell commands instead:

# Please change <ResourceGroupName> to your preferred name, for example: azure-appservice-ai
# Please change <RegionName> to your preferred region, for example: eastus2
# Please change <ResourcesPrefixName> to your preferred naming pattern, for example: tensorflow-bicep (this will create tensorflow-bicep-asp as the App Service Plan, tensorflow-bicep-app as the Web App, and tensorflowbicepsa as the Storage Account)
az group create --name <ResourceGroupName> --location <RegionName>
az deployment group create --resource-group <ResourceGroupName> --template-file .\tensorflow\tools\bicep-template.bicep --parameters resourcePrefix=<ResourcesPrefixName>

After execution, please copy the 3 key-value pairs from the output section of the result. Return to the terminal and execute the following commands:

# Please set up the 3 variables you've got from the previous step
OUTPUT_STORAGE_NAME="<outputStorageName>"
OUTPUT_STORAGE_KEY="<outputStorageKey>"
OUTPUT_SHARE_NAME="<outputShareName>"

# URL encode the storage key
ENCODED_OUTPUT_STORAGE_KEY=$(python3 -c "
import urllib.parse
key = '''$OUTPUT_STORAGE_KEY'''
encoded_key = urllib.parse.quote(key, safe='')  # No safe characters, encode everything
print(encoded_key)
")

# Mount
open smb://$OUTPUT_STORAGE_NAME:$ENCODED_OUTPUT_STORAGE_KEY@$OUTPUT_STORAGE_NAME.file.core.windows.net/$OUTPUT_SHARE_NAME

Alternatively, you can simply go to the Azure Portal, navigate to the File Share you just created, and copy the required connect command from there; choose the Linux or Windows variant if you are using such an OS in your dev environment. After executing the command, the network drive will be successfully mounted.

4. Running Locally

Training Models and Training Data

Return to the terminal and execute the following commands (their purpose has been described earlier).

source .venv/tensorflow-webjob/bin/activate
bash ./tensorflow/tools/create-folder.sh
bash ./tensorflow/tools/download-sample-training-set.sh
python ./tensorflow/webjob/train_mbti_model.py

If you are using a Windows platform, use the following alternative PowerShell commands instead:

.\.venv\tensorflow-webjob\Scripts\Activate.ps1
.\tensorflow\tools\create-folder.cmd
.\tensorflow\tools\download-sample-training-set.cmd
python .\tensorflow\webjob\train_mbti_model.py

After execution, the File Share will include the corresponding directories and files. Let's take a brief detour to examine the structure of the training data downloaded from GitHub. The dataset used in this project focuses on MBTI (Myers-Briggs Type Indicator) personality types. Each record in the dataset contains a user's MBTI type and a collection of their social media posts, separated by |||. This tutorial repurposes the dataset to classify personality types based on textual data. In the raw data, each line includes an MBTI type and its associated text.

For training, the posts are tokenized and transformed into numerical sequences using TensorFlow's preprocessing tools. This step converts each word into a corresponding token based on a fixed vocabulary size. The sequences are then padded to a uniform length, ensuring consistency in the input data.

During training, performance is heavily influenced by factors like data balancing and hyperparameter tuning. The MBTI dataset is inherently imbalanced, with certain personality types appearing far more frequently than others. To address this, only 30 samples per type are used in training to ensure balance. However, this approach simplifies the task and may lead to suboptimal results.

The inference step involves tokenizing a new input post and passing it through the trained model to predict the MBTI type. It is important to note that with the current setup, the inference results may often return the same prediction. This is due to the limited dataset size, the crude handling of the imbalanced data, and the need for further tuning of training parameters such as the number of epochs, batch size, and learning rate.

This tutorial introduces an approach to training on and inferring MBTI personality types using TensorFlow. While the process highlights key steps like data preprocessing, model training, and inference, it does not delve deeply into AI-specific topics like advanced model optimization or deployment. To achieve better results, the dataset could be expanded to include more samples per personality type, and hyperparameters like the learning rate, number of epochs, and embedding dimensions could be fine-tuned. The sketch after this paragraph condenses these steps.
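The following is a condensed sketch of that flow (parse, balance, tokenize, pad, train an LSTM). It is illustrative rather than the actual contents of train_mbti_model.py; the input file name, vocabulary size, sequence length, and hyperparameters are all assumptions.

import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the raw data (assumed file name); each record holds a type and |||-separated posts
df = pd.read_csv("train/mbti_1.csv")              # columns: type, posts
df = df.groupby("type").head(30)                  # balance: keep 30 samples per type
texts = df["posts"].str.replace("|||", " ", regex=False)
labels = df["type"].astype("category")

# Tokenize with a fixed vocabulary size, then pad to a uniform length
tokenizer = Tokenizer(num_words=10000, oov_token="<oov>")
tokenizer.fit_on_texts(texts)
x = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=100)
y = labels.cat.codes.to_numpy()

# An LSTM-based classifier over learned word embeddings
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(16, activation="softmax"),  # 16 MBTI types
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=10, batch_size=16)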
Predicting with the Model

Return to the terminal and execute the following commands. First, deactivate the current virtual environment, then activate the virtual environment for the Flask application, and finally start the Flask app.

Commands for Linux or Mac:

deactivate
source .venv/tensorflow/bin/activate
python ./tensorflow/api/app.py

Commands for Windows:

deactivate
.\.venv\tensorflow\Scripts\Activate.ps1
python .\tensorflow\api\app.py

Once the startup output appears in the terminal, the server has started successfully. Press Ctrl+C to stop the server if needed. Before conducting the actual test, let's construct some sample query data:

I am happy

Next, open a terminal and use the following curl command to send a request to the app:

curl -X GET "http://0.0.0.0:8000/api/detect" -H "Content-Type: application/json" -d '{"post": "I am happy"}'

You should see the prediction results. PS: Your results may differ from mine due to variations in the sampling of your training dataset compared to mine.
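For reference, an endpoint like the one served by tensorflow/api/app.py can be approximated with the sketch below. It is a minimal stand-in, not the repo's actual code: the model/tokenizer file names, the sequence length, and the response format are assumptions.

import pickle

import tensorflow as tf
from flask import Flask, jsonify, request
from tensorflow.keras.preprocessing.sequence import pad_sequences

app = Flask(__name__)

# Load artifacts produced by training (assumed file names under the model directory)
model = tf.keras.models.load_model("model/mbti_model.h5")
with open("model/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
# The label order must match the category codes used during training
LABELS = ["ENFJ", "ENFP", "ENTJ", "ENTP", "ESFJ", "ESFP", "ESTJ", "ESTP",
          "INFJ", "INFP", "INTJ", "INTP", "ISFJ", "ISFP", "ISTJ", "ISTP"]

@app.route("/api/detect", methods=["GET"])
def detect():
    post = request.get_json(force=True)["post"]          # parse the JSON body
    seq = pad_sequences(tokenizer.texts_to_sequences([post]), maxlen=100)
    probs = model.predict(seq)[0]                        # run inference
    return jsonify({"mbti": LABELS[int(probs.argmax())]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)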
5. Publishing the Project to Azure

Code Commit to Azure DevOps

First, create a new, empty repository (referred to as the repo) under your Azure DevOps project and get its URL. Open a terminal in the cloned azure-appservice-ai project directory and run the following commands to add the new repo as a push/pull target, then verify the git remotes associated with the directory:

git remote add azure https://<organization>@dev.azure.com/<organization>/<project>/_git/azure-appservice-ai
git remote -v

Next, run the following command in the terminal to push the entire project to the Azure DevOps repo:

git push azure --all

The following steps need to be performed only once. These configurations ensure that the pipeline can automatically deploy the tensorflow portion of the azure-appservice-ai project to the newly created Azure Web App.

Setup Service Connection: Go to Project Settings in Azure DevOps and perform the necessary operations. Specify the service connection name as "azure-appservice-ai-tensorflow" (you can use any name that is easy to identify).

Create Pipeline YAML File: Navigate to the tensorflow subdirectory and create a new file named azure-pipeline.yml. Copy the contents of the file named pipeline.yml (in the same directory) into azure-pipeline.yml, modify the variables section as indicated by the comments, then save and commit the changes.

Setup the Pipeline: Navigate to the Pipelines section and create a new pipeline. Follow the prompts to select the newly created azure-pipeline.yml as the pipeline script file. Save the configuration (do not run it yet).

The above setup steps only need to be done once. After that, you can deploy the project to the Azure Web App via the pipeline in different ways.

Publish to Azure Web App via Pipeline

Manual Trigger: Navigate to the newly created pipeline and click Run Pipeline to start the deployment process. Click on the deployment to monitor its progress.

Trigger on Push: Alternatively, you can configure the pipeline to run automatically whenever new code is pushed to the Azure DevOps repo. Open a terminal and run the following command (after code updates):

git push azure --all

This will trigger a new pipeline deployment process.

6. Running on Azure Web App

Training the Model

Return to the terminal and execute the following commands to invoke the WebJob.

Commands for Linux or Mac:

# Please change <subscription_id>, <resourcegroup_name>, and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; curl -X POST -H "Authorization: Bearer $token" -H "Content-Type: application/json" -d '{}' "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run?api-version=2024-04-01"

Commands for Windows:

# Please change <subscription_id>, <resourcegroup_name>, and <webapp_name> to your own
$token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run?api-version=2024-04-01" -Headers @{Authorization = "Bearer $token"; "Content-type" = "application/json"} -Method POST -Body '{}'
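If you prefer Python over curl/Invoke-RestMethod, the same REST call can be issued as below. This sketch assumes you have run pip install azure-identity requests and are signed in via az login; the placeholders must be replaced just as in the commands above.

import requests
from azure.identity import AzureCliCredential

# Reuse the Azure CLI login to obtain an ARM token, then trigger the WebJob
token = AzureCliCredential().get_token("https://management.azure.com/.default").token
url = ("https://management.azure.com/subscriptions/<subscription_id>"
       "/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web"
       "/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run"
       "?api-version=2024-04-01")
resp = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json={})
print(resp.status_code)  # a 2xx status indicates the job was accepted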
You can check the training status by executing the following commands.

Commands for Linux or Mac:

# Please change <subscription_id>, <resourcegroup_name>, and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; response=$(curl -s -H "Authorization: Bearer $token" "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/webjobs?api-version=2024-04-01") ; echo "$response" | jq

Commands for Windows:

# Please change <subscription_id>, <resourcegroup_name>, and <webapp_name> to your own
$token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv); $response = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/webjobs?api-version=2024-04-01" -Headers @{Authorization = "Bearer $token"} -Method GET ; $response | ConvertTo-Json -Depth 10

You can retrieve the latest detailed log by executing the following commands.

Commands for Linux or Mac:

# Please change <subscription_id>, <resourcegroup_name>, and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; history_id=$(az webapp webjob triggered log --resource-group <resourcegroup_name> --name <webapp_name> --webjob-name train-mbti-model --query "[0].id" -o tsv | sed 's|.*/history/||') ; response=$(curl -X GET -H "Authorization: Bearer $token" -H "Content-Type: application/json" "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/history/$history_id/?api-version=2024-04-01") ; log_url=$(echo "$response" | jq -r '.properties.output_url') ; curl -X GET -H "Authorization: Bearer $token" "$log_url"

Commands for Windows:

# Please change <subscription_id>, <resourcegroup_name>, and <webapp_name> to your own
$token = az account get-access-token --resource https://management.azure.com --query accessToken -o tsv ; $history_id = az webapp webjob triggered log --resource-group <resourcegroup_name> --name <webapp_name> --webjob-name train-mbti-model --query "[0].id" -o tsv | ForEach-Object { ($_ -split "/history/")[-1] } ; $response = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/history/$history_id/?api-version=2024-04-01" -Headers @{ Authorization = "Bearer $token" } -Method GET ; $log_url = $response.properties.output_url ; Invoke-RestMethod -Uri $log_url -Headers @{ Authorization = "Bearer $token" } -Method GET

Once you see the report in the logs, the training is complete, and the Flask app is ready for predictions. You can also find the newly trained models in the File Share mounted in your local environment.

Using the Model for Prediction

Just like in local testing, open a terminal and use the following curl command to send a request to the app:

# Note: Replace tensorflow-bicep-app with the name of your web app.
curl -X GET "https://tensorflow-bicep-app.azurewebsites.net/api/detect" -H "Content-Type: application/json" -d '{"post": "I am happy"}'

As with the local environment, you should see the expected results.
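The same request can also be sent from Python. Note that, to match the curl example, it is a GET request carrying a JSON body, which the requests library supports as shown in this sketch:

import requests

# Replace tensorflow-bicep-app with the name of your web app
resp = requests.request(
    "GET",
    "https://tensorflow-bicep-app.azurewebsites.net/api/detect",
    json={"post": "I am happy"},
)
print(resp.json())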
7. Troubleshooting

docker.log freeze after deployment

Symptom: The latest deployment status cannot be retrieved after Azure DevOps publishes the code to the Web App via the Kudu site, and the front page returns a 504 error.

Cause: This project includes two virtual environments, each containing a TensorFlow package. During the start.sh process of creating these environments, each environment takes approximately 10 minutes to set up. As a result, the docker.log or Log Stream might temporarily stall for about 20 minutes at a certain stage.

Resolution: After roughly 20 minutes, once all the packages are downloaded, the logs will resume recording.

Others

- Using Scikit-learn on Azure Web App
- Using OpenAI on Azure Web App

8. Conclusion

TensorFlow, much like a Swiss Army knife, encompasses a wide range of training algorithms. While Azure Web App is not typically used for training models, it can still serve as a platform for inference. In the future, I plan to introduce how pre-trained models can be loaded directly in JavaScript. This approach allows the inference workload for non-sensitive models to be offloaded to the client side.

9. References

- TensorFlow
- MBTI text classifier trained on the Kaggle MBTI dataset
- (MBTI) Myers-Briggs Personality Type Dataset