Andreas_Helland
Getting started with Azure OpenAI is easy enough - here's your deployment, here's an API key, go! Which, from a marketing perspective, is great. Less so if you want to be frugal with your tokens.
Cost is only one parameter when developing AI solutions, though. You need testability, you need mock data, redundancy, etc. In other words, you need all the things you would need even if there weren't any AI near your app at all. While not front and center on the Azure landing page, Microsoft does have tools and features that can assist you with this, and that is what we'll look at today.
The AI Simulator
Note: the focus here is not on the AI bits themselves so that part is not fancy at all. The focus is on the supporting pieces.
I am fully aware that you can run models locally on your machine and expose an API accordingly if you don't want to hit the cloud during dev, but that's not the use case we're solving here. I want something that is tied to Azure while still allowing some flexibility between cloud and local. (But you can spin up something in LM Studio and tweak this code to work with a local model if that's what you are looking for.)
We will use a simple web app and the Azure OpenAI SDK for .NET to demo a few different angles of the developer experience.
Microsoft provides an Azure OpenAI Simulator which from an API/SDK perspective behaves like a real instance:
GitHub - microsoft/aoai-api-simulator: A sample showing how to create a simulated API implementation for Azure OpenAI (AOAI)
This is the basis for what we will be working with in our sample. For our purposes it's easiest to deploy it as a Docker image locally (Dockerfile provided by MS). Since I've been on a roll lately with .NET Aspire I chose to go with that as the central orchestration piece.
The structure of the views (we will cover what they do along the way) is more or less the same across different use cases:
Code:
@code {
    public class chatDialog
    {
        public string? systemMessage;
        public string? inputText;
        public string? outputText;
        public int maxTokens = 400;
        public float temperature = 0.7f;
    }

    // This view is hardwired to use the simulator so we can adjust accordingly
    private string oaiEndpoint = string.Empty;
    private string oaiDeploymentName = string.Empty;
    private string oaiKey = string.Empty;

    public static chatDialog dialog = new();

    protected override void OnInitialized()
    {
        // Point the client at the local simulator and set up a default prompt
        oaiEndpoint = "http://localhost:8000";
        oaiDeploymentName = Configuration["oaiDeploymentName"] ?? "gpt-4o";
        oaiKey = Configuration["oaiKey"] ?? string.Empty;

        dialog = new()
        {
            systemMessage = "I am a hiking enthusiast named Forest who helps people discover hikes in their area. If no area is specified, I will default to near Rainier National Park. I will then provide three suggestions for nearby hikes that vary in length. I will also share an interesting fact about the local nature on the hikes when making a recommendation.",
            inputText = "Can you recommend some good hikes in the Redmond area?",
            outputText = string.Empty,
            temperature = 0.7f,
            maxTokens = 400,
        };
    }

    protected async Task chat()
    {
        // The Azure OpenAI SDK works unchanged against the simulator - it only needs an endpoint and a key
        AzureOpenAIClient client = new AzureOpenAIClient(new Uri(oaiEndpoint), new System.ClientModel.ApiKeyCredential(oaiKey));
        OpenAI.Chat.ChatClient chatClient = client.GetChatClient(oaiDeploymentName);

        OpenAI.Chat.ChatCompletionOptions chatCompletionOptions = new()
        {
            MaxOutputTokenCount = dialog.maxTokens,
            Temperature = dialog.temperature,
        };

        OpenAI.Chat.ChatCompletion completion = await chatClient.CompleteChatAsync(
        [
            new OpenAI.Chat.SystemChatMessage(dialog.systemMessage),
            new OpenAI.Chat.UserChatMessage(dialog.inputText),
        ], chatCompletionOptions);

        var response = $"Response:\r\n{completion.Content[0].Text} \r\nOutput tokens: {completion.Usage.OutputTokenCount}\r\nTotal tokens: {completion.Usage.TotalTokenCount}";
        dialog.outputText = response;
    }
}
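For context, here is a rough sketch of what the markup half of such a view could look like. This is illustrative only (the route, render mode and element layout are assumptions, not copied from the repo), but it shows where the Configuration used in the @code block comes from:
Code:
@page "/simulator"
@rendermode InteractiveServer
@using Microsoft.Extensions.Configuration
@inject IConfiguration Configuration

<textarea @bind="dialog.systemMessage"></textarea>
<textarea @bind="dialog.inputText"></textarea>
<button @onclick="chat">Send</button>
<pre>@dialog.outputText</pre>
The @inject line is what makes the Configuration lookups in OnInitialized work.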
All the code in this post can be found here:
GitHub - ahelland/ai-dev-zone: Samples for using Azure OpenAI and an OpenAI simulator with .NET Aspire
Hardcoding localhost seems like a bad idea and sort of non-deployable. I know, and it's on purpose here. I tested pushing the simulator to an Azure Container Registry and pulling it into an Azure Container App. That's not a problem at all (localhost wouldn't be valid there either), but it would require some extra logic and generation of deployment scripts, so I opted for keeping things simpler instead.
The following code adds a simulator instance to our web app (in the AppHost's Program.cs):
Code:
builder.AddDockerfile("aoai-simulator-generate", "../AOAI_API_Simulator")
.WithHttpEndpoint(port: 8000, targetPort:oaiSimulatorPort)
.WithEnvironment("SIMULATOR_MODE", "generate")
.WithEnvironment("SIMULATOR_API_KEY", localOaiKey)
.ExcludeFromManifest();
The result is nonsensical, but for exercising the SDK, doing load testing and experimenting with the UI it works. (As with an actual AI deployment there will be a delay for the generation.)
You will most likely want to move past the fluff and get real responses - the simulator has a mode for that too. You can inject the values for an Azure OpenAI deployment and put it in "recording" mode. Your question will then be proxied through, and the response will be stored on disk in a JSON file.
A few more lines of code will let Aspire handle this piece as well - both creating the Azure OpenAI resource and another instance of the simulator:
Code:
var azaoai = builder.AddBicepTemplate(
        name: "AI",
        bicepFile: "../infra/ai.bicep")
    .WithParameter(AzureBicepResource.KnownParameters.KeyVaultName);

var cloudEndpoint = azaoai.GetOutput("endpoint");
var accountName = azaoai.GetOutput("accountName");
var cloudKey = azaoai.GetSecretOutput("accountKey");
var cloudDeployment = "gpt-4o";

builder.AddDockerfile("aoai-simulator-record", "../AOAI_API_Simulator")
    .WithBindMount("recordings", "/app/.recording")
    .WithHttpEndpoint(port: 8001, targetPort: oaiSimulatorPort)
    .WithEnvironment("SIMULATOR_API_KEY", localOaiKey)
    .WithEnvironment("SIMULATOR_MODE", "record")
    .WithEnvironment("AZURE_OPENAI_ENDPOINT", cloudEndpoint)
    .WithEnvironment("AZURE_OPENAI_KEY", cloudKey)
    .WithEnvironment("AZURE_OPENAI_DEPLOYMENT", cloudDeployment)
    .WithEnvironment("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", cloudDeployment)
    .ExcludeFromManifest();
If you ask the same question you will hopefully get a more meaningful answer.
This also works as a cache, so if you repeat the question it should be read from the file instead of hitting the endpoint in Azure.
We're not recording just because we can - we record because that provides us with a replay mode as well. So, we create yet another instance of the simulator that we point to the same directory as the recorder:
Code:
builder.AddDockerfile("aoai-simulator-replay", "../AOAI_API_Simulator")
.WithBindMount("recordings", "/app/.recording")
.WithHttpEndpoint(port: 8002, targetPort: oaiSimulatorPort)
.WithEnvironment("SIMULATOR_API_KEY", localOaiKey)
.WithEnvironment("SIMULATOR_MODE", "replay")
This doesn't forward requests to the cloud, but will just check the files locally. This means that the same question asked repeatedly will give the exact same answer, which is different behavior than asking an AI assistant, but it has the benefit of being more predictable for dev and test. For that matter, you can edit the JSON as well to change the answers, add more questions, etc. If you ask a new question (with no recorded answer) it will not be able to return anything meaningful, so as a shortcut I catch exceptions to insert a placeholder text:
Code:
try
{
    OpenAI.Chat.ChatCompletion completion = await chatClient.CompleteChatAsync(
    [
        new OpenAI.Chat.SystemChatMessage(dialog.systemMessage),
        new OpenAI.Chat.UserChatMessage(dialog.inputText),
    ], chatCompletionOptions);

    var response = $"Response:\r\n{completion.Content[0].Text} \r\nOutput tokens: {completion.Usage.OutputTokenCount}\r\nTotal tokens: {completion.Usage.TotalTokenCount}";
    dialog.outputText = response;
}
catch (Exception)
{
    dialog.outputText = "I don't know what you are talking about.";
}
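Catching the bare Exception is a quick shortcut. If you want to be a bit more targeted, failed calls from the SDK surface as System.ClientModel.ClientResultException, so a variant could look like the sketch below (assuming the replay instance answers unrecorded questions with an error status - verify against the simulator's actual behavior):
Code:
try
{
    OpenAI.Chat.ChatCompletion completion = await chatClient.CompleteChatAsync(
    [
        new OpenAI.Chat.SystemChatMessage(dialog.systemMessage),
        new OpenAI.Chat.UserChatMessage(dialog.inputText),
    ], chatCompletionOptions);

    dialog.outputText = $"Response:\r\n{completion.Content[0].Text}";
}
catch (System.ClientModel.ClientResultException ex)
{
    // A question without a recorded answer shows up here as a non-success status code
    dialog.outputText = $"I don't know what you are talking about. (status {ex.Status})";
}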
I went with the defaults for the simulator, but you can also change the latency and token limits, as you might not want to allow unlimited usage.
And as expected, a minor tweak regarding the location I query means that the bot knows nothing.
Looking good, but there's one more trick we can pull here to blur the lines between dev and prod. Many companies do not want API calls to go straight to the resource exposing the data. Instead, one puts Azure API Management (APIM) in front to control the inbound traffic and add features like load balancing, throttling and more. APIM has OpenAI awareness, so you can easily add the AI deployment we created as a backend to a separate API. More details on that can be found here:
GitHub - Azure-Samples/AI-Gateway: APIM OpenAI - this repo contains a set of experiments on using GenAI capabilities of Azure API Management with Azure OpenAI and other services
I went with the policy for the "Token Rate Limiting lab", but feel free to experiment with the more complicated policies.
You probably will not be surprised by now that this is something I'm also adding through Aspire:
Code:
var apimai = builder.AddBicepTemplate(
        name: "APIM",
        bicepFile: "../infra/apim.bicep")
    .WithParameter(AzureBicepResource.KnownParameters.KeyVaultName)
    .WithParameter("apimResourceName", "apim")
    .WithParameter("apimSku", "Basicv2")
    .WithParameter("openAIAccountName", accountName);

var apimEndpoint = apimai.GetOutput("apimResourceGatewayURL");
var apimKey = apimai.GetSecretOutput("subscriptionKey");

builder.AddDockerfile("aoai-simulator-record", "../AOAI_API_Simulator")
    .WithBindMount("recordings", "/app/.recording")
    .WithHttpEndpoint(port: 8001, targetPort: oaiSimulatorPort)
    .WithEnvironment("SIMULATOR_API_KEY", localOaiKey)
    .WithEnvironment("SIMULATOR_MODE", "record")
    .WithEnvironment("AZURE_OPENAI_ENDPOINT", apimEndpoint)
    .WithEnvironment("AZURE_OPENAI_KEY", apimKey)
    .WithEnvironment("AZURE_OPENAI_DEPLOYMENT", cloudDeployment)
    .WithEnvironment("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", cloudDeployment)
    .ExcludeFromManifest();
The AI simulator instancing is identical to the previous one - I just use the URL and key for APIM instead of the Azure OpenAI credentials. The APIM provisioning takes in the corresponding values in a secure manner. Once again, to reduce complexity, we're using keys here instead of managed identities, which I recommend using as much as possible when the code runs in Azure.
Arguably this creates a somewhat prolonged request pipeline, and it gets complicated if you apply different restrictions at each stage.
You may not want to use all of these components, but you can mix and match in the sense that the code doesn't need to have an opinion on what lies behind the endpoint.
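To make that concrete, here is a minimal sketch of a helper that only cares about an endpoint, a key and a deployment name - whether those point at the generate instance on port 8000, the recorder on 8001, the replayer on 8002, APIM or a real Azure OpenAI resource. The "Backend:*" configuration keys and the class name are placeholders, not something the sample repo defines:
Code:
using System;
using System.ClientModel;
using Azure.AI.OpenAI;
using Microsoft.Extensions.Configuration;

// Sketch: the calling code stays the same regardless of what answers at the endpoint
public static class ChatClientFactory
{
    public static OpenAI.Chat.ChatClient Create(IConfiguration config)
    {
        var endpoint   = config["Backend:Endpoint"]   ?? "http://localhost:8002"; // e.g. the replay instance
        var key        = config["Backend:Key"]        ?? "not-a-real-key";
        var deployment = config["Backend:Deployment"] ?? "gpt-4o";

        var client = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
        return client.GetChatClient(deployment);
    }
}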
While we're not deploying the app itself to Azure in any way, both Azure OpenAI and Azure APIM clearly need to be provisioned into a subscription / resource group. I think this creates an interesting paradigm. You have always had the ability to do things like creating a SQL database in Azure and connecting from a debugging session in Visual Studio, but it has been more common to use LocalDB locally and swap out the connection string during deployment. Here we are able to create a configuration that uses Azure with seamless integration into the developer inner loop. The resources are intended for single-dev use while at the same time being easy to share with a team if necessary. (Key Vault is almost free, and Azure OpenAI doesn't have a cost unless you actively send tokens back and forth. APIM does have a cost though, since policies cannot be used with the consumption SKU.)
The code so far hasn't gone into how these resources are created. The builder.AddBicepTemplate calls point to Bicep files that take care of this. We're not employing the Azure Developer CLI since we don't deploy the actual web app - Aspire can handle the rest on its own. You will notice that there are values that need to be added to appsettings.json in a predefined pattern:
Code:
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning",
      "Aspire.Hosting.Dcp": "Warning"
    }
  },
  "Parameters": {
    "TenantId": "guid",
    "ClientId": "guid",
    "ClientSecret": "secret"
  },
  "Azure": {
    "SubscriptionId": "<Your subscription id>",
    "AllowResourceGroupCreation": true,
    "ResourceGroup": "<Valid resource group name>",
    "Location": "<Valid Azure location>"
  }
}
The "Parameters" section is for logging into the app and requires an app registration in Entra ID to work. The "Azure" section is what Aspire uses for the Bicep code you supply.
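As a sketch of how the "Parameters" section gets consumed (the project resource name and environment variable names below are illustrative, not taken from the repo), the AppHost can declare the parameters and hand them to the web project like this:
Code:
// In the AppHost's Program.cs - values resolve from the "Parameters" section of appsettings.json
var tenantId     = builder.AddParameter("TenantId");
var clientId     = builder.AddParameter("ClientId");
var clientSecret = builder.AddParameter("ClientSecret", secret: true);

// Hypothetical project resource - adjust to the actual web project in your solution
builder.AddProject<Projects.Web>("web")
    .WithEnvironment("AzureAd__TenantId", tenantId)
    .WithEnvironment("AzureAd__ClientId", clientId)
    .WithEnvironment("AzureAd__ClientSecret", clientSecret);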
Code from me to you can be found here:
GitHub - ahelland/ai-dev-zone: Samples for using Azure OpenAI and an OpenAI simulator with .NET Aspire
You can step into the Azure Portal afterwards and play around with the resources Aspire made for you if you like, but your code is totally local (until you check in to your repo and follow that flow). When done you can delete the resources from the Portal. Do note that the AI deployment must be deleted before the AI account. Key Vaults and APIM need to be purged after deletion, or a redeployment will fail.
At the end of the day - AI development is just "development"