
Calculating Chargebacks for Business Units/Projects Utilizing a Shared Azure OpenAI Instance


Azure OpenAI Service provides REST API access to OpenAI's language models, including GPT-4, GPT-35-Turbo, and the Embeddings model series.

 

 

 

Enhancing Throughput for Scale

 

As enterprises deploy OpenAI's language models across various business units, they often require granular control over configuration and performance metrics. To address this need, Azure OpenAI Service is introducing dedicated (provisioned) throughput, a feature that provides a dedicated connection to OpenAI models with guaranteed performance levels. Throughput is measured in tokens per second (tokens/sec), which lets organizations measure and tune performance for both prompts and completions. Provisioned throughput gives more control over capacity for varying workloads and keeps the system ready for spikes in demand. It also ensures a uniform user experience and steady performance for applications that require real-time responses.

 

 

 

Resource Sharing and Chargeback Mechanisms

 

Large organizations frequently provision a single instance of Azure OpenAI Service that is shared across multiple internal departments. Sharing the instance requires an efficient mechanism for allocating costs to each business unit or consumer based on the number of tokens consumed. This article describes how to calculate the chargeback for each business unit from its token usage.
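
As a minimal sketch of token-based allocation, the C# program below splits a shared bill across business units in proportion to the tokens each one consumed. The unit names, token counts, bill amount, and the Chargeback.Allocate helper are all hypothetical and only illustrate the arithmetic; a real scheme might price prompt and completion tokens differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs: tokens consumed per business unit in a billing period
// and the shared Azure OpenAI bill for the same period.
var tokensByUnit = new Dictionary<string, long>
{
    ["marketing"] = 600_000,
    ["support"] = 300_000,
    ["research"] = 100_000,
};
decimal sharedMonthlyCost = 1_000m;

foreach (var (unit, share) in Chargeback.Allocate(tokensByUnit, sharedMonthlyCost))
{
    Console.WriteLine($"{unit}: {share:F2}");
}

static class Chargeback
{
    // Splits totalCost across units in proportion to the tokens each unit consumed.
    public static Dictionary<string, decimal> Allocate(
        IReadOnlyDictionary<string, long> tokensByUnit, decimal totalCost)
    {
        long totalTokens = tokensByUnit.Values.Sum();
        if (totalTokens == 0)
        {
            // No usage recorded, so nothing is charged to any unit.
            return tokensByUnit.ToDictionary(kv => kv.Key, _ => 0m);
        }

        return tokensByUnit.ToDictionary(
            kv => kv.Key,
            kv => totalCost * kv.Value / totalTokens);
    }
}
```

With these sample numbers, marketing is charged 60% of the bill, support 30%, and research 10%.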

 

 

 

Leveraging Azure API Management Policies for Token Tracking

 

Azure API Management Policies offer a powerful solution for monitoring and logging the token consumption for each internal application. The process can be summarized in the following steps:

 


 

 

 

1. Client Applications Authenticate to API Management

 

To make sure only legitimate clients can call the Azure OpenAI APIs, each client must first authenticate against Azure Active Directory and then call the APIM endpoint with the resulting access token. In this scenario, the API Management service acts on behalf of the backend API, and the calling application requests access to the API Management instance. The scope of the access token is between the calling application and the API Management gateway. In API Management, configure a policy (validate-jwt or validate-azure-ad-token) to validate the token before the gateway passes the request to the backend.
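
From the client's side, this step might look like the following sketch, which uses the Azure.Identity package to obtain an app-only token and sends it to APIM as a bearer token. The environment variable names, the scope value, the APIM URL, and the request body are assumptions made for illustration; the article does not specify the backend's request contract.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading;
using Azure.Core;
using Azure.Identity;

// Reads a required setting; the variable names used below are hypothetical.
static string Require(string name) =>
    Environment.GetEnvironmentVariable(name)
    ?? throw new InvalidOperationException($"Set the {name} environment variable.");

string tenantId = Require("CHARGEBACK_TENANT_ID");
string clientId = Require("CHARGEBACK_CLIENT_ID");
string clientSecret = Require("CHARGEBACK_CLIENT_SECRET");
string apimScope = Require("CHARGEBACK_APIM_SCOPE"); // e.g. api://<app-registration-id>/.default
string apimUrl = Require("CHARGEBACK_APIM_URL");     // full URL of the API exposed by APIM

// Acquire a token whose audience is the app registration that protects APIM.
var credential = new ClientSecretCredential(tenantId, clientId, clientSecret);
AccessToken token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { apimScope }), CancellationToken.None);

using var http = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Post, apimUrl)
{
    // Assumed request shape; replace with whatever the backend Web App expects.
    Content = new StringContent("{\"prompt\":\"Hello\"}", Encoding.UTF8, "application/json"),
};
request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token.Token);

using HttpResponseMessage response = await http.SendAsync(request);
Console.WriteLine((int)response.StatusCode);
```

The appid claim in this token is what identifies the calling application to the gateway, so each business unit should use its own app registration.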

 

 

 

2. APIM routes the request to a backend Web App that calls the Azure OpenAI service via managed identity and a private endpoint

 

Upon successful verification of the token, Azure API Management (APIM) routes the request to a backend Web App. The Azure OpenAI service can be configured to accept managed identities. The App Service app authenticates with the Azure OpenAI service using its managed identity and fetches a chat completion response, which includes the prompt and completion token counts, and then relays this data back to APIM.

 

 

 

 

 

    // Requires: using Azure; using Azure.AI.OpenAI; using Azure.Identity;
    // Build the chat request from the caller's prompt and the configured sampling settings.
    ChatCompletionsOptions options = new()
    {
        Messages =
        {
            new(ChatRole.User, userPrompt)
        },
        User = sessionId,
        MaxTokens = _openAiConfig.MaxTokens,
        Temperature = _openAiConfig.Temperature,
        NucleusSamplingFactor = _openAiConfig.NucleusSamplingFactor,
        FrequencyPenalty = _openAiConfig.FrequencyPenalty,
        PresencePenalty = _openAiConfig.PresencePenalty
    };

    // DefaultAzureCredential picks up the App Service managed identity, so no API key is stored.
    var client = new OpenAIClient(new Uri(_openAiConfig.Endpoint), new DefaultAzureCredential());
    Response<ChatCompletions> completionsResponse = await client.GetChatCompletionsAsync(_openAiConfig.ModelName, options);
    ChatCompletions completions = completionsResponse.Value;

    // Return the token counts alongside the answer so that APIM can log them.
    return new CompletionResponse
    {
        Response = completions.Choices[0].Message.Content,
        PromptTokens = completions.Usage.PromptTokens,
        ResponseTokens = completions.Usage.CompletionTokens,
        TotalTokens = completions.Usage.TotalTokens,
    };

 

 

 

 

 

3. Capture and log API response to Event Hub

 

The log-to-eventhub policy captures outgoing responses for logging or analytics purposes. To use this policy, a logger must first be configured in API Management:

 

 

 

 

 

 

 

    # API Management service-specific details
    $apimServiceName = "apim-hello-world"
    $resourceGroupName = "myResourceGroup"

    # Create the logger. The LoggerId must match the logger-id attribute used in the
    # log-to-eventhub policy, and -Name is the name of the target event hub.
    $context = New-AzApiManagementContext -ResourceGroupName $resourceGroupName -ServiceName $apimServiceName
    New-AzApiManagementLogger -Context $context -LoggerId "OpenAiChargeBackLogger" -Name "ApimEventHub" -ConnectionString "Endpoint=sb://<EventHubsNamespace>.servicebus.windows.net/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<key>" -Description "Event hub logger with connection string"

 

 

 

 

 

Within the outbound policy section, pull specific data from the response body and send it to the previously configured Event Hubs logger. Beyond simple logging, this is the entry point to real-time analytics and monitoring of token usage:

 

 

 

 

 

<log-to-eventhub logger-id="OpenAiChargeBackLogger" partition-id="1">@{
    // Read the backend response without consuming it, so the client still receives the body.
    var responseBody = context.Response.Body?.As<JObject>(true);
    return new JObject(
        new JProperty("timestamp", DateTime.UtcNow.ToString()),
        // The appid claim of the caller's bearer token identifies the client application.
        new JProperty("appId", context.Request.Headers.GetValueOrDefault("Authorization",string.Empty).Split(' ').Last().AsJwt()?.Claims.GetValueOrDefault("appid", string.Empty)),
        new JProperty("appSubscriptionKey", context.Request.Headers.GetValueOrDefault("Ocp-Apim-Subscription-Key",string.Empty)),
        // Token counts come from the backend Web App's response; fall back to an empty string if absent.
        new JProperty("promptTokens", responseBody?["promptTokens"]?.ToString() ?? string.Empty),
        new JProperty("responseTokens", responseBody?["responseTokens"]?.ToString() ?? string.Empty),
        new JProperty("totalTokens", responseBody?["totalTokens"]?.ToString() ?? string.Empty)
    ).ToString();
}
</log-to-eventhub>

 

 

 

 

 

Event Hubs integrates with a wide array of Azure and Microsoft services. For example, the logged data can be streamed directly to Azure Stream Analytics for real-time analytics or to Power BI for real-time dashboards. With Azure Event Grid, the same data can also be used to trigger workflows or automate tasks based on specific conditions met in the incoming responses.

 

Moreover, the architecture is extensible to non-Microsoft services as well. Event Hubs can interact smoothly with external platforms like Apache Spark, allowing you to perform data transformations or feed machine learning models.

 

 

 

4. Data Processing with Azure Functions

 

An Azure Function is invoked when data is sent to the Event Hubs instance, allowing for custom data processing in line with your organization's requirements. For instance, the function could dispatch the data to Azure Monitor, stream it to Power BI dashboards, or send detailed consumption reports via Azure Communication Services.

 

 

 

 

 

    [FunctionName("ChargeBackFunction")]
    public static void Run(
        [EventHubTrigger("openai-chargeback-hub", Connection = "EventHubConnectionString")] EventData[] eventHubMessages,
        ILogger logger)
    {
        logger.LogInformation("Chargeback function triggered");

        // Create the telemetry client once per batch rather than once per message.
        var telemetryConfiguration = new TelemetryConfiguration
        {
            ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING")
        };
        var telemetryClient = new TelemetryClient(telemetryConfiguration);

        foreach (var message in eventHubMessages)
        {
            try
            {
                string messageBody = Encoding.UTF8.GetString(message.EventBody.ToArray());
                logger.LogInformation($"Chargeback Data {messageBody}");

                // Each logged field (appId, promptTokens, ...) becomes a custom dimension on the event.
                telemetryClient.TrackEvent("Function called with Chargeback Data", JsonConvert.DeserializeObject<Dictionary<string, string>>(messageBody));
            }
            catch (Exception ex)
            {
                // Log and continue so that one malformed message does not drop the rest of the batch.
                logger.LogError(ex, "Something went wrong while processing chargeback data");
            }
        }

        // Send any buffered telemetry before the function instance goes idle.
        telemetryClient.Flush();
    }

 

 

 

 

 

In the example above, the Azure Function processes the token usage data from Event Hubs and sends it to Application Insights as telemetry. A basic dashboard configured in Azure then displays the token consumption for each client application. This information can be used to compute chargeback costs.

 

[Figure: Azure dashboard displaying token consumption per client application]

 

A sample query, used in the dashboard above, that fetches all the tokens consumed by a specific client:

 

 

 

 

 

    customEvents
    | where name contains "Function called with Chargeback Data"
    | extend data = parse_json(customDimensions)
    | where data.appId contains "<CLIENT_APP_ID>"
    | project
        timestamp = data.timestamp,
        PromptTokens = data.promptTokens,
        ResponseTokens = data.responseTokens,
        TotalTokens = data.totalTokens
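
The query above returns one row per request, so a chargeback report still needs the rows summed per client. The C# sketch below shows one way to do that over messages shaped like the payload logged in step 3. The sample appId values and counts are made up, and the Usage record and UsageAggregator helper are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Sample messages in the shape logged in step 3; values are illustrative only.
string[] messages =
{
    "{\"appId\":\"app-a\",\"promptTokens\":\"120\",\"responseTokens\":\"80\",\"totalTokens\":\"200\"}",
    "{\"appId\":\"app-b\",\"promptTokens\":\"50\",\"responseTokens\":\"25\",\"totalTokens\":\"75\"}",
    "{\"appId\":\"app-a\",\"promptTokens\":\"30\",\"responseTokens\":\"70\",\"totalTokens\":\"100\"}",
};

foreach (var (appId, usage) in UsageAggregator.SumByApp(messages))
{
    Console.WriteLine($"{appId}: prompt={usage.Prompt}, response={usage.Response}, total={usage.Total}");
}

record Usage(long Prompt, long Response, long Total);

static class UsageAggregator
{
    // Sums the token counts of all messages that share the same appId.
    public static Dictionary<string, Usage> SumByApp(IEnumerable<string> jsonMessages)
    {
        var totals = new Dictionary<string, Usage>();
        foreach (string json in jsonMessages)
        {
            using JsonDocument doc = JsonDocument.Parse(json);
            JsonElement root = doc.RootElement;
            string appId = root.GetProperty("appId").GetString() ?? string.Empty;

            Usage current = totals.TryGetValue(appId, out var existing)
                ? existing
                : new Usage(0, 0, 0);

            totals[appId] = new Usage(
                current.Prompt + ReadCount(root, "promptTokens"),
                current.Response + ReadCount(root, "responseTokens"),
                current.Total + ReadCount(root, "totalTokens"));
        }
        return totals;
    }

    // The policy logs counts as strings, so they are parsed explicitly.
    private static long ReadCount(JsonElement root, string name) =>
        long.Parse(root.GetProperty(name).GetString() ?? "0", CultureInfo.InvariantCulture);
}
```

For the sample data, app-a accumulates 150 prompt, 150 response, and 300 total tokens, while app-b keeps its single request's counts of 50, 25, and 75.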

 

 

 

 

 

Azure OpenAI Landing Zone reference architecture

 

For this approach to be effective, secure the Azure OpenAI service with Private Endpoints and use a Managed Identity for App Service to authorize access to Azure AI services. This limits access so that only the App Service can communicate with the Azure OpenAI service. Without these controls the solution is ineffective, because anyone who obtains the Azure OpenAI access key could bypass APIM and the App Service and call the service directly. Refer to the Azure OpenAI Landing Zone reference architecture to build a secure and scalable AI environment.

 

 

 

Additional Considerations

 

  • A subscription key or a custom header such as app-key can also be used to uniquely identify the client, because the appId in the OAuth token is not very intuitive.
  • Rate limiting can be implemented for incoming requests using OAuth tokens or subscription keys, adding another layer of security and resource management.
  • The solution can also be extended to route different clients to different Azure OpenAI instances. For example, some clients use an Azure OpenAI instance with default quotas, whereas premium clients consume an Azure OpenAI instance with dedicated throughput.
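
The last consideration, routing clients to different instances, could be sketched in the backend as a simple lookup from client identity to endpoint. The BackendRouter class, the endpoint host names, and the client IDs below are hypothetical; the same decision could equally be made in an APIM policy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical endpoints: one instance with default quotas, one with dedicated throughput.
var router = new BackendRouter(
    defaultEndpoint: new Uri("https://contoso-standard.openai.azure.com/"),
    premiumEndpoint: new Uri("https://contoso-provisioned.openai.azure.com/"),
    premiumAppIds: new[] { "premium-client-app-id" });

Console.WriteLine(router.Resolve("premium-client-app-id"));
Console.WriteLine(router.Resolve("some-other-app-id"));

sealed class BackendRouter
{
    private readonly Uri _defaultEndpoint;
    private readonly Uri _premiumEndpoint;
    private readonly HashSet<string> _premiumAppIds;

    public BackendRouter(Uri defaultEndpoint, Uri premiumEndpoint, IEnumerable<string> premiumAppIds)
    {
        _defaultEndpoint = defaultEndpoint;
        _premiumEndpoint = premiumEndpoint;
        // Client IDs are compared case-insensitively.
        _premiumAppIds = new HashSet<string>(premiumAppIds, StringComparer.OrdinalIgnoreCase);
    }

    // Premium clients go to the dedicated-throughput instance; everyone else to the default one.
    public Uri Resolve(string appId) =>
        _premiumAppIds.Contains(appId) ? _premiumEndpoint : _defaultEndpoint;
}
```

Keeping the mapping in one place also makes it clear which instance's cost each client's tokens should be charged against.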

 

Conclusion

 

Azure OpenAI Service stands as an indispensable tool for organizations seeking to harness the immense power of language models. With the feature of provisioned throughput, clients can define their usage limits in throughput units and freely allocate these to the OpenAI model of their choice. However, the financial commitment can be significant and is dependent on factors like the chosen model's type, size, and utilization. An effective chargeback system offers several advantages, such as heightened accountability, transparent costing, and judicious use of resources within the organization.

 
