Keeping your AI out of trouble

MarcoCardoso · Jun 12, 2024

One thing is true for most AI Applications - it's easy to get yourself in trouble if you're not careful. AI is all about probability, and the probability of it being incorrect, or behaving unexpectedly for a new input is practically never zero. In the classic chatbot days, this often meant getting an answer about something you're not asking about, or the good old "I did not understand" default answer we all "love" to see when we're having an issue. But with Generative AI, mistakes are much more nuanced, and may take the appearance of plain misinformation and, even worse, harmful content!

In this article, we'll cover some of the guidelines you can adopt to minimize risk on AI Apps. Each section is composed of a set of actions you can take, followed by good and bad examples to illustrate their role in keeping your users - and you! - safe from unexpected AI behavior.

1. User interface guidelines

Starting with UI tips - these are simple changes to the way your end-users engage with your AI application that can go a long way in preventing misuse.

Guideline	Description	Reasons
Include disclaimer text	In order to interact with the AI, end-users should acknowledge the rules and limitations of the tool. A good disclaimer should mention: The information provided may be generated by AI The information provided may be incorrect The user is responsible for verifying the correctness of information against sources provided Any additional industry specific disclaimers	Users expect to see correct information on the platforms you provide them. The concept of a tool that can provide incorrect information is new and needs to be explicitly called out.
Visually separate Generated and Retrieved content into sections	Generated content is the output of the language model, and as such can be incorrect Retrieved content is directly extracted from trusted sources, and can be expected to be correct, but possibly not relevant This distinction should be clear to the end user. The generated content can be grounded on retrieved content, but you should always provide an original source the user can read directly. In addition, you may want to refrain from answering a question when no content was retrieved.	Once you establish some content must be verified by the user, you need to define a clear boundary of what information needs verification, and what can be trusted without doubt. Providing both pieces of information side by side makes it easy for the user to check the information at a glance, without leaving the app. Having that separation in the application also allows you to override the generated content. Even if the AI says something, you can choose not to display it through app logic if there are no sources to support it.
Add a feature to report issues and provide feedback	Users should be able to provide feedback whenever they face issues or receive unexpected responses. If you decide to let users include chat history with their feedback, make sure to get confirmation that no personal or sensitive data was shared.	Feedback forms provide a simple way for users to tell you if the app is meeting expectations.
Establish user accountability	Inform the user that the content they submit may be subject to review when harmful content is detected.	Having users be accountable for exploiting the tool may dissuade them from repeatedly attempting to do so.

Good examples

Let's start with the original ChatGPT interface - Notice all elements are present:

Disclaimer text at the bottom
Per-message feedback option
Clearly distinct Retrieval and Generation sections
Terms and Conditions - though hidden under the question mark on the bottom right.

All these elements are crucial to ensure the user is aware how things can go wrong, and sets the right expectations for how to use the tool.

Microsoft Copilot for M365 has its disclaimer and all links right below the logo. Straight to the point!

Don't worry about writing a huge disclaimer that contains everything - you can link the full terms and keep a clean UI.

Bad examples

Common mistakes when setting up a UI include:

Not having the required disclaimers, sources or highlighting
Overstating the chatbot's usefulness - e.g. "can help with anything about [topic]"

While some of these safeguards may seem like they are understating the chatbot's usefulness, they are indispensable to setting the right expectations given the inherent limitations of the technology.

2. System message guidelines

Next, we have system message guidelines. These are instructions that are not visible to the user, but guide the chatbot to answer questions with the right focus or style. Keep in mind that these can be somewhat overridden by user prompts, and as such only prevent accidental or simple misuse.

Guideline	Description	Reasons
Define a clear scope of what the chatbot should assist with	The assistant should not attempt to help with all requests. Establish a clear boundary as to what conversations it should engage in. For all other topics, it should politely decline to engage.	Failing to specify a scope will make the bot behave as a generic utility, like out-of-the-box ChatGPT. Users may take advantage of that fact to misuse the application or API.
Do not personify the chatbot	The chatbot should present itself as a tool to help the user navigate content, rather than a person. Behaving as an employee or extension of the company should also be avoided.	When users make improper use a personified chatbot, it may give the impression of manipulation/gullibility, rather than simple misuse.

Good example

"You are a search engine for Contoso Technology. Your role is to assist customers in locating the right information from publicly available sources like the website. Politely decline to engage in conversations about any topic outside of Contoso Technology"

Bad example

"You are Contoso's AI Assistant. You are a highly skilled customer service agent that can help users of the website with all their questions."

3. Evaluation guidelines

Next, we have evaluation guidelines. These tools will help quantitatively measure the correctness of responses - and the possibility of manipulating the app into generating harmful content.

Guideline	Description	Reasons
Evaluate the chatbot's accuracy, and other metrics for quality of information	Define a set of "critical" questions your chatbot should be able to answer reliably. Regularly submit this dataset for inference and either manually or automatically evaluate its accuracy. Prefer a combination of manual and automatic validations to ensure best results.	As chatbots evolve to meet your customer's expectations, it's common to lose track of answers which it supposedly already knows. Updating the prompt or data sources may negatively impact those responses, and these regressions need to be properly tracked.
Evaluate the chatbot's ability to avoid generating harmful content	Define a set of "red-team" requests that attempt to break the chatbot, force it to generate harmful content, or leave its scope. As with accuracy, establish a regular re-submission of this dataset for inference.	Unfortunately, chatbots can always be misused by an ill-intended user. Keep track of the most common "jailbreaking" patterns and test your bot's behavior against them. Azure OpenAI comes with built-in content safety, but it’s not foolproof. Make sure you objectively measure harmful content generation.

Good examples

Leveraging Azure AI Studio to evaluate Groundedness, Relevance, Coherence, Fluency and Similarity. More information can be found in the docs!
Using Prompt Shields for Jailbreak and Harmful Content detection.

Bad examples

Trying to capture exact matches when evaluating accuracy.
Not considering evaluation as part of the release cycle.

4. Data privacy guidelines

Finally, we cover some data privacy guidelines. Data privacy is about how you receive, process, persist and discard end-user information through your applications. Be aware that this is an overview and does not cover every aspect of data privacy, but is a good place to start considering privacy concerns.

Guideline	Description	Reasons
Don't audit all model inputs and outputs unless absolutely necessary	There is typically no need to log all user interactions. Even when instructed not to, users may submit personal information which is then at risk of exposure. Debugging and monitoring tools should focus on response status codes and token counts, rather than actual text content.	Persisting messages often poses a more severe data privacy risk than simply not doing so. Microsoft only ever persists messages which are suspected of breaking terms and conditions. They may be then viewed by Microsoft for the sole purpose of evaluating improper use. Review with your Data Privacy team if you require this feature to be turned off.

Good examples

Capturing HTTP response codes and error messages for debugging.
Logging token usage related metrics to Azure Application Insights.
Capturing user intent for continuous improvement.
Expiring user conversation logs and metrics once they are no longer relevant for the purpose of providing the experience, as disclosed in its Privacy Statement

Bad examples

Capturing verbatim prompt / completion pairs.
Persisting user information for longer than necessary.
Failing to adhere to the Privacy Statement.

Wrap up

Remember, AI misuse will happen in your applications. Your objective is to safeguard your legitimate users so they know what the applications can and cannot do, while giving ill-intended users an experience that gives less the impression of a failed / fragile tool, and more like a robust toolset being used incorrectly.

We hope this cheat sheet provides a good overview of the tools available in Azure to help bring safety and responsibility to the use of AI. Do you have other tips or tools to safeguard AI Applications? Let us know in the comments!

Continue reading...

Keeping your AI out of trouble

MarcoCardoso

1. User interface guidelines​

2. System message guidelines​

​

​

3. Evaluation guidelines​

​

4. Data privacy guidelines​

​

Wrap up​

1. User interface guidelines

2. System message guidelines

3. Evaluation guidelines

4. Data privacy guidelines

Wrap up