General availability of Prompt Shields in Azure AI Content Safety and Azure OpenAI Service

Sherry_Shao · Sep 3, 2024

Today we are announcing the general availability of Prompt Shields in Azure AI Content Safety and Azure OpenAI Service, a robust AI security feature we announced in preview in March 2024.

Prompt Shields seamlessly integrate with Azure OpenAI Service content filters and are available in Azure AI Content Safety, providing a robust defense against different types of prompt injection attacks. By leveraging advanced machine learning algorithms and natural language processing, Prompt Shields effectively identify and mitigate potential threats in user prompts and third-party data. This cutting-edge capability will support the security and integrity of your AI applications, safeguarding your systems against malicious attempts at manipulation or exploitation.

Key Features

Prompt Shields for Direct Attacks: Previously called Jailbreak risk detection, this shield targets direct prompt injection attacks, where users deliberately exploit system vulnerabilities to elicit unauthorized behavior from the LLM. This could lead to inappropriate content generation or violations of system-imposed restrictions.
Prompt Shields for Indirect Attacks: This shield aims to safeguard against attacks that use information not directly supplied by the user or developer, such as external documents. Attackers might embed hidden instructions in these materials in order to gain unauthorized control over the LLM session.

Prompt Shields API: Input and Output

Prompt Shields in Azure OpenAI Service

User Scenarios

"Prompt Shields" in Azure AI Content Safety are specifically designed to safeguard generative AI systems from generating harmful or inappropriate content. These shields detect and mitigate risks associated with both User Prompt Attacks (malicious or harmful user-generated inputs) and Document Attacks (inputs containing harmful content embedded within documents). The use of "Prompt Shields" is crucial in environments where GenAI is employed, ensuring that AI outputs remain safe, compliant, and trustworthy.

The primary objectives of the "Prompt Shields" feature for GenAI applications are:

To detect and block harmful or policy-violating user prompts (direct attacks) that could lead to unsafe AI outputs.
To identify and mitigate indirect attacks where harmful content is embedded within user-provided documents.
To maintain the integrity, safety, and compliance of AI-generated content, thereby preventing misuse of GenAI systems.

Use Case Examples

AI Content Creation Platforms: Detecting Harmful Prompts

Scenario: An AI content creation platform uses generative AI models to produce marketing copy, social media posts, and articles based on user-provided prompts. To prevent the generation of harmful or inappropriate content, the platform integrates "Prompt Shields."
User: Content creators, platform administrators, and compliance officers.
Action: The platform uses Azure AI Content Safety's "Prompt Shields" to analyze user prompts before generating content. If a prompt is detected as encouraging the creation of potentially harmful or is likely to lead to policy-violating outputs (e.g., prompts asking for defamatory content or hate speech), the shield blocks the prompt and alerts the user to modify their input.
Outcome: The platform ensures all AI-generated content is safe, ethical, and compliant with community guidelines, enhancing user trust and protecting the platform's reputation.

AI-Powered Chatbots: Mitigating Risk from User Prompt Attacks

Scenario: A customer service provider uses AI-powered chatbots for automated support. To safeguard against user prompts that could manipulate the model to generate inappropriate or unsafe responses, the provider uses "Prompt Shields."
User: Customer service agents, chatbot developers, and compliance teams.
Action: The chatbot system integrates "Prompt Shields" to monitor and evaluate user inputs in real-time. If a user prompt is identified as trying to exploit the AI (e.g., attempting to provoke inappropriate responses or extract sensitive information), the shield intervenes by blocking the response or redirecting the query to a human agent.
Outcome: The customer service provider maintains high standards of interaction safety and compliance, preventing the chatbot from generating responses that could harm users or breach policies.

E-Learning Platforms: Preventing Inappropriate AI-Generated Educational Content

Scenario: An e-learning platform employs GenAI to generate personalized educational content based on student inputs and reference documents. To avoid security threats that could cause the generation of inappropriate or misleading educational, the platform utilizes "Prompt Shields."
User: Educators, content developers, and compliance officers.
Action: The platform uses "Prompt Shields" to analyze both user prompts and connected data sources, such as documents, for content that could manipulate the application to generate unsafe or policy-violating AI outputs. If a user prompt or connected document is detected as likely to generate inappropriate educational content, the shield blocks it and suggests alternative, safe inputs.
Outcome: The platform ensures that all AI-generated educational materials are appropriate and compliant with academic standards, fostering a safe and effective learning environment.

With the general availability of Prompt Shields, AI systems can now be more effectively protected against prompt injection attacks, supporting robust security.

Our Customers

AXA joined the surge of generative AI, providing its 140,000 employees the power of this new era of AI with an emphasis on doing things securely and responsibly.

“Our goal was to ensure that all our employees had access to the same technology as the public AI tools within a secure and reliable environment. We wanted to go fast, and that's why we did it in less than three months”

-Vincent De Ponthaud, Head of Software & AI Engineering at AXA.

AXA can leverages Azure OpenAI Services’s content filters, using Prompt Shields to add a security layer to their application to prevent any jailbreaking of the model and ensure an optimal level of reliability.

Read more about how AXA is leveraging Prompt Shields AXA Secure GPT.

Resources

For more information on Prompt Shields visit the Azure AI Content Safety documentation
For more information about using built-in Prompt Shields content filter visit the Azure OpenAI Service documentation
For API input limits, see the Input requirements section of the Overview.
Read our scientific research paper on Spotlighting: [2403.14720] Defending Against Indirect Prompt Injection Attacks With Spotlighting
Microsoft Developer Prompt Shields overview video

General availability of Prompt Shields in Azure AI Content Safety and Azure OpenAI Service

Sherry_Shao