Guardrails help you control your AI’s behavior by preventing unwanted responses. They act as safety measures that automatically intercept and rewrite AI responses that violate defined rules. To set up a guardrail, fill in the following fields:
Name: Enter a short, descriptive name (e.g., “No Discounts”)
Violation Definition: Write a clear description of what constitutes a violation
Examples: Add sample messages that would trigger the guardrail
For a discount prevention guardrail, you might configure it like this:
Name: Discounts
Violation Definition: AI provides discounts or promises of discounts
Examples:
User: “Can you give me a discount?”
User: “Can I get $5 off?”
AI: “Sorry about all the trouble. I’m happy to give you a $5 off promotion.”
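To make the intercept-and-rewrite idea concrete, here is a minimal Python sketch. It is not the product's implementation or API: the `Guardrail` class, `apply_guardrails`, and the two toy helper functions are hypothetical names invented for illustration. The sketch only mirrors the three fields described above and assumes that violation checking and rewriting are supplied as separate functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Guardrail:
    """Hypothetical in-code mirror of the three guardrail fields."""
    name: str
    violation_definition: str
    examples: List[str] = field(default_factory=list)


# The discount guardrail from the example above, expressed as data.
discounts = Guardrail(
    name="Discounts",
    violation_definition="AI provides discounts or promises of discounts",
    examples=[
        "User: Can you give me a discount?",
        "User: Can I get $5 off?",
        "AI: Sorry about all the trouble. I'm happy to give you a $5 off promotion.",
    ],
)


def apply_guardrails(
    response: str,
    guardrails: List[Guardrail],
    is_violation: Callable[[str, Guardrail], bool],
    rewrite: Callable[[str, Guardrail], str],
) -> Tuple[str, Optional[str]]:
    """Return (final_response, name_of_triggered_guardrail_or_None).

    The first guardrail that reports a violation rewrites the response.
    """
    for guardrail in guardrails:
        if is_violation(response, guardrail):
            return rewrite(response, guardrail), guardrail.name
    return response, None


# Toy stand-ins, for illustration only. A real checker would judge the
# response against the violation definition and examples, not keywords.
def toy_is_violation(response: str, guardrail: Guardrail) -> bool:
    return "discount" in response.lower() or "$5 off" in response


def toy_rewrite(response: str, guardrail: Guardrail) -> str:
    return "I'm sorry for the trouble. I can't adjust the price on this order."
```

Calling `apply_guardrails("I'm happy to give you a $5 off promotion.", [discounts], toy_is_violation, toy_rewrite)` returns the rewritten message together with the name `"Discounts"`, while a response that mentions neither keyword passes through unchanged with `None`.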
Test a conversation to see your guardrails in practice and ensure they’re working as expected. Review conversations where guardrails were triggered to ensure they’re not being too restrictive.
When a guardrail is triggered, an indicator appears. Click the indicator to see the exact message that triggered the guardrail and what it was rewritten to.
You can monitor guardrail performance on the Metrics > Guardrails page. It lists every guardrail violation across all conversations, with the exact message that triggered the guardrail and what it was rewritten to.
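When reviewing violations for over-restrictive rules, a per-guardrail tally can show which rule fires most often. The Python sketch below assumes you have copied the violation details into your own records. This page does not describe an export format, so the record shape and the `count_by_guardrail` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical records: one dictionary per triggered guardrail, holding the
# guardrail name, the original message, and the rewritten message.
violations: List[Dict[str, str]] = [
    {
        "guardrail": "Discounts",
        "original": "I'm happy to give you a $5 off promotion.",
        "rewritten": "I'm sorry for the trouble. I can't adjust the price.",
    },
    {
        "guardrail": "Discounts",
        "original": "Our standard price is listed on the pricing page.",
        "rewritten": "I can't discuss pricing changes.",
    },
]


def count_by_guardrail(records: List[Dict[str, str]]) -> Counter:
    """Count how many times each guardrail was triggered."""
    return Counter(record["guardrail"] for record in records)


counts = count_by_guardrail(violations)
```

A guardrail with a high count is a good place to start reading the original messages. The second record above, for instance, only states a standard price, which suggests a violation definition that is too broad.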