Guardrails let you define rules that run automatically before or after the agent responds. They help you keep conversations on-topic, protect sensitive data, and prevent misuse — without any manual moderation.

How guardrails work

Guardrails run in two stages:
  • Before-agent checks run on the user’s message before the agent sees it. Use these to block or filter inputs.
  • After-agent checks run on the agent’s response before it’s shown. Use these to catch problems in the output.
When a guardrail triggers, you can choose what happens: block the message entirely, redact the sensitive part, or let it through with a warning logged.
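As a rough mental model of this two-stage flow, the sketch below runs a list of checks before and after an agent call and applies the triggered action. All names and signatures here are illustrative assumptions, not the product's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative sketch only: names and signatures are assumptions,
# not this product's real API.

@dataclass
class GuardrailResult:
    triggered: bool
    action: str = "warn"              # "block", "redact", or "warn"
    redacted_text: Optional[str] = None

Check = Callable[[str], GuardrailResult]

def run_stage(text: str, checks: list[Check]) -> tuple[bool, str]:
    """Apply each check; return (allowed, possibly-redacted text)."""
    for check in checks:
        result = check(text)
        if not result.triggered:
            continue
        if result.action == "block":
            return False, text
        if result.action == "redact" and result.redacted_text is not None:
            text = result.redacted_text
        # "warn" would only log; the message passes through unchanged.
    return True, text

def respond(user_message: str, before: list[Check], after: list[Check],
            agent: Callable[[str], str]) -> str:
    allowed, user_message = run_stage(user_message, before)
    if not allowed:
        return "Sorry, that message was blocked by a guardrail."
    reply = agent(user_message)
    allowed, reply = run_stage(reply, after)
    if not allowed:
        return "Sorry, that response was blocked by a guardrail."
    return reply
```

Note that before-agent checks can rewrite the message (redaction) before the agent ever sees it, while after-agent checks get a second chance at the agent's own output.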

Before-agent guardrails

PII detection & masking

Automatically detects personally identifiable information in the user’s message and handles it before the agent processes it. Detection patterns — choose which types of PII to look for:
  • Email addresses (e.g. jan@example.com)
  • Phone numbers (e.g. +31 6 12345678)
  • Credit card numbers (e.g. 4111 1111 1111 1111)
Action — what to do when PII is detected:
  • Redact: replaces the PII with [REDACTED] before passing the message to the agent. The user’s original message is unchanged in the chat UI, but the agent never sees the actual value.
  • Block: rejects the entire message and returns an error to the user.
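A minimal sketch of the redact action, using simple regexes for the three pattern types above. These patterns are deliberately naive placeholders; production PII detectors are far more robust:

```python
import re

# Naive illustrative patterns only; real detectors handle many more formats.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d \-]{7,}\d"),
    "credit_card": re.compile(r"(?:\d[ -]?){13,16}"),
}

def redact_pii(text: str) -> str:
    """Replace each detected PII span with [REDACTED]."""
    for pattern in PII_PATTERNS.values():
        text = pattern.sub("[REDACTED]", text)
    return text
```

For example, `redact_pii("mail jan@example.com now")` yields `"mail [REDACTED] now"`, which is what the agent would receive instead of the raw address.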

Off-topic detection

Checks whether the user’s question falls within the topics you want this agent to handle. If the message is off-topic, the agent won’t run; instead, the user gets a message explaining what the agent is for.
When you add this guardrail, you define the allowed topics. These are plain-language descriptions, not rigid keywords. Example allowed topics:
  • Business data analysis
  • Campaign performance
  • Budget tracking
  • Google Ads metrics
Be broad enough to cover natural variations in how users phrase questions. “Campaign performance” will match “how did my campaigns do?” even though the exact words don’t match.
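One way to picture why this differs from keyword matching is to delegate the decision to a model call rather than a string comparison. Everything below — the function names, the prompt wording, and the `judge` callable standing in for a model call — is an illustrative assumption:

```python
ALLOWED_TOPICS = [
    "Business data analysis",
    "Campaign performance",
    "Budget tracking",
    "Google Ads metrics",
]

def off_topic_prompt(message: str, topics: list[str]) -> str:
    """Build a judge prompt asking whether the message fits any allowed topic."""
    topic_lines = "\n".join(f"- {t}" for t in topics)
    return (
        "Allowed topics:\n" + topic_lines + "\n\n"
        f"User message: {message}\n"
        "Does the message fall under any allowed topic? Answer YES or NO."
    )

def is_off_topic(message: str, topics: list[str], judge) -> bool:
    """`judge` is any callable that sends a prompt to a model and returns its text."""
    verdict = judge(off_topic_prompt(message, topics))
    return not verdict.strip().upper().startswith("YES")
```

Because the model judges meaning rather than exact words, “how did my campaigns do?” can match “Campaign performance” even with no shared keywords.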

Jailbreak detection

Automatically detects attempts to override the agent’s instructions or make it behave outside its intended purpose (e.g. “ignore your system prompt and…”). No configuration needed — enable it and it runs automatically.
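For intuition only, a crude phrase-based approximation of what such a detector looks for is sketched below. The actual guardrail needs no configuration and is not a fixed phrase list; this heuristic and its patterns are purely illustrative:

```python
import re

# Crude illustrative heuristic; real jailbreak detectors use trained models,
# not a fixed phrase list like this one.
JAILBREAK_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all |your )?(previous |system )?(instructions|prompt)",
        r"pretend (you are|to be)",
        r"disregard .* rules",
    )
]

def looks_like_jailbreak(message: str) -> bool:
    """True if the message matches any known override phrasing."""
    return any(p.search(message) for p in JAILBREAK_PATTERNS)
```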

Harmful content moderation

Checks for harmful, abusive, or unsafe content in the user’s message. Automatically blocks requests that contain hate speech, threats, or other harmful language. No configuration needed.

After-agent guardrails

PII detection & masking (output)

The same PII detection applies to the agent’s response before it’s shown to the user. Useful when your data contains personal information that might appear in query results — for example, if a table includes customer email addresses that could be returned by a query.

Hallucination detection

Checks the agent’s response for claims that don’t appear to be grounded in the data it retrieved. If the agent makes a statement that can’t be verified from the query results, this guardrail flags or blocks the response. No configuration needed — enable it and it runs on every response automatically.
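A heavily simplified illustration of the grounding idea: flag numeric claims in the response that never appear in the retrieved rows. Real hallucination detection is model-based and covers far more than numbers; the function names and approach here are assumptions for explanation only:

```python
import re

def numbers_in(text: str) -> set[str]:
    """Extract numeric tokens (digits with optional separators) from text."""
    return set(re.findall(r"\d[\d,.]*\d|\d", text))

def ungrounded_numbers(response: str, retrieved_rows: list[str]) -> set[str]:
    """Numbers claimed in the response that never appear in the query results."""
    grounded: set[str] = set()
    for row in retrieved_rows:
        grounded |= numbers_in(row)
    return numbers_in(response) - grounded
```

If `ungrounded_numbers` comes back non-empty, the response contains a figure the query results cannot back up, which is exactly the kind of claim this guardrail is meant to catch.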

Custom guardrails

Custom guardrails let you define your own checks using plain language. You write a prompt that describes the rule, and the AI evaluates each message or response against it. Each custom guardrail has:
  • Name: a label for this check, shown in logs (e.g. “Competitor mention check”)
  • Prompt: a plain-language description of what to check for (e.g. “Does this message mention any competitor brand names?”)
  • Action: what to do if the check triggers — warn (log only), block (reject the message), or redact (remove the flagged content)
Custom guardrails can be added to either the before-agent or after-agent stage. Example custom guardrail:
Name: Budget guardrail
Prompt: Does this response recommend increasing any budget by more than 50% compared to current spend?
Action: Warn
Custom guardrails use an AI call to evaluate each message, which adds a small amount of processing time and cost per request. Use them for checks that matter most — not as a catch-all.
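Putting the three fields together, the evaluation step can be sketched as a single model call per message. The `CustomGuardrail` type, the `evaluate` function, and the `judge` callable standing in for that model call are all illustrative assumptions, not the product's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomGuardrail:
    name: str      # shown in logs
    prompt: str    # plain-language rule
    action: str    # "warn", "block", or "redact"

def evaluate(guardrail: CustomGuardrail, text: str, judge) -> Optional[str]:
    """Ask a model whether the rule triggers; return the action if it does.

    `judge` is any callable that sends a prompt to a model and returns its text.
    This is the per-message AI call that adds processing time and cost.
    """
    question = (
        f"{guardrail.prompt}\n\nText to check:\n{text}\n\nAnswer YES or NO."
    )
    if judge(question).strip().upper().startswith("YES"):
        return guardrail.action
    return None
```

With the budget example above, `evaluate(budget_guardrail, response_text, judge)` would return `"warn"` only when the model judges the response to recommend a budget increase over 50%, and `None` otherwise — one extra model call per evaluated message, which is why these checks are best reserved for the rules that matter most.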