Guardrails
Guardrails are the safety measures around an LLM (input filters, output validation and monitoring) designed to keep the model from producing harmful, inaccurate or unintended output.
Guardrails work in layers: the input layer (filtering sensitive questions, PII redaction, prompt-injection detection), the output layer (JSON validation, profanity filtering, blocking PII leaks) and the runtime layer (rate limiting, logging, escalation paths). Frameworks such as NeMo Guardrails, Guardrails AI and the guardrail components in LangChain provide building blocks. Guardrails are critical for customer-facing chatbots and for medical and legal applications.
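A minimal, framework-free sketch of how the input and output layers compose, assuming a placeholder call_llm function and deliberately simple checks:

```python
import json
import re

# Illustrative patterns only; real deployments use proper classifiers.
BLOCKED_TOPICS = re.compile(r"\b(password|credit card)\b", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def input_guardrail(question: str) -> str | None:
    """Input layer: return a refusal, or None if the question may pass."""
    if BLOCKED_TOPICS.search(question):
        return "I can't help with that topic."
    return None

def output_guardrail(raw: str) -> str:
    """Output layer: validate JSON structure, then redact email addresses."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return '{"error": "invalid model output"}'
    return EMAIL.sub("[redacted]", raw)

def answer(question: str, call_llm) -> str:
    refusal = input_guardrail(question)
    if refusal:
        return refusal
    raw = call_llm(question)  # the unguarded model call
    return output_guardrail(raw)
```

The runtime layer (rate limiting, logging, escalation) would wrap answer itself, for example in the web framework's middleware.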
Example
An HR chatbot gets an input guardrail: every question containing 'salary', 'firing' or 'complaint' is not answered directly but referred to an HR employee. An output guardrail then scans every response for PII (BSN, phone number, email address) before it is sent.
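A sketch of those two guardrails, with deliberately naive regexes (real PII detection needs more than pattern matching):

```python
import re

ESCALATE = re.compile(r"\b(salary|firing|complaint)\b", re.IGNORECASE)
PII_PATTERNS = [
    re.compile(r"\b\d{9}\b"),                # BSN: nine digits
    re.compile(r"\b(?:\+31|0)\d{9}\b"),      # Dutch phone number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email address
]

def route(question: str) -> str | None:
    """Input guardrail: refer sensitive HR topics to a human."""
    if ESCALATE.search(question):
        return "Your question has been forwarded to an HR employee."
    return None

def scrub(response: str) -> str:
    """Output guardrail: redact PII before the response is sent."""
    for pattern in PII_PATTERNS:
        response = pattern.sub("[redacted]", response)
    return response
```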
Frequently asked questions
Are guardrails 100% airtight?
No. Determined prompt injections and creative jailbreaks can bypass guardrails. Layer your defences, monitor actively and plan human escalation for edge cases.
Guardrails in the prompt or external?
Both. The system prompt sets the baseline behaviour; external guardrails (rules, filters, classifiers) add a deterministic layer on top. External guardrails are more robust against injection because they run outside the model, where a malicious prompt cannot rewrite them.
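A sketch of that split: the system prompt requests the desired behaviour, while an external check enforces it deterministically, whatever the model (or an injected prompt) does. call_llm is a placeholder:

```python
import json

SYSTEM_PROMPT = 'You are a support bot. Reply only with JSON: {"answer": "..."}'

def guarded_call(user_msg: str, call_llm) -> dict:
    raw = call_llm(SYSTEM_PROMPT, user_msg)
    try:
        data = json.loads(raw)        # runs outside the model, so an
    except json.JSONDecodeError:      # injected prompt cannot disable it
        raise ValueError("model output failed validation")
    if "answer" not in data:
        raise ValueError("required 'answer' field is missing")
    return data
```

Even if an injection changes what the model says, the external check still rejects any response that breaks the contract.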