Guardrails
Guardrails are the safety measures around an LLM (input filters, output validation and monitoring) designed to keep the model from producing harmful, inaccurate or unintended output.
Guardrails work in layers: the input layer (filtering sensitive questions, PII redaction, prompt-injection detection), the output layer (JSON validation, profanity filtering, blocking PII leaks) and the runtime layer (rate limiting, logging, escalation paths). Frameworks such as NeMo Guardrails, Guardrails AI and the guardrail components in LangChain provide building blocks. Guardrails are critical for customer-facing chatbots and for medical and legal applications.
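A minimal, framework-free sketch of how the input and output layers compose, assuming a placeholder call_llm function and deliberately simple checks:

```python
import json
import re

# Illustrative patterns only; real deployments use proper classifiers.
BLOCKED_TOPICS = re.compile(r"\b(password|credit card)\b", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def input_guardrail(question: str) -> str | None:
    """Input layer: return a refusal, or None if the question may pass."""
    if BLOCKED_TOPICS.search(question):
        return "I can't help with that topic."
    return None

def output_guardrail(raw: str) -> str:
    """Output layer: validate JSON structure, then redact email addresses."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return '{"error": "invalid model output"}'
    return EMAIL.sub("[redacted]", raw)

def answer(question: str, call_llm) -> str:
    refusal = input_guardrail(question)
    if refusal:
        return refusal
    raw = call_llm(question)  # the unguarded model call
    return output_guardrail(raw)
```

The runtime layer (rate limiting, logging, escalation) would wrap answer itself, for example in the web framework's middleware.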
Example
An HR chatbot gets an input guardrail: every question containing 'salary', 'firing' or 'complaint' is not answered directly but referred to an HR employee. An output guardrail then scans every response for PII (BSN, phone number, email address) before it is sent.
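A sketch of those two guardrails, with deliberately naive regexes (real PII detection needs more than pattern matching):

```python
import re

ESCALATE = re.compile(r"\b(salary|firing|complaint)\b", re.IGNORECASE)
PII_PATTERNS = [
    re.compile(r"\b\d{9}\b"),                # BSN: nine digits
    re.compile(r"\b(?:\+31|0)\d{9}\b"),      # Dutch phone number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email address
]

def route(question: str) -> str | None:
    """Input guardrail: refer sensitive HR topics to a human."""
    if ESCALATE.search(question):
        return "Your question has been forwarded to an HR employee."
    return None

def scrub(response: str) -> str:
    """Output guardrail: redact PII before the response is sent."""
    for pattern in PII_PATTERNS:
        response = pattern.sub("[redacted]", response)
    return response
```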
Frequently asked questions
Are guardrails 100% airtight?
No. Determined prompt injections and creative jailbreaks can bypass guardrails. Layer your defences, monitor actively and plan human escalation for edge cases.
Guardrails in the prompt or external?
Both. The system prompt sets the baseline behaviour; external guardrails (rules, filters, classifiers) add a deterministic layer on top. External guardrails are more robust against injection because they run outside the model, where a malicious prompt cannot rewrite them.
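A sketch of that split: the system prompt requests the desired behaviour, while an external check enforces it deterministically, whatever the model (or an injected prompt) does. call_llm is a placeholder:

```python
import json

SYSTEM_PROMPT = 'You are a support bot. Reply only with JSON: {"answer": "..."}'

def guarded_call(user_msg: str, call_llm) -> dict:
    raw = call_llm(SYSTEM_PROMPT, user_msg)
    try:
        data = json.loads(raw)        # runs outside the model, so an
    except json.JSONDecodeError:      # injected prompt cannot disable it
        raise ValueError("model output failed validation")
    if "answer" not in data:
        raise ValueError("required 'answer' field is missing")
    return data
```

Even if an injection changes what the model says, the external check still rejects any response that breaks the contract.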