MCP-compatible LLM G-Eval guardrails checker based on:
- OpenAI Cookbook "How to implement LLM guardrails";
- Promptfoo G-Eval implementation.
- G-Eval based evaluation;
- Customizable guardrails;
- Different providers and models based on the ai-sdk toolkit;
- MCP-compatible API;
- Both local and server mode.
- `guardrails({ server, provider, model, criteria, threshold=0.5 })` - creates an instance for local usage, or one connected to a server if `server` is defined. Options:
  - `server` - URL of the `guardrails` server;
  - `provider` - name of the provider;
  - `model` - name of the model;
  - `criteria` - guardrail criteria, with or without G-Eval `steps`. Ignored if `server` is defined. If `steps` are not defined, they will be generated on the fly with an additional LLM request. In server mode criteria are loaded from the file at `process.env.CRITERIA_PATH`. In client-service usage it is better to define the steps explicitly; see the sketch after this list and the examples below.
  - `threshold=0.5` - threshold applied to the G-Eval score to decide whether the guardrail is valid: a score below the threshold means valid, above means not valid.
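A minimal sketch of a local instance with explicit steps, assuming the `criteria` option accepts the same `description`/`steps` shape used in the server-mode criteria file shown below:

```js
import guardrails from 'guardrails';

// Local instance with explicit G-Eval steps, so no extra LLM request
// is needed to generate the steps on the fly.
const gd = guardrails({
  provider: 'openai',
  model: 'gpt-4o-mini',
  threshold: 0.5,
  criteria: {
    harm: {
      description: 'Text is about deliberate injury or damage to someone or something.',
      steps: [
        'Identify content that depicts or encourages violence or self-harm.',
        'Check for derogatory or hateful language targeting individuals or groups.'
      ]
    }
  }
});
```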
- `async listTools()` - returns the MCP definition of available guardrails.
- `async callTool({ name, arguments })` - calls guardrail validation in the MCP manner. Returns JSON like:

```
{
  "name": "harm",   // name of the called guardrail
  "valid": false,   // whether the guardrail is valid, comparing the score with the threshold
  "score": 0.8,     // G-Eval score
  "reason": "seems provided text is slightly harmful" // LLM reason for the G-Eval score
}
```
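For example, listing the tool definitions of the configured guardrails from a local instance like the `gd` sketched above (the exact response shape is whatever the package emits for the MCP tool list):

```js
// Log the MCP tool definitions for every configured guardrail.
const tools = await gd.listTools();
console.log(JSON.stringify(tools, null, 2));
```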
- `GUARDRAILS_PORT` - listening port of the `guardrails` server;
- `CRITERIA_PATH` - path to the criteria file; must be provided in server mode.
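For example, before starting the server (the port value is illustrative and matches the client example below):

```sh
export GUARDRAILS_PORT=3000
export CRITERIA_PATH=./criteria.json
```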
- openai,
- azure,
- anthropic,
- bedrock,
- google,
- mistral,
- deepseek,
- perplexity.
To add your own provider, `import { PROVIDERS } from 'guardrails/local';` and extend the dictionary with an ai-sdk compatible provider, as sketched below.
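A minimal sketch of such an extension, assuming each `PROVIDERS` entry is an ai-sdk provider instance callable as `provider(modelId)` (that shape, the `myprovider` name, and the endpoint are illustrative assumptions, not the package's documented contract):

```js
import { PROVIDERS } from 'guardrails/local';
import { createOpenAI } from '@ai-sdk/openai';

// Register an OpenAI-compatible endpoint under a custom provider name.
// Assumption: guardrails resolves models via PROVIDERS[provider](model).
PROVIDERS.myprovider = createOpenAI({
  baseURL: 'https://my-llm.example.com/v1', // hypothetical endpoint
  apiKey: process.env.MY_PROVIDER_API_KEY
});
```

A local instance created with `provider: 'myprovider'` would then resolve its model through this entry.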
```js
import guardrails from 'guardrails';

const gd = guardrails({ provider: 'openai', model: 'gpt-4o-mini', criteria: { harm: 'text is harmful' } });
await gd.callTool({ name: 'harm', arguments: { prompt: 'Who is John Galt?' } });
```
- create a file with criteria, for example `criteria.json`:

```json
{
  "harm": {
    "description": "Text is about deliberate injury or damage to someone or something.",
    "steps": [
      "Identify content that depicts or encourages violence or self-harm.",
      "Check for derogatory or hateful language targeting individuals or groups.",
      "Assess if the text contains misleading or false information that could cause real-world harm.",
      "Determine the severity and potential impact of the harmful content."
    ]
  }
}
```
- set the environment variable with the criteria path:

```sh
export CRITERIA_PATH=./criteria.json
```
- run the server:

```sh
./node_modules/.bin/guardrails
```
- use the client:

```js
import guardrails from 'guardrails';

const gd = guardrails({ server: 'http://localhost:3000', provider: 'openai', model: 'gpt-4o-mini' });
await gd.callTool({ name: 'harm', arguments: { prompt: 'Who is John Galt?' } });
```
Can be found here.