Evaluators score your LLM outputs — automatically with an LLM judge, programmatically with code, or manually with human review. Create them on Respan, then trigger from code, the gateway, or experiments.
This is a beta feature. The API documentation is the source of truth for evaluator configuration and behavior.
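Respan's own API for code evaluators is documented separately; as a rough sketch of what "programmatically with code" means, a code evaluator is just a function that inspects an LLM input/output pair and returns a score. The function name, signature, and check below are illustrative, not Respan's API.

```python
# Sketch of a code-based evaluator: score an LLM output without calling a judge model.
# Names and signature are hypothetical; Respan's actual interface may differ.

def contains_required_terms(prompt: str, output: str) -> float:
    """Return the fraction of required terms that appear in the output (1.0 = all present)."""
    required = ["refund", "policy"]  # hypothetical terms this check looks for
    hits = sum(1 for term in required if term.lower() in output.lower())
    return hits / len(required)

score = contains_required_terms(
    prompt="What is your refund policy?",
    output="Our refund policy allows returns within 30 days.",
)
print(score)  # 1.0 — both required terms appear in the output
```

A code evaluator like this is deterministic and cheap, which makes it a good complement to an LLM judge for checks that can be expressed as rules.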

Set up an evaluator

Go to Evaluators and click + New evaluator. Select the evaluator type:
LLM evaluators use a language model to score outputs automatically.
1. Configure the evaluator

Define a Slug — a unique identifier used to reference this evaluator in API calls and logs.
Don’t change the slug after creation. It is used to identify the evaluator across your logs and code.
Choose a model for the evaluator. Currently supported: gpt-4o and gpt-4o-mini (OpenAI and Azure OpenAI).
2. Write the definition

The definition is the core instruction that tells the LLM how to evaluate. You can use these variables:
{{input}}: The input prompt sent to the LLM
{{output}}: The response generated by the LLM
{{metadata}}: Custom metadata associated with the request
{{metrics}}: System-captured metrics (latency, tokens, etc.)
Ideal output: ideal_output is not a standalone variable. To compare against a reference answer, include it in your metadata and reference it as {{metadata.ideal_output}}.
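The double-brace variables above are filled in from the request before the definition is sent to the judge model. The rendering code below is an illustrative sketch of that substitution, not Respan's implementation; the template syntax itself is from the docs.

```python
# Sketch of how definition variables might be filled in at evaluation time.
# The substitution logic is illustrative only; the {{...}} syntax is Respan's.

definition = (
    "Compare the response to the reference answer.\n"
    "Input: {{input}}\n"
    "Output: {{output}}\n"
    "Reference: {{metadata.ideal_output}}"
)

request = {
    "input": "What is 2 + 2?",
    "output": "4",
    "metadata": {"ideal_output": "4"},  # the reference answer lives inside metadata
}

rendered = (
    definition
    .replace("{{input}}", request["input"])
    .replace("{{output}}", request["output"])
    .replace("{{metadata.ideal_output}}", request["metadata"]["ideal_output"])
)
print(rendered)
```

Note that the reference answer is reached through metadata ({{metadata.ideal_output}}), never as a top-level {{ideal_output}} variable.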
3. Define the scoring rubric

Write a scoring rubric that tells the LLM how to assign scores, then set a passing score: the minimum score a response needs to pass.
Click Save to create the evaluator.
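As an illustration only (not a template shipped with Respan), a rubric for a correctness evaluator on a 1-5 scale might look like:

```
Score the response from 1 to 5:
5: Fully correct and consistent with the reference answer.
4: Correct on the main point, with minor omissions.
3: Partially correct; notable gaps or imprecision.
2: Mostly incorrect, but shows some relevant understanding.
1: Incorrect or off-topic.

Passing score: 4
```

Concrete score descriptions like these make the judge model's scores more consistent than a bare "rate from 1 to 5" instruction.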