# Run evaluator

Executes an evaluator against provided input/output data for testing purposes. This endpoint allows you to test your evaluator configuration before using it in production.

## Authentication

All endpoints require API key authentication:

```bash
Authorization: Bearer YOUR_API_KEY
```

## Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `evaluator_id` | string | The unique ID of the evaluator to run |

## Unified Evaluator Inputs

All evaluator runs now receive a single unified `inputs` object. This applies to all evaluator types (`llm`, `human`, `code`). The same fields are also recorded and visible on the Scores page for every evaluation.

### Request Body Structure

```json
{
  "inputs": {
    "input": {},
    "output": {},
    "metrics": {},
    "metadata": {},
    "llm_input": "",
    "llm_output": ""
  }
}
```

### Field Descriptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `inputs` | object | Yes | The unified input object containing all evaluation data |
| `inputs.input` | any JSON | Yes | The request/input to be evaluated |
| `inputs.output` | any JSON | Yes | The response/output being evaluated |
| `inputs.metrics` | object | No | System-captured metrics (e.g., tokens, latency, cost) |
| `inputs.metadata` | object | No | Context and custom properties you pass; also logged |
| `inputs.llm_input` | string | No | Legacy convenience alias for input (maps to unified fields) |
| `inputs.llm_output` | string | No | Legacy convenience alias for output (maps to unified fields) |
| `generation_method` | string | No | Controls which automation method to use: `"auto"` (default), `"llm"`, `"code"` |

### Generation Method Options

<Note>
**New Feature**: The `generation_method` parameter allows you to control which automation is used, since evaluators can now have both LLM and code configs.
</Note>

- **`"auto"`** (default): Automatically selects the best available automation method in order: LLM → Code → Legacy config
- **`"llm"`**: Force use of LLM-based evaluation (requires `llm_config` to be configured)
- **`"code"`**: Force use of code-based evaluation (requires `code_config` to be configured)

**Note:** Human scoring is done through the UI/Scores API, not via this test/run endpoint.

**Notes:**

- These fields are stored with each evaluation and shown in the Scores page alongside the resulting score
- When running evaluators from LLM calls, `inputs` is auto-populated from the request/response and tracing data
- Legacy `{{llm_input}}`/`{{llm_output}}` placeholders remain supported and transparently map to the unified fields
- New templates should reference `{{input}}` and `{{output}}`

## Examples

### Test LLM Evaluator

```python Python
import requests

evaluator_id = "0f4325f9-55ef-4c20-8abe-376694419947"
url = f"https://api.respan.ai/api/evaluators/{evaluator_id}/run/"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "inputs": {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris. It is located in the north-central part of the country and is known for its rich history, culture, and landmarks like the Eiffel Tower.",
        "metadata": {
            "context": "Geography question about European capitals",
            "user_id": "user_123",
            "session_id": "session_456"
        },
        "metrics": {
            "total_request_tokens": 23,
            "total_response_tokens": 45,
            "latency": 0.85,
            "cost": 0.0012
        }
    },
    "generation_method": "llm"  # Force LLM evaluation
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/0f4325f9-55ef-4c20-8abe-376694419947/run/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "input": "What is the capital of France?",
      "output": "The capital of France is Paris. It is located in the north-central part of the country and is known for its rich history, culture, and landmarks like the Eiffel Tower.",
      "metadata": {
        "context": "Geography question about European capitals",
        "user_id": "user_123"
      },
      "metrics": {
        "total_request_tokens": 23,
        "total_response_tokens": 45,
        "latency": 0.85
      }
    },
    "generation_method": "llm"
  }'
```

### Test Code Evaluator

```python Python
# Test boolean code evaluator (reuses the headers from the example above)
code_evaluator_id = "bool-eval-456"
url = f"https://api.respan.ai/api/evaluators/{code_evaluator_id}/run/"
data = {
    "inputs": {
        "input": "Write a brief explanation of photosynthesis.",
        "output": "Photosynthesis is the process by which plants convert sunlight into energy.",
        "metadata": {
            "topic": "biology",
            "difficulty": "basic"
        }
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/bool-eval-456/run/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "input": "Write a brief explanation of photosynthesis.",
      "output": "Photosynthesis is the process by which plants convert sunlight into energy.",
      "metadata": {
        "topic": "biology",
        "difficulty": "basic"
      }
    }
  }'
```

### Test Human Categorical Evaluator

```python Python
# Test categorical evaluator
categorical_evaluator_id = "cat-eval-123"
url = f"https://api.respan.ai/api/evaluators/{categorical_evaluator_id}/run/"
data = {
    "inputs": {
        "input": {
            "question": "Explain the benefits of renewable energy",
            "context": "Environmental science discussion"
        },
        "output": {
            "response": "Renewable energy sources like solar and wind power offer numerous benefits including reduced carbon emissions, energy independence, and long-term cost savings.",
            "confidence": 0.95
        },
        "metadata": {
            "evaluator_notes": "Well-structured response covering key points",
            "evaluation_criteria": ["accuracy", "completeness", "clarity"]
        }
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/cat-eval-123/run/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "input": {
        "question": "Explain the benefits of renewable energy",
        "context": "Environmental science discussion"
      },
      "output": {
        "response": "Renewable energy sources offer numerous benefits including reduced carbon emissions and cost savings.",
        "confidence": 0.95
      },
      "metadata": {
        "evaluator_notes": "Well-structured response"
      }
    }
  }'
```

### Legacy Format Support

```python Python
# Legacy format still supported for backward compatibility
data = {
    "inputs": {
        "llm_input": "What is the capital of France?",
        "llm_output": "The capital of France is Paris.",
        "metadata": {
            "note": "Using legacy field names"
        }
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/0f4325f9-55ef-4c20-8abe-376694419947/run/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "llm_input": "What is the capital of France?",
      "llm_output": "The capital of France is Paris."
    }
  }'
```

## Response

### LLM Evaluator Response

**Status: 200 OK**

```json
{
  "score": 4.5,
  "score_type": "numerical",
  "evaluator_id": "0f4325f9-55ef-4c20-8abe-376694419947",
  "evaluator_name": "Response Quality Evaluator",
  "evaluation_result": {
    "reasoning": "The response is accurate and provides good detail about Paris, including its location and notable landmarks. The answer is complete and well-structured.",
    "score": 4.5,
    "passed": true
  },
  "inputs": {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris. It is located in the north-central part of the country and is known for its rich history, culture, and landmarks like the Eiffel Tower.",
    "metadata": {
      "context": "Geography question about European capitals",
      "user_id": "user_123",
      "session_id": "session_456"
    },
    "metrics": {
      "total_request_tokens": 23,
      "total_response_tokens": 45,
      "latency": 0.85,
      "cost": 0.0012
    }
  },
  "execution_time": 1.23,
  "timestamp": "2025-09-11T10:30:45.123456Z"
}
```

### Code Evaluator Response

```json
{
  "score": true,
  "score_type": "boolean",
  "evaluator_id": "bool-eval-456",
  "evaluator_name": "Response Length Checker",
  "evaluation_result": {
    "result": true,
    "details": "Response meets minimum length requirement (11 words >= 10 words)"
  },
  "inputs": {
    "input": "Write a brief explanation of photosynthesis.",
    "output": "Photosynthesis is the process by which plants convert sunlight into energy.",
    "metadata": {
      "topic": "biology",
      "difficulty": "basic"
    }
  },
  "execution_time": 0.05,
  "timestamp": "2025-09-11T10:30:45.123456Z"
}
```

### Human Categorical Evaluator Response

```json
{
  "score": ["Good"],
  "score_type": "categorical",
  "evaluator_id": "cat-eval-123",
  "evaluator_name": "Content Quality Assessment",
  "evaluation_result": {
    "selected_choices": ["Good"],
    "choice_values": [4],
    "note": "This evaluator requires human annotation. The response structure is validated but no actual evaluation is performed."
  },
  "inputs": {
    "input": {
      "question": "Explain the benefits of renewable energy",
      "context": "Environmental science discussion"
    },
    "output": {
      "response": "Renewable energy sources like solar and wind power offer numerous benefits including reduced carbon emissions, energy independence, and long-term cost savings.",
      "confidence": 0.95
    },
    "metadata": {
      "evaluator_notes": "Well-structured response covering key points",
      "evaluation_criteria": ["accuracy", "completeness", "clarity"]
    }
  },
  "execution_time": 0.02,
  "timestamp": "2025-09-11T10:30:45.123456Z"
}
```

## Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `score` | varies | The evaluation score (type depends on evaluator's `score_value_type`) |
| `score_type` | string | The type of score: `numerical`, `boolean`, `categorical`, or `comment` |
| `evaluator_id` | string | ID of the evaluator that was run |
| `evaluator_name` | string | Name of the evaluator that was run |
| `evaluation_result` | object | Detailed evaluation results and reasoning |
| `inputs` | object | The input data that was evaluated (echoed back) |
| `execution_time` | number | Time taken to execute the evaluation (in seconds) |
| `timestamp` | string | ISO timestamp of when the evaluation was performed |

## Score Types by Evaluator

### Numerical Evaluators

- **Score**: Number (e.g., `4.5`, `8.2`)
- **Range**: Defined by the evaluator's `min_score` and `max_score`
- **Passing**: Determined by the `passing_score` threshold

### Boolean Evaluators

- **Score**: Boolean (`true` or `false`)
- **Passing**: `true` = passed, `false` = failed

### Categorical Evaluators

- **Score**: Array of selected category names (e.g., `["Good", "Accurate"]`)
- **Values**: Corresponding numeric values from `categorical_choices`
- **Note**: Human evaluators return placeholder values for testing

### Comment Evaluators

- **Score**: String with detailed feedback
- **Content**: Varies based on evaluator configuration
- **Length**: Can be extensive for detailed feedback

## Error Responses

### 400 Bad Request

```json
{
  "detail": "Invalid input format: 'inputs' field is required"
}
```

### 401 Unauthorized

```json
{
  "detail": "Your API key is invalid or expired, please check your API key at https://platform.respan.ai/platform/api/api-keys"
}
```

### 404 Not Found

```json
{
  "detail": "Evaluator not found"
}
```

### 422 Unprocessable Entity

```json
{
  "inputs": {
    "input": ["This field is required."]
  }
}
```

### 500 Internal Server Error

```json
{
  "detail": "Evaluation failed: LLM service temporarily unavailable",
  "error_code": "EVALUATION_EXECUTION_ERROR",
  "retry_after": 30
}
```

## Testing Best Practices

### 1. Test with Realistic Data

Use actual examples from your use case:

```python Python
# Good: realistic test data
test_data = {
    "inputs": {
        "input": "Actual user question from your application",
        "output": "Actual LLM response you want to evaluate",
        "metadata": {
            "user_context": "Real context from your app"
        }
    }
}
```

### 2. Test Edge Cases

```python Python
# Test with empty responses
edge_case_data = {
    "inputs": {
        "input": "What is AI?",
        "output": "",  # Empty response
        "metadata": {"test_case": "empty_response"}
    }
}

# Test with very long responses
long_response_data = {
    "inputs": {
        "input": "Explain machine learning",
        "output": "Very long response..." * 100,
        "metadata": {"test_case": "long_response"}
    }
}
```

### 3. Validate Configuration

Test your evaluator configuration before production use:

```python Python
# Test multiple examples to validate scoring consistency
test_cases = [
    {"input": "Good question", "output": "Excellent answer", "expected_range": (4, 5)},
    {"input": "Basic question", "output": "Basic answer", "expected_range": (2, 4)},
    {"input": "Complex question", "output": "Poor answer", "expected_range": (1, 2)}
]

for i, case in enumerate(test_cases):
    response = requests.post(url, headers=headers, json={
        "inputs": {
            "input": case["input"],
            "output": case["output"]
        }
    })
    score = response.json()["score"]
    expected_min, expected_max = case["expected_range"]
    if expected_min <= score <= expected_max:
        print(f"Test case {i+1}: PASS (score: {score})")
    else:
        print(f"Test case {i+1}: FAIL (score: {score}, expected: {expected_min}-{expected_max})")
```
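
Because the `score` field changes Python type with `score_type`, client code usually branches on it before acting on a result. A minimal sketch of that dispatch (the `passed` helper and its default threshold of `3.0` are illustrative assumptions, not part of the API; substitute your evaluator's configured `passing_score`):

```python
def passed(result: dict, passing_score: float = 3.0) -> bool:
    """Return True if a run-evaluator response should count as a pass.

    Field names follow the Response Fields table above; the 3.0
    threshold is an assumed default for numerical evaluators.
    """
    score = result["score"]
    score_type = result["score_type"]
    if score_type == "numerical":
        return score >= passing_score
    if score_type == "boolean":
        return score is True
    if score_type == "categorical":
        # Categorical scores are arrays of selected category names;
        # here any non-empty selection counts as a pass.
        return bool(score)
    if score_type == "comment":
        # Comment scores carry free-text feedback, not pass/fail.
        return True
    raise ValueError(f"Unknown score_type: {score_type}")
```

For the LLM evaluator response shown earlier, `passed({"score": 4.5, "score_type": "numerical"})` returns `True` against the default threshold.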

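
The 500 error response includes a `retry_after` hint that clients can honor before retrying. A minimal retry wrapper under those assumptions (the `run_evaluator` helper, its retry policy, and the injectable `post`/`sleep` parameters are illustrative, not part of any SDK):

```python
import time

import requests


def run_evaluator(url, headers, payload, max_attempts=3, post=None, sleep=time.sleep):
    """POST to the run endpoint, retrying on 500 using the retry_after hint."""
    post = post or requests.post  # injectable for testing
    for attempt in range(max_attempts):
        response = post(url, headers=headers, json=payload)
        if response.status_code == 500:
            # Back off for the server-suggested interval (assumed 30s default).
            sleep(response.json().get("retry_after", 30))
            continue
        response.raise_for_status()  # raises on 400/401/404/422
        return response.json()
    raise RuntimeError(f"Evaluation failed after {max_attempts} attempts")
```

Non-500 errors are raised immediately via `raise_for_status()`, since retrying an invalid payload or a bad API key will not help.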