A/B test prompts in production
Set up Respan
- Sign up — Create an account at platform.respan.ai
- Create an API key — Generate one on the API keys page
- Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
Overview
Prompt changes can have unpredictable effects on output quality. Instead of deploying a new prompt version to all users at once, you can A/B test it: route a percentage of traffic to the new version, compare evaluation scores, and promote the winner.
This cookbook walks through:
- Creating two prompt versions
- Routing traffic by customer segment
- Comparing results with evaluators
1. Create prompt versions
On the Prompts page, create a prompt with two versions:
- v1 (current): Your existing production prompt
- v2 (candidate): The new prompt you want to test
Each version can have different system instructions, templates, models, or parameters. See Version control for details.
2. Fetch prompts in code
Use the Respan SDK to fetch prompt versions at runtime:
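The exact client surface depends on your SDK version, so the sketch below uses plain HTTP instead. The base URL, the `/prompts/{id}/versions/{version}` route, and the `RESPAN_API_KEY` environment variable are all assumptions for illustration; check the Respan API reference for the real paths and auth scheme.

```python
import json
import os
import urllib.request

RESPAN_BASE_URL = "https://platform.respan.ai/api"  # assumed base URL


def build_prompt_request(prompt_id: str, version: str) -> urllib.request.Request:
    """Build (but do not send) a request for one prompt version.

    The endpoint path here is hypothetical; consult the Respan API
    reference for the actual route.
    """
    url = f"{RESPAN_BASE_URL}/prompts/{prompt_id}/versions/{version}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {os.environ.get('RESPAN_API_KEY', '')}"},
    )


def fetch_prompt(prompt_id: str, version: str) -> dict:
    """Fetch a prompt version at runtime (performs the network call)."""
    req = build_prompt_request(prompt_id, version)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Fetching at runtime (rather than hard-coding the prompt text) is what lets you change the traffic split later without redeploying.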
3. Route traffic by segment
Split users between prompt versions using customer_identifier or any segmentation logic:
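A common, platform-agnostic way to do this is a deterministic hash split on customer_identifier, so the same customer always lands on the same variant across requests. A minimal sketch, with the 20% rollout fraction as an example value:

```python
import hashlib

ROLLOUT_FRACTION = 0.2  # share of traffic routed to the v2 candidate (example value)


def pick_prompt_version(customer_identifier: str) -> str:
    """Deterministically assign a customer to v1 or v2.

    Hashing the identifier (instead of calling random.random() per
    request) keeps each customer on a single variant for the whole test.
    """
    digest = hashlib.sha256(customer_identifier.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    return "v2" if bucket < ROLLOUT_FRACTION else "v1"
```

Record the chosen version in each request's metadata (e.g. metadata.prompt_version) so you can filter on it when comparing results.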
4. Evaluate both variants
Set up an online evaluation automation to score both variants automatically:
- Create an evaluator that scores response quality (e.g., helpfulness, accuracy)
- Create a condition that matches logs with metadata.experiment = "support_prompt_ab_test"
- Create an automation that runs the evaluator on matched logs
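For the condition to match, every logged request from the experiment must carry that metadata. The experiment and prompt_version field names come from this cookbook; the dict shape and how you attach it to a request are assumptions that depend on your client:

```python
def experiment_metadata(prompt_version: str) -> dict:
    """Metadata to attach to each logged request so the automation's
    condition (metadata.experiment = "support_prompt_ab_test") matches.

    Pass this as the request's metadata when calling Respan; the exact
    parameter name depends on your SDK version.
    """
    return {
        "experiment": "support_prompt_ab_test",
        "prompt_version": prompt_version,
    }
```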
5. Compare results
Filter the Dashboard by metadata.prompt_version to compare:
- Average evaluation scores per variant
- Cost per variant
- Latency per variant
- User feedback per variant
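The Dashboard does this aggregation for you, but if you export logs for offline analysis, the same comparison is a group-by on metadata.prompt_version. A sketch over an assumed list-of-dicts export shape (the eval_score, cost, and latency field names are illustrative):

```python
from collections import defaultdict
from statistics import mean


def summarize_by_version(logs: list[dict]) -> dict:
    """Group exported logs by prompt version and average the numeric fields.

    Assumes each row carries metadata.prompt_version plus numeric
    eval_score, cost, and latency fields (an illustrative export shape).
    """
    buckets = defaultdict(list)
    for row in logs:
        buckets[row["metadata"]["prompt_version"]].append(row)
    return {
        version: {
            "avg_eval_score": mean(r["eval_score"] for r in rows),
            "avg_cost": mean(r["cost"] for r in rows),
            "avg_latency": mean(r["latency"] for r in rows),
            "n": len(rows),
        }
        for version, rows in buckets.items()
    }
```

Keep an eye on the per-variant sample size (n) as well as the averages; a small bucket can make one variant look better by chance.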
Once you have enough data, promote the winning version and update your rollout to 100%.