Prerequisites

  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page

Overview

Prompt changes can have unpredictable effects on output quality. Instead of deploying a new prompt version to all users at once, you can A/B test it: route a percentage of traffic to the new version, compare evaluation scores, and promote the winner. This cookbook walks through:
  1. Creating two prompt versions
  2. Routing traffic by customer segment
  3. Comparing results with evaluators

1. Create prompt versions

On the Prompts page, create a prompt with two versions:
  • v1 (current): Your existing production prompt
  • v2 (candidate): The new prompt you want to test
Each version can have different system instructions, templates, models, or parameters. See Version control for details.

2. Fetch prompts in code

Use the Respan API to fetch prompt versions at runtime:
from openai import OpenAI
import requests

# OpenAI-compatible client pointed at the Respan gateway
client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

def get_prompt(prompt_name, version=None):
    """Fetch a prompt from Respan."""
    headers = {"Authorization": "Bearer YOUR_RESPAN_API_KEY"}
    params = {"prompt_name": prompt_name}
    if version is not None:
        params["version"] = version
    resp = requests.get(
        "https://api.respan.ai/api/prompts/",
        headers=headers,
        params=params,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()
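The exact response shape depends on your prompt configuration. As an illustration, here is a hypothetical payload with the `model` and `messages` keys that the routing code later in this cookbook reads; treat the field names as assumptions, not a schema guarantee:

```python
# Hypothetical example payload -- the real response may differ; the keys
# below match what the routing code in this cookbook accesses.
sample_prompt = {
    "prompt_name": "support_agent",
    "version": 2,
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful support agent."},
    ],
}

model = sample_prompt["model"]
system_message = sample_prompt["messages"][0]["content"]
```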

3. Route traffic by segment

Split users between prompt versions using customer_identifier or any segmentation logic:
import hashlib

def get_variant(customer_id: str, rollout_pct: int = 50) -> int:
    """Deterministic assignment: same user always gets the same variant."""
    hash_val = int(hashlib.md5(customer_id.encode()).hexdigest(), 16)
    return 2 if (hash_val % 100) < rollout_pct else 1

def call_with_ab_test(customer_id: str, user_message: str):
    variant = get_variant(customer_id, rollout_pct=20)  # 20% get v2
    prompt = get_prompt("support_agent", version=variant)

    response = client.chat.completions.create(
        model=prompt["model"],
        messages=[
            {"role": "system", "content": prompt["messages"][0]["content"]},
            {"role": "user", "content": user_message},
        ],
        extra_body={
            "customer_identifier": customer_id,
            "metadata": {
                "prompt_version": f"v{variant}",
                "experiment": "support_prompt_ab_test",
            },
        },
    )
    return response
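Because assignment is a pure function of the customer ID, you can sanity-check both the determinism and the approximate split locally. A standalone sketch mirroring `get_variant` above:

```python
import hashlib

def get_variant(customer_id: str, rollout_pct: int = 50) -> int:
    """Deterministic assignment: same user always gets the same variant."""
    hash_val = int(hashlib.md5(customer_id.encode()).hexdigest(), 16)
    return 2 if (hash_val % 100) < rollout_pct else 1

# The same user always lands in the same bucket across calls.
assert get_variant("user_42", 20) == get_variant("user_42", 20)

# Across many users, the share assigned to v2 approximates rollout_pct,
# since MD5 output is roughly uniform mod 100.
users = [f"user_{i}" for i in range(10_000)]
v2_share = sum(get_variant(u, 20) == 2 for u in users) / len(users)
print(round(v2_share, 2))  # close to 0.20
```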

4. Evaluate both variants

Set up an online evaluation automation to score both variants automatically:
  1. Create an evaluator that scores response quality (e.g., helpfulness, accuracy)
  2. Create a condition that matches logs with metadata.experiment = "support_prompt_ab_test"
  3. Create an automation that runs the evaluator on matched logs

5. Compare results

Filter the Dashboard by metadata.prompt_version to compare:
  • Average evaluation scores per variant
  • Cost per variant
  • Latency per variant
  • User feedback per variant
Once you have enough data, promote the winning version and update your rollout to 100%.
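If you export logs for offline analysis, the comparison reduces to a group-by on the `prompt_version` metadata. A minimal sketch over a hypothetical exported log list (field names are assumptions for illustration):

```python
from collections import defaultdict

# Hypothetical exported logs; field names are assumptions, not the
# actual export schema.
logs = [
    {"prompt_version": "v1", "eval_score": 0.78, "cost": 0.0021, "latency_ms": 910},
    {"prompt_version": "v1", "eval_score": 0.71, "cost": 0.0019, "latency_ms": 880},
    {"prompt_version": "v2", "eval_score": 0.84, "cost": 0.0024, "latency_ms": 950},
    {"prompt_version": "v2", "eval_score": 0.88, "cost": 0.0026, "latency_ms": 990},
]

def summarize(logs):
    """Group logs by prompt version and average each metric."""
    buckets = defaultdict(list)
    for log in logs:
        buckets[log["prompt_version"]].append(log)
    return {
        version: {
            "avg_score": sum(r["eval_score"] for r in rows) / len(rows),
            "avg_cost": sum(r["cost"] for r in rows) / len(rows),
            "avg_latency_ms": sum(r["latency_ms"] for r in rows) / len(rows),
        }
        for version, rows in buckets.items()
    }

summary = summarize(logs)
```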

Next steps