Together AI (tracing)

The Together AI Python SDK is the official client for Together’s inference platform. The respan-instrumentation-together package patches the generated Together SDK resources and emits Respan spans for direct Together AI calls.

  1. Sign up - Create an account at platform.respan.ai
  2. Create an API key - Generate one on the API keys page

See Together AI gateway setup to route Together AI model calls through the Respan gateway.

Setup

1. Install packages

$ pip install respan-ai respan-instrumentation-together together python-dotenv

2. Set environment variables

$ export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"
$ export TOGETHER_API_KEY="YOUR_TOGETHER_API_KEY"

Optional:

$ export TOGETHER_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo"

TOGETHER_API_KEY is used by the Together SDK for direct provider calls. RESPAN_API_KEY is used by Respan for trace export.
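
Because both keys are read from the environment, a missing one typically shows up only as a KeyError or an authentication failure on the first call. The following is a minimal preflight sketch using only the standard library. The helper name missing_env_vars and the REQUIRED_VARS tuple are hypothetical and are not part of the Respan or Together SDKs.

```python
import os
from typing import Iterable, List, Mapping, Optional

# The two variables this guide treats as required (TOGETHER_MODEL is optional).
REQUIRED_VARS = ("RESPAN_API_KEY", "TOGETHER_API_KEY")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(env={"RESPAN_API_KEY": "key"}))
```

Calling missing_env_vars() with no arguments checks the real process environment, so it could run right after load_dotenv() and before any client is constructed.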

3. Initialize and run

import os

from dotenv import load_dotenv
from respan import Respan, workflow
from respan_instrumentation_together import TogetherInstrumentor
from together import Together

# Load RESPAN_API_KEY and TOGETHER_API_KEY from a local .env file, if present.
load_dotenv()

# Initialize Respan with the Together instrumentation, then create the client.
respan = Respan(
    api_key=os.environ["RESPAN_API_KEY"],
    instrumentations=[TogetherInstrumentor()],
)
client = Together(api_key=os.environ["TOGETHER_API_KEY"])


@workflow(name="together_chat_completion")
def run_chat() -> str:
    response = client.chat.completions.create(
        model=os.getenv("TOGETHER_MODEL", "meta-llama/Llama-3.2-3B-Instruct-Turbo"),
        messages=[
            {
                "role": "user",
                "content": "Reply with one concise sentence about tracing.",
            }
        ],
    )
    return response.choices[0].message.content or ""


try:
    print(run_chat())
finally:
    # Flush and shut down even if the call fails.
    respan.flush()
    respan.shutdown()

4. View your trace

Open the Traces page and search for the workflow name together_chat_completion.

Configuration

Parameter            Type          Default  Description
api_key              str | None    None     Respan API key. Falls back to RESPAN_API_KEY.
base_url             str | None    None     Respan API base URL. Falls back to RESPAN_BASE_URL.
instrumentations     list          []       Plugin instrumentations to activate, such as TogetherInstrumentor().
customer_identifier  str | None    None     Default customer identifier for all exported spans.
metadata             dict | None   None     Default metadata attached to all exported spans.
environment          str | None    None     Environment tag, such as "production".

Supported calls

SDK call                                          Traced
client.chat.completions.create(...)               Chat completion spans
client.chat.completions.create(..., stream=True)  Streaming chat completion spans after the stream is consumed or closed
await async_client.chat.completions.create(...)   Async chat completion spans
client.completions.create(...)                    Text completion spans
client.embeddings.create(...)                     Embedding spans with vector payloads summarized
client.rerank.create(...)                         Rerank spans
client.images.generate(...)                       Image generation spans
tools=[...] / tool_calls responses                Tool definitions and assistant tool calls

Attributes

Attach customer identifiers, thread IDs, workflow names, and metadata to Together AI calls with propagate_attributes.

from respan import propagate_attributes

with propagate_attributes(
    customer_identifier="user_123",
    thread_identifier="conversation_456",
    trace_group_identifier="together_support_chat",
    metadata={"plan": "pro", "workflow_name": "together_support_chat"},
):
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Summarize our support policy."}],
    )

Attribute               Type  Description
customer_identifier     str   Identifies the end user in Respan analytics.
thread_identifier       str   Groups related messages into a conversation.
trace_group_identifier  str   Groups spans by workflow name.
metadata                dict  Custom key-value pairs merged with default metadata.
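
When the same attributes are attached in many request handlers, it can help to build the keyword arguments in one place. The sketch below is only an illustration. The function build_trace_attributes and its parameter names are hypothetical, and it assumes nothing about propagate_attributes beyond the four keyword names listed above.

```python
from typing import Any, Dict, Optional


def build_trace_attributes(
    user_id: str,
    conversation_id: str,
    workflow_name: str,
    extra_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments matching the attribute names in the table above."""
    # Record the workflow name in metadata too, mirroring the example above.
    metadata: Dict[str, Any] = {"workflow_name": workflow_name}
    metadata.update(extra_metadata or {})
    return {
        "customer_identifier": user_id,
        "thread_identifier": conversation_id,
        "trace_group_identifier": workflow_name,
        "metadata": metadata,
    }


attrs = build_trace_attributes(
    "user_123", "conversation_456", "together_support_chat", {"plan": "pro"}
)
print(sorted(attrs))
```

The resulting dictionary could then be unpacked into the context manager, for example with propagate_attributes(**attrs).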

Examples

Streaming chat

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a short haiku about traces."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
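
Since streaming spans are recorded only after the stream is consumed or closed, it is worth making sure the loop always runs to the end. The following sketch collects the full text from a stream. The helper collect_stream_text is hypothetical, and the fake chunks stand in for real SDK objects so the example runs without network access. It assumes only the chunk.choices[0].delta.content attribute path used in the loop above.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def collect_stream_text(stream: Iterable[Any]) -> str:
    """Consume every chunk and return the concatenated delta text."""
    parts: List[str] = []
    for chunk in stream:
        # Defensive: skip chunks that carry no choices at all.
        if not getattr(chunk, "choices", None):
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in chunk exposing the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Spans "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("flow.")]
print(collect_stream_text(demo))  # prints "Spans flow."
```

With a real client, the stream object returned by client.chat.completions.create(..., stream=True) would be passed in place of demo.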

Async chat

import asyncio
import os

from together import AsyncTogether


async def main() -> None:
    client = AsyncTogether(api_key=os.environ["TOGETHER_API_KEY"])
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Explain async tracing briefly."}],
    )
    print(response.choices[0].message.content)
    await client.close()


asyncio.run(main())

Embeddings

response = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input=[
        "Respan traces Together AI chat calls.",
        "Embeddings should not export vector payloads.",
    ],
)
print(len(response.data))

Rerank

response = client.rerank.create(
    model="Salesforce/Llama-Rank-v1",
    query="Which document is about observability?",
    documents=[
        "Distributed tracing shows how requests move through services.",
        "Sourdough bread needs flour, water, salt, and patience.",
    ],
    top_n=1,
    return_documents=True,
)
print(response.results[0].index)
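
The index printed above refers to a position in the documents list that was sent with the request. The sketch below maps results back to the original strings. The helper ranked_documents is hypothetical, and the fake results use only the index attribute shown in the example.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence


def ranked_documents(results: Sequence[Any], documents: Sequence[str]) -> List[str]:
    """Return the input documents in the order given by the rerank results."""
    return [documents[result.index] for result in results]


documents = [
    "Distributed tracing shows how requests move through services.",
    "Sourdough bread needs flour, water, salt, and patience.",
]
# Stand-in for response.results, with the best match first.
fake_results = [SimpleNamespace(index=0)]
print(ranked_documents(fake_results, documents)[0])
```

With a real response, response.results would be passed in place of fake_results.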

Image generation

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A small line-art observability dashboard icon",
    n=1,
    width=256,
    height=256,
    steps=4,
    response_format="url",
)
print(len(response.data))

Tool calling

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is the weather in Tokyo?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
)
print(response.choices[0].message.tool_calls)
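
The tool_calls printed above still need to be executed by the application. The sketch below shows one way to dispatch them to local functions. It assumes each tool call exposes function.name and function.arguments, with arguments as a JSON string (the OpenAI-compatible shape). The names TOOL_REGISTRY and run_tool_calls and the stub get_weather implementation are hypothetical.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional, Sequence


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"No live weather data for {city} in this sketch."


# Maps tool names from the request's tools list to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: Optional[Sequence[Any]]) -> List[str]:
    """Execute each requested tool and return the outputs in order."""
    outputs: List[str] = []
    for call in tool_calls or []:
        handler = TOOL_REGISTRY.get(call.function.name)
        if handler is None:
            outputs.append(f"Unknown tool: {call.function.name}")
            continue
        # Assumption: arguments arrive as a JSON-encoded object.
        arguments = json.loads(call.function.arguments or "{}")
        outputs.append(handler(**arguments))
    return outputs


# Stand-in for response.choices[0].message.tool_calls.
fake_calls = [
    SimpleNamespace(function=SimpleNamespace(name="get_weather", arguments='{"city": "Tokyo"}'))
]
print(run_tool_calls(fake_calls))
```

Passing None is handled as well, since a model may answer directly without requesting any tool.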