LlamaIndex (tracing)

LlamaIndex is a framework for building LLM applications with your own data. It provides indexes, query engines, retrievers, and agents for retrieval-augmented generation. Respan captures LlamaIndex spans through respan-tracing, including index construction, retrieval, LLM calls, embeddings, and agent tool use.

Set up Respan

Create an account at platform.respan.ai and grab an API key.

Run npx @respan/cli setup to set up with your coding agent.

Use Respan Gateway

See LlamaIndex gateway setup to route this integration through the Respan gateway.

Example projects

Example repo root: respan-example-projects/python/tracing/llama-index

Setup

Install packages

$ pip install respan-ai respan-instrumentation-llama-index llama-index llama-index-llms-openai llama-index-embeddings-openai

Set environment variables

$ export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
$ export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"

OPENAI_API_KEY authenticates the LlamaIndex OpenAI LLM and the OpenAI embedding model. RESPAN_API_KEY authorizes trace export to Respan.
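
A missing key usually surfaces only when the first request fails, so it can help to check both variables before initializing anything. The following is a minimal sketch using only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical helper names, not part of Respan or LlamaIndex.

```python
import os
from typing import List, Mapping, Sequence

# The two variables this guide asks you to export.
REQUIRED_VARS = ("OPENAI_API_KEY", "RESPAN_API_KEY")


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing `env` explicitly keeps the check easy to exercise without touching the real environment.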

Initialize and run

from llama_index.core import Document, Settings, SummaryIndex
from llama_index.llms.openai import OpenAI
from respan import Respan, workflow
from respan_instrumentation_llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])
Settings.llm = OpenAI(model="gpt-4o-mini")

@workflow(name="llama_index_query")
def run_query():
    index = SummaryIndex.from_documents(
        [Document(text="Respan traces LlamaIndex query engines and LLM calls.")]
    )
    return index.as_query_engine().query("What does Respan trace?")

response = run_query()
print(response)
respan.flush()

View your trace

Open the Traces page to see your workflow. Depending on what the application runs, the trace can include index, retrieval, LLM, embedding, and tool spans.

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str \| None` | `None` | Falls back to `RESPAN_API_KEY` env var. |
| `base_url` | `str \| None` | `None` | Falls back to `RESPAN_BASE_URL` env var. |
| `instrumentations` | `list` | `[]` | Plugin instrumentations to activate, for example `LlamaIndexInstrumentor()`. |
| `customer_identifier` | `str \| None` | `None` | Default customer identifier for all spans. |
| `metadata` | `dict \| None` | `None` | Default metadata attached to all spans. |
| `environment` | `str \| None` | `None` | Environment tag, for example `"production"`. |
| `capture_content` | `bool` | `True` | `LlamaIndexInstrumentor` option. Set to `False` to avoid recording prompts, responses, and document content. |
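
The "falls back to" wording for `api_key` and `base_url` implies a precedence order: a value passed in code wins, and the environment variable is read only when the argument is `None`. The sketch below illustrates that order with the standard library. It is not Respan's implementation, and `resolve_setting` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer an explicitly passed value; otherwise read `env_name` from `env`."""
    if explicit is not None:
        return explicit
    return env.get(env_name)


# A stand-in environment so the example does not depend on real variables.
fake_env = {"RESPAN_API_KEY": "key-from-env"}

print(resolve_setting(None, "RESPAN_API_KEY", fake_env))             # key-from-env
print(resolve_setting("key-from-code", "RESPAN_API_KEY", fake_env))  # key-from-code
print(resolve_setting(None, "RESPAN_BASE_URL", fake_env))            # None
```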

Attributes

In Respan()

Set defaults at initialization. These apply to all spans emitted by the LlamaIndex instrumentor.

from respan import Respan
from respan_instrumentation_llama_index import LlamaIndexInstrumentor

respan = Respan(
    instrumentations=[LlamaIndexInstrumentor()],
    customer_identifier="user_123",
    metadata={"service": "rag-api", "version": "1.0.0"},
)

With propagate_attributes

Override per request using a context scope.

from llama_index.core import Document, SummaryIndex
from respan import Respan, workflow
from respan_instrumentation_llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])

@workflow(name="rag_request")
def run_query(question: str):
    index = SummaryIndex.from_documents([
        Document(text="Respan captures traces from LlamaIndex applications.")
    ])
    return index.as_query_engine().query(question)

def handle_request(user_id: str, question: str):
    with respan.propagate_attributes(
        customer_identifier=user_id,
        thread_identifier="conv_abc_123",
        metadata={"plan": "pro"},
    ):
        response = run_query(question)
        print(response)

| Attribute | Type | Description |
| --- | --- | --- |
| `customer_identifier` | `str` | Identifies the end user in Respan analytics. |
| `thread_identifier` | `str` | Groups related messages into a conversation. |
| `metadata` | `dict` | Custom key-value pairs. Merged with default metadata. |
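
To make the scoping and merge behavior concrete, here is a toy model built on `contextvars`. It is a sketch of the general pattern, not Respan's code: `attribute_scope`, `current_attributes`, and `DEFAULTS` are hypothetical names, and the rule that per-request metadata keys win on a conflict is an assumption the table does not state.

```python
import contextvars
from contextlib import contextmanager
from typing import Any, Dict, Iterator

# Stand-in for the defaults passed at initialization.
DEFAULTS: Dict[str, Any] = {
    "customer_identifier": "user_123",
    "metadata": {"service": "rag-api", "version": "1.0.0"},
}

# Holds the overrides that are active in the current context, if any.
_overrides = contextvars.ContextVar("attribute_overrides", default=None)


@contextmanager
def attribute_scope(**attrs: Any) -> Iterator[None]:
    """Apply `attrs` inside the `with` block, then restore the previous state."""
    token = _overrides.set({**(_overrides.get() or {}), **attrs})
    try:
        yield
    finally:
        _overrides.reset(token)


def current_attributes() -> Dict[str, Any]:
    """Return defaults overlaid with active overrides; metadata dicts are merged."""
    overrides = _overrides.get() or {}
    merged = {**DEFAULTS, **overrides}
    # Assumption: per-request metadata keys take precedence over default keys.
    merged["metadata"] = {**DEFAULTS["metadata"], **(overrides.get("metadata") or {})}
    return merged


with attribute_scope(customer_identifier="user_456", metadata={"plan": "pro"}):
    inside = current_attributes()
outside = current_attributes()

print(inside["customer_identifier"], inside["metadata"])
print(outside["customer_identifier"], outside["metadata"])
```

Inside the block the customer identifier is overridden and `plan` is added alongside the default metadata; once the block exits, only the defaults remain.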

Decorators (optional)

Decorators are not required for LlamaIndex instrumentation. Query engines, retrievers, agents, tools, embeddings, and LLM calls are captured by LlamaIndexInstrumentor. Use @workflow and @task when you want to group several LlamaIndex operations into one named trace tree.

from llama_index.core import Document, SummaryIndex
from respan import Respan, task, workflow
from respan_instrumentation_llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])

@task(name="build_index")
def build_index():
    return SummaryIndex.from_documents([
        Document(text="Respan traces LlamaIndex index and query operations."),
        Document(text="LlamaIndex combines retrievers, query engines, and agents."),
    ])

@workflow(name="rag_pipeline")
def rag_pipeline(question: str):
    index = build_index()
    return index.as_query_engine().query(question)

print(rag_pipeline("Summarize the documents."))
respan.flush()

Examples

Query engine

Query engines are captured with nested retriever, synthesizer, and LLM spans.

from llama_index.core import Document, SummaryIndex

index = SummaryIndex.from_documents([
    Document(text="Respan captures traces for LLM calls and workflow steps."),
    Document(text="LlamaIndex can combine query engines, retrievers, and tools."),
])
query_engine = index.as_query_engine()

response = query_engine.query("What do the documents say?")
print(response)

Embeddings

Embedding calls are captured as embedding logs. Vector values are summarized instead of recording the full embedding array.

from llama_index.embeddings.openai import OpenAIEmbedding

embedding_model = OpenAIEmbedding(model="text-embedding-3-small")
embedding = embedding_model.get_text_embedding(
    "LlamaIndex uses embeddings to retrieve relevant document chunks."
)
print(len(embedding))
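
This page does not specify which fields the summary contains. As a rough illustration of what summarizing a vector instead of storing it can look like, the sketch below reduces a vector to its dimension count, L2 norm, and a short preview. The field names and the `summarize_vector` helper are hypothetical and may differ from what Respan records.

```python
import math
from typing import Dict, Sequence


def summarize_vector(vector: Sequence[float], preview: int = 3) -> Dict[str, object]:
    """Describe a vector by its size, L2 norm, and first few values."""
    return {
        "dimensions": len(vector),
        "l2_norm": round(math.sqrt(sum(x * x for x in vector)), 6),
        "preview": [round(x, 6) for x in vector[:preview]],
    }


# A tiny stand-in vector; real embeddings have far more dimensions.
fake_embedding = [0.6, 0.8, 0.0, 0.0]
print(summarize_vector(fake_embedding))
```

A summary like this stays the same size no matter how long the embedding is, which keeps trace payloads small.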

Tool-use agent

LlamaIndex ReAct agents emit agent, tool, and LLM spans in the same trace tree.

import asyncio
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.tools import FunctionTool

def multiply_numbers(a: int, b: int) -> int:
    return a * b

multiply_tool = FunctionTool.from_defaults(
    fn=multiply_numbers,
    name="multiply_numbers",
    description="Multiply two integers and return the product.",
)
agent = ReActAgent(
    tools=[multiply_tool],
    system_prompt="Use tools when arithmetic is required.",
    streaming=False,
)

async def main():
    response = await agent.run(
        user_msg="Use the multiply_numbers tool to calculate 7 multiplied by 6."
    )
    print(response)

asyncio.run(main())
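
The tool is an ordinary Python function, so it can be checked on its own before an agent ever calls it. A minimal sketch, repeating the function from the example above so the snippet stands alone:

```python
def multiply_numbers(a: int, b: int) -> int:
    return a * b


# Direct call with the same arguments the prompt asks the agent to use.
expected = multiply_numbers(7, 6)
print(expected)  # 42
```

If the agent invokes the tool with `a=7` and `b=6`, the tool call should return this same value, which gives a simple reference point when reading the trace.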