TraceContext wraps a trace and its spans, providing typed methods for accessing messages, tool calls, grader results, and more. It’s the main way to work with trace data in Manta.

Creating a TraceContext

from manta import TraceContext
from manta.data import MantaService

service = MantaService()
traces = service.load_traces(run_id="run_xxx")
spans_by_trace = service.load_spans_for_traces([t.trace_id for t in traces])

for trace in traces:
    ctx = TraceContext(trace, span_loader=lambda tid: spans_by_trace.get(tid, []))
You can also pass a MantaService directly — useful if your analysis needs to make additional queries:
ctx = TraceContext(trace, span_loader=..., service=service)
ctx.service.load_traces(run_id=ctx.run_id)  # additional queries

Trace properties

ctx.trace_id          # str — unique trace ID
ctx.run_id            # str — parent run ID
ctx.step_number       # int — training/eval step
ctx.phase             # str — "train", "eval", or "prod"
ctx.datapoint_id      # str — parent datapoint ID

ctx.trace.score       # float | None — grader score
ctx.trace.status      # str — "completed", "error", etc.
ctx.trace.cost_usd    # float — total cost
ctx.trace.duration_ms # int — trace duration
ctx.trace.turn_count  # int — number of turns
ctx.trace.tokens_input   # int
ctx.trace.tokens_output  # int
ctx.trace.model_name     # str
ctx.trace.env_name       # str
ctx.trace.meta           # dict — trace metadata
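These properties make per-run aggregation straightforward. As a minimal sketch, here is a helper that totals cost and tokens across traces; plain dicts stand in for Trace objects so the example runs without Manta, but the field names match those documented above:

```python
def summarize_traces(traces):
    """Aggregate cost and token totals over trace-like records.

    Each record exposes cost_usd, tokens_input, and tokens_output,
    mirroring the trace properties documented above.
    """
    total_cost = sum(t["cost_usd"] for t in traces)
    total_tokens = sum(t["tokens_input"] + t["tokens_output"] for t in traces)
    return {"cost_usd": round(total_cost, 4), "tokens": total_tokens}

traces = [
    {"cost_usd": 0.012, "tokens_input": 900, "tokens_output": 150},
    {"cost_usd": 0.030, "tokens_input": 2100, "tokens_output": 400},
]
print(summarize_traces(traces))  # {'cost_usd': 0.042, 'tokens': 3550}
```

With real data you would feed `ctx.trace` values in place of the dicts.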

Messages

ctx.messages() returns MessageData objects. Filter by role:
all_messages = ctx.messages()
assistant_msgs = ctx.messages(role="assistant")
user_msgs = ctx.messages(role="user")
Each MessageData has:
| Property | Type | Description |
| --- | --- | --- |
| `role` | `str` | `"user"`, `"assistant"`, `"system"`, or `"tool"` |
| `content` | `str` | Text content |
| `tool_calls` | `list[dict]` | Function calls in this message |
| `is_empty` | `bool` | No content and no tool calls |
| `has_tool_calls` | `bool` | Contains tool calls |
for msg in ctx.messages(role="assistant"):
    if msg.is_empty:
        print("Empty assistant message")
    if msg.has_tool_calls:
        print(f"Called: {msg.tool_call_names()}")
Convenience shortcuts:
ctx.assistant_messages()  # same as ctx.messages(role="assistant")
ctx.user_messages()       # same as ctx.messages(role="user")
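A common analysis on top of these filters is measuring how often the model produces empty turns. A standalone sketch, using message dicts as stand-ins for MessageData (the `role`/`content`/`tool_calls` fields mirror the table above):

```python
def empty_assistant_rate(messages):
    """Fraction of assistant messages with no content and no tool calls,
    matching the is_empty semantics documented above."""
    assistants = [m for m in messages if m["role"] == "assistant"]
    if not assistants:
        return 0.0
    empty = sum(1 for m in assistants if not m["content"] and not m["tool_calls"])
    return empty / len(assistants)

msgs = [
    {"role": "user", "content": "hi", "tool_calls": []},
    {"role": "assistant", "content": "", "tool_calls": []},
    {"role": "assistant", "content": "done", "tool_calls": []},
]
print(empty_assistant_rate(msgs))  # 0.5
```

Against a real trace, you would iterate `ctx.assistant_messages()` and use `msg.is_empty` directly.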

Tool calls

ctx.tools() returns ToolData objects. Filter by name, requestor, or error status:
all_tools = ctx.tools()
bash_calls = ctx.tools(name="bash")
failed = ctx.tools(errors_only=True)
agent_tools = ctx.tools(requestor="agent")
Each ToolData has:
| Property | Type | Description |
| --- | --- | --- |
| `name` | `str` | Tool name |
| `call_id` | `str` | Unique call ID |
| `arguments` | `str` | Raw JSON arguments |
| `arguments_dict` | `dict` | Parsed arguments |
| `result` | `str` | Tool output |
| `error` | `bool` | Whether the call errored |
| `requestor` | `str` | `"agent"`, `"user"`, or `"system"` |
Convenience shortcuts:
ctx.tool_names()    # list[str] — names of all tools used
ctx.failed_tools()  # list[ToolData] — tools that errored
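One thing these shortcuts enable is a quick failure breakdown per tool. A hedged sketch, with plain dicts standing in for ToolData (only the documented `name` and `error` fields are assumed):

```python
from collections import Counter

def tool_failure_counts(tools):
    """Count errored calls per tool name, using the ToolData-style
    name/error fields documented above."""
    return dict(Counter(t["name"] for t in tools if t["error"]))

calls = [
    {"name": "bash", "error": True},
    {"name": "bash", "error": False},
    {"name": "edit", "error": True},
]
print(tool_failure_counts(calls))  # {'bash': 1, 'edit': 1}
```

With Manta objects, `tool_failure_counts(ctx.tools())` would work the same way with attribute access instead of dict lookups.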

Grader results

ctx.grader_result() returns the aggregate grading result with individual criteria populated:
result = ctx.grader_result()
if result:
    result.score       # float | None — aggregate score
    result.passed      # bool | None
    result.grader_name # str
    result.reasoning   # str
    result.criteria    # list[GraderCriterionData]
Each criterion:
| Property | Type | Description |
| --- | --- | --- |
| `criterion_name` | `str` | e.g. `"correctness"`, `"db_state"` |
| `passed` | `bool` | Whether this criterion passed |
| `score` | `float \| None` | Criterion score |
| `reasoning` | `str` | Grader reasoning |
result = ctx.grader_result()
if result:
    for c in result.criteria:
        print(f"{c.criterion_name}: passed={c.passed}")
Query criteria directly:
correctness = ctx.grader_criteria(criterion="correctness")
all_criteria = ctx.grader_criteria(grader="my-grader")
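Criterion-level results are useful for spotting which criteria fail most often across a run. A minimal sketch, assuming criterion dicts with the documented `criterion_name` and `passed` fields in place of GraderCriterionData:

```python
from collections import defaultdict

def criterion_pass_rates(criteria):
    """Pass rate per criterion name across many traces."""
    totals = defaultdict(lambda: [0, 0])  # name -> [passed, seen]
    for c in criteria:
        totals[c["criterion_name"]][1] += 1
        if c["passed"]:
            totals[c["criterion_name"]][0] += 1
    return {name: p / n for name, (p, n) in totals.items()}

criteria = [
    {"criterion_name": "correctness", "passed": True},
    {"criterion_name": "correctness", "passed": False},
    {"criterion_name": "db_state", "passed": True},
]
print(criterion_pass_rates(criteria))  # {'correctness': 0.5, 'db_state': 1.0}
```

In practice you would collect `ctx.grader_criteria()` from each trace into one list before aggregating.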

LLM calls

ctx.llm_calls() returns raw LLM invocations:
for llm in ctx.llm_calls():
    llm.system          # str — provider ("openai", "anthropic")
    llm.request_model   # str — model name
    llm.input_tokens    # int
    llm.output_tokens   # int
    llm.cost_usd        # float
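These per-call fields make cost attribution easy. A standalone sketch that groups cost by model, with dicts standing in for the LLM-call objects (only the documented `request_model` and `cost_usd` fields are assumed):

```python
from collections import defaultdict

def cost_by_model(llm_calls):
    """Sum cost_usd per request_model, mirroring the fields above."""
    costs = defaultdict(float)
    for call in llm_calls:
        costs[call["request_model"]] += call["cost_usd"]
    return dict(costs)

calls = [
    {"request_model": "gpt-4o", "cost_usd": 0.02},
    {"request_model": "gpt-4o", "cost_usd": 0.01},
    {"request_model": "claude-3-5-sonnet", "cost_usd": 0.05},
]
print(cost_by_model(calls))
```

Summing over `ctx.llm_calls()` for every trace in a run gives a run-level cost breakdown.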

Conversation

ctx.conversation() returns the full conversation as a flat list of dicts — convenient for sending to an LLM:
conversation = ctx.conversation()
# [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, ...]

content = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
Tool spans appear as {"role": "tool", "content": "input: {...}\noutput: ..."}.

Utility methods

ctx.has_errors()       # bool — trace failed or any span errored
ctx.has_tool_errors()  # bool — any tool call errored
ctx.total_tokens()     # int — tokens_input + tokens_output
ctx.span_tokens()      # tuple[int, int] — (input, output) from LLM spans
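These checks compose naturally into triage logic. As a hedged sketch, a hypothetical `needs_review` predicate over values you would read off a TraceContext (a plain dict stands in here; the threshold values are illustrative, not part of Manta):

```python
def needs_review(summary):
    """Flag a trace for manual review.

    `summary` holds values read from a TraceContext: the result of
    has_errors(), the trace score, and total_tokens().
    """
    if summary["has_errors"]:
        return True
    score = summary["score"]
    if score is not None and score < 0.5:  # illustrative threshold
        return True
    return summary["total_tokens"] > 100_000  # illustrative budget

print(needs_review({"has_errors": False, "score": 0.3, "total_tokens": 500}))  # True
print(needs_review({"has_errors": False, "score": 0.9, "total_tokens": 500}))  # False
```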