Traces

Save traces of model executions for debugging and distillation

An eight-factor AI application maintains comprehensive traces of all model interactions and process executions. A trace captures the complete context, execution path, and results of each operation. Traces should be structured, searchable, and suitable for both debugging and improvement.

Traces in an eight-factor app serve multiple purposes:

Debugging complex interactions
Performance optimization
Cost monitoring
Quality assurance
Training data collection
Process improvement

A complete trace includes:

Input context and prompts used
Model responses and parameters
Tool calls and results
Reasoning steps taken
Resource usage metrics
Timing information
Final outcomes
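
As a rough illustration, the fields listed above could be gathered into a single serializable record. This is a minimal sketch, not a prescribed schema: `TraceRecord`, `ToolCall`, and every field name below are hypothetical and chosen only to mirror the list.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ToolCall:
    # Hypothetical shape for one tool invocation and its result
    name: str
    arguments: dict[str, Any]
    result: Any = None


@dataclass
class TraceRecord:
    # Hypothetical record; one field per item in the list above
    operation_type: str
    prompt: str                                                   # input prompt
    context: dict[str, Any] = field(default_factory=dict)         # input context
    model_params: dict[str, Any] = field(default_factory=dict)    # model parameters
    response: Optional[str] = None                                # model response
    tool_calls: list[ToolCall] = field(default_factory=list)      # tool calls and results
    reasoning_steps: list[str] = field(default_factory=list)      # reasoning steps taken
    prompt_tokens: int = 0                                        # resource usage
    completion_tokens: int = 0
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                                                             # timing information
    ended_at: Optional[datetime] = None
    outcome: Optional[str] = None                                 # final outcome

    @property
    def duration_seconds(self) -> Optional[float]:
        # Only defined once the operation has finished
        if self.ended_at is None:
            return None
        return (self.ended_at - self.started_at).total_seconds()

    def to_json(self) -> str:
        # default=str serializes datetimes and any other non-JSON values
        return json.dumps(asdict(self), default=str)
```

Keeping the record flat and JSON-serializable makes it straightforward to store, index, and query later.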

Bad practice - minimal or unstructured logging:

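import logging

def process_request(request):
    # Free-form log lines only: no parameters, timing, or token usage are recorded
    logging.info(f"Processing request: {request}")
    result = model.generate(request.prompt)  # assumes the request carries its prompt
    logging.info(f"Got result: {result}")
    return result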

Good practice - structured tracing:

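from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Optional

class Tracer:
    def __init__(self, storage: TraceStorage):
        self.storage = storage
        self.current_trace = None

    @contextmanager
    def trace(self, operation_type: str, metadata: Optional[dict] = None):
        trace = Trace(
            operation_type=operation_type,
            metadata=metadata or {},
            timestamp=datetime.now(timezone.utc)
        )
        previous_trace = self.current_trace
        self.current_trace = trace

        try:
            yield trace
            trace.status = "success"
        except Exception as e:
            trace.status = "error"
            trace.error = str(e)
            raise
        finally:
            # Always record the end time and persist the trace, even on failure
            trace.end_timestamp = datetime.now(timezone.utc)
            self.current_trace = previous_trace
            self.storage.store(trace)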

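import time

class ModelInteractionTracer(Tracer):
    def __init__(self, storage: TraceStorage, model):
        super().__init__(storage)
        self.model = model  # client exposing an async generate() method

    async def traced_generate(
        self,
        prompt: str,
        context: Context,
        **params
    ) -> TraceResult:
        with self.trace("model_generation") as trace:
            trace.record_input(prompt, context, params)

            # Time the call directly: the trace's end timestamp is only set
            # when the context manager exits
            started = time.perf_counter()
            result = await self.model.generate(prompt, **params)
            latency = time.perf_counter() - started

            trace.record_output(
                result,
                token_usage=self.count_tokens(prompt, result),
                latency=latency
            )

            return TraceResult(result, trace)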

Traces should be structured hierarchically, with a workflow-level trace that contains one child trace per step:

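class WorkflowTrace(Trace):
    def __init__(self, **kwargs):
        # Initialize the base Trace fields (operation type, metadata, timestamp)
        super().__init__(**kwargs)
        self.steps = []
        self.current_step = None

    @contextmanager
    def trace_step(self, step: Step):
        step_trace = StepTrace(step)
        previous_step = self.current_step
        self.current_step = step_trace
        self.steps.append(step_trace)

        try:
            yield step_trace
        finally:
            # Close the step even if it raised, then restore the enclosing step
            step_trace.complete()
            self.current_step = previous_step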
    
    def to_timeline(self) -> Timeline:
        return Timeline([
            TimelineEvent(
                step=step.name,
                start=step.start_time,
                end=step.end_time,
                details=step.details
            )
            for step in self.steps
        ])

This approach enables:

Process visualization
Performance analysis
Cost attribution
Quality monitoring
Continuous improvement
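
To make cost attribution concrete, here is a small self-contained sketch of nested steps whose token counts roll up to their parents. It is an illustration under stated assumptions rather than the classes shown above: `Span`, `SpanRecorder`, `cost_by_step`, and the flat per-1k-token price are all hypothetical.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical node in the trace tree
    name: str
    tokens: int = 0
    start: float = 0.0
    end: float = 0.0
    children: list["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Tokens used by this step plus everything nested inside it
        return self.tokens + sum(c.total_tokens() for c in self.children)


class SpanRecorder:
    def __init__(self, root_name: str):
        self.root = Span(root_name, start=time.perf_counter())
        self._stack = [self.root]

    @contextmanager
    def step(self, name: str):
        span = Span(name, start=time.perf_counter())
        self._stack[-1].children.append(span)  # attach to the enclosing step
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end = time.perf_counter()
            self._stack.pop()


def cost_by_step(root: Span, price_per_1k_tokens: float) -> dict[str, float]:
    # Attribute cost to each top-level step, including its nested steps
    return {
        child.name: child.total_tokens() / 1000 * price_per_1k_tokens
        for child in root.children
    }


recorder = SpanRecorder("answer_question")
with recorder.step("retrieve") as retrieve:
    retrieve.tokens = 200
with recorder.step("generate") as generate:
    generate.tokens = 1000
    with recorder.step("self_check") as check:
        check.tokens = 500

costs = cost_by_step(recorder.root, price_per_1k_tokens=2.0)
```

Because each step records its own start and end, the same tree can drive a timeline view as well as the cost breakdown.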

Traces should be treated as valuable data:

Stored in a queryable format
Retained according to policy
Protected for privacy
Indexed for search
Analyzed for patterns
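
One way these properties might fit together is sketched below using SQLite from the Python standard library. It assumes traces arrive as plain dictionaries; `SqliteTraceStore`, `redact`, and the table layout are hypothetical, and the email pattern is a deliberately simple stand-in for a real privacy filter.

```python
import json
import re
import sqlite3
import time

# Simplistic pattern for illustration; real redaction needs broader coverage
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value):
    # Recursively mask email addresses in strings, dicts, and lists
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


class SqliteTraceStore:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS traces ("
            " id INTEGER PRIMARY KEY,"
            " operation_type TEXT NOT NULL,"
            " status TEXT NOT NULL,"
            " created_at REAL NOT NULL,"
            " payload TEXT NOT NULL)"
        )
        # Index the columns that queries filter on
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS idx_traces_type_time"
            " ON traces (operation_type, created_at)"
        )

    def store(self, operation_type, status, payload, created_at=None):
        created_at = time.time() if created_at is None else created_at
        body = json.dumps(redact(payload))  # protect privacy before writing
        with self.db:  # commits on success
            self.db.execute(
                "INSERT INTO traces (operation_type, status, created_at, payload)"
                " VALUES (?, ?, ?, ?)",
                (operation_type, status, created_at, body),
            )

    def query(self, operation_type, since=0.0):
        rows = self.db.execute(
            "SELECT payload FROM traces"
            " WHERE operation_type = ? AND created_at >= ?"
            " ORDER BY created_at",
            (operation_type, since),
        )
        return [json.loads(payload) for (payload,) in rows]

    def purge_older_than(self, max_age_seconds, now=None):
        # Retention policy: delete traces older than the cutoff
        now = time.time() if now is None else now
        with self.db:
            cursor = self.db.execute(
                "DELETE FROM traces WHERE created_at < ?",
                (now - max_age_seconds,),
            )
        return cursor.rowcount
```

Redacting before the write, rather than at read time, means sensitive values never reach disk in the first place.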

Applications should support trace analysis:

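from typing import Any, Dict, List

class TraceAnalyzer:
    def __init__(self, storage: TraceStorage):
        self.storage = storage

    def analyze_traces(
        self,
        timeframe: TimeRange,
        filters: Dict[str, Any]
    ) -> Analysis: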
        traces = self.storage.query(timeframe, filters)
        
        return Analysis(
            performance=self.analyze_performance(traces),
            costs=self.analyze_costs(traces),
            quality=self.analyze_quality(traces),
            patterns=self.detect_patterns(traces),
            anomalies=self.detect_anomalies(traces)
        )
    
    def analyze_performance(self, traces: List[Trace]) -> Metrics:
        return Metrics({
            "p50_latency": self.percentile(traces, "duration", 50),
            "p95_latency": self.percentile(traces, "duration", 95),
            "token_usage": self.average(traces, "tokens_used"),
            "success_rate": self.success_rate(traces)
        })

This pattern enables:

Performance optimization
Cost optimization
Quality improvement
Pattern discovery
Anomaly detection
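
The analyzer above relies on helpers such as a percentile and a success rate. The sketch below shows one possible way to compute them over plain lists. The function names are hypothetical, the percentile uses the nearest-rank method (one of several common definitions), and the outlier rule (more than three times the median) is an arbitrary illustrative threshold.

```python
import math
import statistics
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    # Nearest-rank percentile: the smallest value with at least pct% of
    # the data at or below it
    if not values:
        raise ValueError("percentile() requires at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def success_rate(statuses: Sequence[str]) -> float:
    # Fraction of traces whose status is "success"
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s == "success") / len(statuses)


def slow_outliers(durations: Sequence[float], factor: float = 3.0) -> list[float]:
    # Flag durations far above the median; the factor is an assumption
    if not durations:
        return []
    cutoff = factor * statistics.median(durations)
    return [d for d in durations if d > cutoff]


durations = [0.4, 0.2, 3.0, 0.5, 0.9]
p50 = percentile(durations, 50)
p95 = percentile(durations, 95)
```

With only a handful of traces, high percentiles collapse onto the maximum, so these metrics become meaningful only once enough traces have accumulated.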

Trace implementation should optimize for:

Completeness - capturing all relevant data
Structure - organized for analysis
Accessibility - easy to query
Privacy - protecting sensitive data
Efficiency - managing storage costs

Each trace should capture:

Full execution context
All model interactions
Resource usage
Timing information
Process decisions
Final outcomes

Traces can also feed back into other factors:

Examples - identifying good cases for the example suite
Workflows - optimizing process flows
Reasoning - improving decision patterns
Context - refining context selection
Tools - measuring tool effectiveness
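
As an example of the first feedback path, stored traces could be filtered into prompt/completion pairs for an example suite or a distillation dataset. This is only a sketch: it assumes each trace is a dictionary with `prompt`, `response`, `status`, and a `quality_score` between 0 and 1, none of which is defined by the text above.

```python
import json
from typing import Iterable


def select_examples(
    traces: Iterable[dict],
    min_score: float = 0.8,
    limit: int = 100,
) -> list[dict]:
    # Keep successful, highly rated traces, best first, one per distinct prompt
    seen_prompts = set()
    chosen = []
    ranked = sorted(
        traces, key=lambda t: t.get("quality_score", 0.0), reverse=True
    )
    for trace in ranked:
        if trace.get("status") != "success":
            continue
        if trace.get("quality_score", 0.0) < min_score:
            continue
        if trace["prompt"] in seen_prompts:
            continue
        seen_prompts.add(trace["prompt"])
        chosen.append({"prompt": trace["prompt"], "completion": trace["response"]})
        if len(chosen) == limit:
            break
    return chosen


def to_jsonl(examples: list[dict]) -> str:
    # One JSON object per line, a common format for training data
    return "\n".join(json.dumps(example) for example in examples)


traces = [
    {"prompt": "a", "response": "A1", "status": "success", "quality_score": 0.9},
    {"prompt": "a", "response": "A2", "status": "success", "quality_score": 0.95},
    {"prompt": "b", "response": "B", "status": "error", "quality_score": 0.99},
    {"prompt": "c", "response": "C", "status": "success", "quality_score": 0.5},
]
examples = select_examples(traces)
```

How the quality score is produced (human review, automated checks, or user feedback) is left open here; the selection step only needs some signal to rank by.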