Introduction
Welcome to SupaEval! This guide will help you understand what SupaEval is, how it works, and why it's essential for building reliable AI agents.
What is SupaEval?
SupaEval is a quality intelligence platform for AI agents. Unlike traditional evaluation tools that only measure final outputs, SupaEval provides deep insights across every layer of your agent system:
- Retrieval Quality: how well your agent finds relevant information
- Reasoning Accuracy: how well it processes and interprets data
- Tool Usage: how effectively it uses available tools
- Generation Quality: how well it produces final outputs
Core Concepts
Datasets
Collections of test cases used to evaluate your agent. Datasets can include prompts, multi-turn conversations, expected outputs, and metadata like difficulty or domain.
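As a rough illustration of what a test case might contain, the sketch below models one in Python. The `EvalCase` and `Turn` classes, their field names, and the sample values are all assumptions made for this example; they are not SupaEval's actual dataset schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    """One message in a multi-turn conversation (hypothetical shape)."""
    role: str      # e.g. "user" or "assistant"
    content: str


@dataclass
class EvalCase:
    """A single dataset entry. Field names are illustrative assumptions."""
    prompt: str
    expected_output: Optional[str] = None
    conversation: list[Turn] = field(default_factory=list)  # prior turns, if any
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. difficulty, domain


# Made-up sample data, for illustration only.
dataset = [
    EvalCase(
        prompt="What is the refund window for annual plans?",
        expected_output="30 days",
        metadata={"difficulty": "easy", "domain": "billing"},
    ),
    EvalCase(
        prompt="Does that apply to monthly plans too?",
        conversation=[
            Turn("user", "What is the refund window for annual plans?"),
            Turn("assistant", "30 days."),
        ],
        metadata={"difficulty": "medium", "domain": "billing"},
    ),
]

# Metadata makes it easy to slice a dataset, e.g. by difficulty.
easy_cases = [case for case in dataset if case.metadata.get("difficulty") == "easy"]
print(len(easy_cases))
```

Keeping metadata as free-form key/value pairs is one possible design choice: it lets results be grouped by difficulty or domain later without changing the case structure.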
Evaluations
Runs that execute your agent against a dataset and measure performance using specified metrics. Evaluations are reproducible and can be compared over time.
Metrics
Quantitative measures of agent performance, computed by SupaEval rather than by you. They include:
- Retrieval metrics (Precision@K, Recall@K, nDCG)
- Generation metrics (relevance, faithfulness, hallucination detection)
- Task success rates
- Latency and cost tracking
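SupaEval computes these metrics for you, but it can help to see what the retrieval metrics measure. The sketch below uses the standard textbook definitions of Precision@K, Recall@K, and nDCG with binary relevance. The function names and the sample document IDs are assumptions for illustration, and this is not necessarily how SupaEval implements them.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of this ranking divided by the ideal DCG."""
    # A hit at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc in enumerate(retrieved[:k])
        if doc in relevant
    )
    # The ideal ranking places every relevant item first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up example: the agent retrieved four documents (assumed unique IDs),
# and three documents are actually relevant.
retrieved = ["d1", "d4", "d2", "d7"]
relevant = {"d1", "d2", "d3"}

print(precision_at_k(retrieved, relevant, k=3))
print(recall_at_k(retrieved, relevant, k=3))
print(ndcg_at_k(retrieved, relevant, k=3))
```

Precision@K and Recall@K ignore ordering within the top K, while nDCG rewards placing relevant items earlier, which is why the three are usually reported together.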
Benchmarks
Named evaluation runs against fixed datasets that enable comparison between agent versions, prompt changes, or model swaps.
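Because the dataset is fixed, two benchmark runs can be compared metric by metric. The following is a minimal sketch of that comparison, assuming each run is summarized as a dictionary of metric names to scores. The `compare_runs` helper, the metric names, and the scores are hypothetical and used only for illustration.

```python
def compare_runs(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Return candidate-minus-baseline deltas for metrics present in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {name: round(candidate[name] - baseline[name], 4) for name in sorted(shared)}


# Made-up scores for two agent versions evaluated on the same fixed dataset.
baseline = {"recall_at_5": 0.72, "task_success": 0.80, "latency_s": 1.9}
candidate = {"recall_at_5": 0.78, "task_success": 0.79, "latency_s": 1.4}

deltas = compare_runs(baseline, candidate)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

Whether a positive delta is an improvement depends on the metric: higher recall is better, while lower latency is better.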
How It Works
Connect Your Agent
Provide an API endpoint or integrate using our SDK. No changes to your agent code are required.
Upload or Select Dataset
Use our benchmark datasets or upload your own custom test cases.
Run Evaluation
SupaEval executes your agent, captures intermediate steps, and computes metrics.
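Conceptually, this step resembles the loop sketched below: run the agent on each test case, record its intermediate steps alongside the final output, and score the result. Everything here is an assumption for illustration, including the agent signature, the step format, the `run_evaluation` helper, and the exact-match scoring; it does not describe SupaEval's internals or SDK.

```python
from typing import Callable

# Hypothetical agent signature: takes a prompt, returns (final_output, steps).
Agent = Callable[[str], tuple[str, list[dict]]]


def toy_agent(prompt: str) -> tuple[str, list[dict]]:
    """Stand-in agent that reports a retrieval step and a generation step."""
    steps = [
        {"type": "retrieval", "documents": ["doc-1"]},
        {"type": "generation", "text": "30 days"},
    ]
    return "30 days", steps


def run_evaluation(agent: Agent, cases: list[tuple[str, str]]) -> list[dict]:
    """Run the agent on each (prompt, expected) pair and score by exact match."""
    results = []
    for prompt, expected in cases:
        output, steps = agent(prompt)
        results.append({
            "prompt": prompt,
            "output": output,
            "steps": steps,  # intermediate steps kept for later inspection
            "correct": output.strip().lower() == expected.strip().lower(),
        })
    return results


# Made-up test cases for illustration.
cases = [
    ("What is the refund window?", "30 days"),
    ("Who approves refunds?", "Finance team"),
]
results = run_evaluation(toy_agent, cases)
success_rate = sum(r["correct"] for r in results) / len(results)
print(success_rate)
```

Storing the steps with each result is what makes it possible to tell a retrieval failure from a generation failure when a case is marked incorrect.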
Analyze Results
View detailed dashboards showing where your agent succeeds and fails, with actionable insights.
Key Features
Model Agnostic
Works with any LLM, framework, or agent architecture
No Code Changes
Evaluate existing agents without modifying internals
Reproducible
Deterministic reruns for consistent benchmarking
Enterprise Ready
SOC 2 compliant with advanced security features