Quickstart Guide

Get started with SupaEval in under 5 minutes. This guide will walk you through creating your first evaluation, from setup to viewing results.

What you'll learn

By the end of this guide, you'll have created a dataset, run an evaluation, and viewed your first quality metrics.

Prerequisites

A SupaEval account (sign up here)
An API key from your dashboard
An AI agent or LLM application to evaluate

Step 1: Installation

Install the Python SDK with pip:

bash

pip install supaeval

Requires Python 3.8 or higher

Step 2: Run Your First Evaluation

Here's a complete example that creates a dataset, adds a test case, runs an evaluation, and prints the overall score:

python

from supaeval import SupaEval

# Initialize with your API key
client = SupaEval(api_key="sk_live_...")

# Create a simple dataset
dataset = client.datasets.create(
    name="quickstart-dataset",
    description="My first evaluation"
)

# Add a test case
dataset.add_items([{
    "input": "What is 2 + 2?",
    "expected_output": "4"
}])

# Run evaluation
evaluation = client.evaluations.create(
    dataset_id=dataset.id,
    agent_endpoint="https://your-agent.api/chat"
)

# Get results
results = evaluation.get_results()
print(f"Score: {results.overall_score}")
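The example above passes the API key as a string literal. As a hedged sketch, one common alternative is to read the key from an environment variable so it stays out of source control. The variable name SUPAEVAL_API_KEY and the helper load_api_key are assumptions made for illustration; they are not part of the SupaEval SDK.

```python
import os


def load_api_key(env_var: str = "SUPAEVAL_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty.

    The default variable name is a hypothetical choice, not an SDK convention.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your SupaEval API key"
        )
    return key
```

With this helper, the client from the example could be created as client = SupaEval(api_key=load_api_key()).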

Agent Endpoint

Your agent endpoint should accept POST requests with a JSON body of the form {"input": "..."} and return a JSON response of the form {"output": "..."}. See our evaluation guide for custom configurations.
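To make the request and response shape concrete, here is a minimal sketch of an endpoint that follows this contract, built only on Python's standard library. The answer function, the AgentHandler class, the host, and the port are placeholders chosen for illustration; a real agent would call your model where answer currently echoes the input.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(user_input: str) -> str:
    """Placeholder agent logic; replace with a call to your model or agent."""
    return f"You asked: {user_input}"


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON request body: {"input": "..."}
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            user_input = payload["input"]
        except (ValueError, KeyError, TypeError):
            self._send_json(400, {"error": "expected a JSON body with an 'input' field"})
            return
        # Reply in the expected shape: {"output": "..."}
        self._send_json(200, {"output": answer(user_input)})

    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build the server without starting it, so the caller decides when to serve."""
    return HTTPServer((host, port), AgentHandler)


def run() -> None:
    """Serve requests until interrupted."""
    make_server().serve_forever()
```

Calling run() starts the server on http://127.0.0.1:8000. A local address like this is only reachable from your own machine, so for an actual evaluation the agent_endpoint value would need to be a URL that the SupaEval service can reach.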

Step 3: View Results

After running your evaluation, view detailed results in the SupaEval dashboard:

Overall Quality Score: 87%

Pass Rate: 85/100 tests passed

Average Latency: 1.2s
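For intuition about the pass rate and average latency figures, the sketch below shows how such summary values could be derived from per-test records. This is an assumption for illustration only: the record fields passed and latency_s and the summarize function are hypothetical, and this guide does not specify how SupaEval computes its metrics or the overall quality score.

```python
from typing import Dict, List


def summarize(results: List[Dict]) -> Dict[str, float]:
    """Compute pass rate and mean latency from hypothetical per-test records.

    Each record is assumed to look like {"passed": bool, "latency_s": float}.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no results to summarize")
    passed = sum(1 for r in results if r["passed"])
    mean_latency = sum(r["latency_s"] for r in results) / total
    return {"pass_rate": passed / total, "average_latency_s": mean_latency}
```

Under these assumptions, 85 passing records out of 100 give a pass_rate of 0.85, which corresponds to the 85/100 shown in the sample summary.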

Next Steps

Now that you've run your first evaluation, you can:

📊 Explore Metrics

Learn about different evaluation metrics and how to interpret them

📚 Manage Datasets

Create comprehensive test datasets for thorough evaluation

🔬 Run Benchmarks

Compare agent versions with benchmarks

🔐 Production Setup

Learn about security best practices for production deployments

Need help?

Join our community Discord or check out the full documentation for more details.