Dataset Management

Datasets are collections of test cases used to evaluate your AI agents. Learn how to create, upload, and manage datasets in SupaEval.

Dataset Formats

SupaEval supports multiple formats for dataset upload:

CSV Format

Simple tabular format with one required column, prompt, and optional expected_output and metadata columns.

csv

prompt,expected_output,metadata
"What is the capital of France?","Paris","{""difficulty"": ""easy"", ""category"": ""geography""}"
"Explain photosynthesis","Plants convert light into energy...","{""difficulty"": ""medium"", ""category"": ""science""}"
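
Before uploading, it can help to confirm that a CSV file parses the way you expect. The following is a minimal sketch using only Python's standard library; it is not part of the SupaEval SDK, and the helper name load_test_cases is hypothetical. It assumes the metadata column holds a JSON object in a quoted field with inner double quotes doubled, as standard CSV quoting requires.

```python
import csv
import io
import json

# Sample CSV text; inner double quotes in the metadata field are doubled.
CSV_TEXT = '''prompt,expected_output,metadata
"What is the capital of France?","Paris","{""difficulty"": ""easy"", ""category"": ""geography""}"
"Explain photosynthesis","Plants convert light into energy...","{""difficulty"": ""medium"", ""category"": ""science""}"
'''


def load_test_cases(csv_text):
    """Parse CSV text into test-case dicts (hypothetical helper, not an SDK call)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "prompt" not in (reader.fieldnames or []):
        raise ValueError("CSV must contain a 'prompt' column")
    cases = []
    for row in reader:
        raw_meta = row.get("metadata") or ""
        cases.append({
            "prompt": row["prompt"],
            # Optional column: an absent or empty value becomes None.
            "expected_output": row.get("expected_output") or None,
            # Assumption: metadata, when present, is a JSON object.
            "metadata": json.loads(raw_meta) if raw_meta.strip() else {},
        })
    return cases


cases = load_test_cases(CSV_TEXT)
print(cases[0]["metadata"]["difficulty"])  # easy
```

If json.loads raises an error here, the metadata field is most likely quoted incorrectly in the file.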

JSON Format

Structured format allowing richer metadata and nested fields.

json

{
  "dataset_name": "geography_qa",
  "test_cases": [
    {
      "prompt": "What is the capital of France?",
      "expected_output": "Paris",
      "metadata": {
        "difficulty": "easy",
        "category": "geography"
      }
    },
    {
      "prompt": "What is the largest ocean?",
      "expected_output": "Pacific Ocean",
      "metadata": {
        "difficulty": "easy",
        "category": "geography"
      }
    }
  ]
}
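
A quick structural check can catch malformed files before upload. The sketch below is a local illustration rather than SupaEval's own validation. The function name validate_dataset is hypothetical, and the rules it applies (a non-empty dataset_name, a non-empty test_cases list, a non-empty prompt per case, metadata as an object) are assumptions inferred from the example above.

```python
import json


def validate_dataset(data):
    """Return a list of problems; an empty list means the structure looks fine."""
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    name = data.get("dataset_name")
    if not isinstance(name, str) or not name.strip():
        problems.append("dataset_name must be a non-empty string")
    test_cases = data.get("test_cases")
    if not isinstance(test_cases, list) or not test_cases:
        problems.append("test_cases must be a non-empty list")
        return problems
    for i, case in enumerate(test_cases):
        if not isinstance(case, dict):
            problems.append(f"test_cases[{i}] must be an object")
            continue
        prompt = case.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append(f"test_cases[{i}].prompt must be a non-empty string")
        # Assumption: metadata is optional, but must be an object when present.
        if not isinstance(case.get("metadata", {}), dict):
            problems.append(f"test_cases[{i}].metadata must be an object")
    return problems


document = json.loads("""
{
  "dataset_name": "geography_qa",
  "test_cases": [
    {"prompt": "What is the capital of France?", "expected_output": "Paris",
     "metadata": {"difficulty": "easy", "category": "geography"}}
  ]
}
""")
print(validate_dataset(document))  # []
```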

Creating Datasets

Via SDK

python

from supaeval import SupaEval

client = SupaEval(api_key="your_api_key")

# Create dataset from file
dataset = client.datasets.create_from_file(
    name="geography_qa",
    file_path="./dataset.csv",
    description="Geography questions and answers"
)

print(f"Dataset created: {dataset.id}")

Via Dashboard

1. Navigate to the Datasets page
2. Click "Create Dataset"
3. Upload CSV/JSON file or create test cases manually
4. Add metadata and tags
5. Save dataset

Best Practices

Start with 10-20 diverse test cases
Include edge cases and failure scenarios
Add metadata for filtering and analysis
Version datasets when making significant changes
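
One way to check diversity is to count test cases per metadata value and look for gaps. This is a minimal local sketch with a hypothetical helper, metadata_coverage; the sample cases, including the unlabeled edge case, are illustrative only.

```python
from collections import Counter


def metadata_coverage(test_cases, key):
    """Count test cases per value of one metadata key; missing values count as 'unlabeled'."""
    return Counter(
        (case.get("metadata") or {}).get(key, "unlabeled") for case in test_cases
    )


test_cases = [
    {"prompt": "What is the capital of France?",
     "metadata": {"difficulty": "easy", "category": "geography"}},
    {"prompt": "What is the largest ocean?",
     "metadata": {"difficulty": "easy", "category": "geography"}},
    {"prompt": "Explain photosynthesis",
     "metadata": {"difficulty": "medium", "category": "science"}},
    # Illustrative edge case: empty prompt with no labels.
    {"prompt": "", "metadata": {}},
]

by_difficulty = metadata_coverage(test_cases, "difficulty")
by_category = metadata_coverage(test_cases, "category")
print(dict(by_difficulty))
print(dict(by_category))
```

A category with very few cases, or a large "unlabeled" count, suggests where to add test cases or metadata.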

Dataset Versioning

SupaEval automatically versions datasets when you update them. This ensures:

Reproducible evaluation runs
Historical comparison of agent performance
Rollback capability if needed
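
To record on your side exactly which dataset content an evaluation run used, you could store a content fingerprint alongside your results. The sketch below is a local illustration only: it does not describe how SupaEval implements versioning, and dataset_fingerprint is a hypothetical helper.

```python
import hashlib
import json


def dataset_fingerprint(test_cases):
    """Hash a canonical JSON encoding so identical content yields an identical digest."""
    canonical = json.dumps(
        test_cases, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


v1 = [{"prompt": "What is the capital of France?", "expected_output": "Paris"}]
# Same content with keys in a different order: the fingerprint should not change.
v1_reordered = [{"expected_output": "Paris", "prompt": "What is the capital of France?"}]
# A new test case was added: the fingerprint should change.
v2 = v1 + [{"prompt": "What is the largest ocean?", "expected_output": "Pacific Ocean"}]

print(dataset_fingerprint(v1) == dataset_fingerprint(v1_reordered))  # True
print(dataset_fingerprint(v1) == dataset_fingerprint(v2))  # False
```

Note that reordering the test cases themselves changes the fingerprint, since list order is preserved in the encoding.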

Managing Datasets

Listing Datasets

Filter by Tags

Organize datasets using custom tags

Search by Name

Quickly find datasets by name or description
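
The same filtering logic can be applied to dataset records you keep locally, for example in a script that selects datasets for a run. This sketch assumes a hypothetical record shape with name, description, and tags fields. It does not call the SupaEval SDK, and filter_datasets is not an SDK function.

```python
def filter_datasets(datasets, tag=None, query=None):
    """Keep records that carry the tag (if given) and match the query (if given)."""
    needle = query.lower() if query else None
    results = []
    for ds in datasets:
        if tag is not None and tag not in ds.get("tags", []):
            continue
        if needle is not None:
            # Case-insensitive substring match on name and description.
            haystack = f"{ds.get('name', '')} {ds.get('description', '')}".lower()
            if needle not in haystack:
                continue
        results.append(ds)
    return results


# Hypothetical local records; the field names are assumptions.
datasets = [
    {"name": "geography_qa", "description": "Geography questions and answers",
     "tags": ["qa", "geography"]},
    {"name": "science_qa", "description": "Science explanations", "tags": ["qa"]},
]

print([ds["name"] for ds in filter_datasets(datasets, tag="geography")])
print([ds["name"] for ds in filter_datasets(datasets, query="SCIENCE")])
```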

Next Steps

Running Evaluations

Use your datasets to evaluate agents