Datasets
Datasets are collections of test cases used to evaluate and optimize your prompts. They provide the ground truth against which your prompts are measured, enabling systematic quality assurance and continuous improvement.
What is a Dataset?
A dataset is scoped to a prompt (linked via promptGroupId) and contains test case items, each with:
- Input: JSON object of variable values matching the prompt’s inputSchema
- Expected Output: (Optional) JSON object of reference responses matching the prompt’s outputSchema
- Name: (Optional) Human-readable test case name
- Labels: (Optional) ML-style labels for categorization (e.g., ["edge-case", "regression"])
- Metadata: (Optional) Arbitrary JSON for filtering and organization
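The item fields listed above can be pictured as a small data structure. The following is a minimal sketch, not the product's actual schema: the class `DatasetItem`, the helper `missing_required_inputs`, and the snake_case field names are hypothetical, and it assumes inputSchema is a JSON Schema object whose optional "required" array names the mandatory variables.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetItem:
    """Hypothetical model of one test case; field names mirror the list above."""
    input: dict[str, Any]                                    # variable values for inputSchema
    expected_output: Optional[dict[str, Any]] = None         # optional reference response (outputSchema)
    name: Optional[str] = None                               # optional human-readable name
    labels: list[str] = field(default_factory=list)          # e.g. ["edge-case", "regression"]
    metadata: dict[str, Any] = field(default_factory=dict)   # arbitrary JSON for filtering


def missing_required_inputs(item: DatasetItem, input_schema: dict[str, Any]) -> list[str]:
    """Return required inputSchema properties that are absent from item.input.

    Assumes input_schema is JSON Schema with an optional "required" array.
    """
    return [key for key in input_schema.get("required", []) if key not in item.input]


schema = {
    "type": "object",
    "properties": {"question": {"type": "string"}, "tone": {"type": "string"}},
    "required": ["question", "tone"],
}
item = DatasetItem(input={"question": "What is the refund window?"}, labels=["edge-case"])
print(missing_required_inputs(item, schema))  # ['tone']
```

Checking inputs against the schema before upload catches items that could never render the prompt correctly.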
Use Cases
Quality Testing
Verify prompts produce expected results across diverse inputs. Catch issues before deployment.
Regression Testing
Ensure changes don’t break existing behavior. Run the same tests after each prompt update.
Optimization Training
Provide training data for the optimization engine. Better datasets lead to better optimized prompts.
Benchmarking
Compare different prompt versions objectively. Track improvement over time with consistent test cases.
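For regression testing and benchmarking, one simple approach is to score two prompt versions against the same items and list the cases that the older version passed but the newer one fails. This is a minimal sketch under stated assumptions: a prompt version is modeled as a plain Python function (hypothetical `PromptVersion`), items use a hypothetical `expectedOutput` key, and scoring is strict exact match, which is only one of many possible metrics.

```python
from typing import Any, Callable

# Hypothetical: a "prompt version" maps input variables to a JSON-like output.
PromptVersion = Callable[[dict[str, Any]], dict[str, Any]]


def compare_versions(baseline: PromptVersion, candidate: PromptVersion,
                     items: list[dict[str, Any]]) -> dict[str, Any]:
    """Exact-match both versions against expectedOutput and list regressions."""
    scored = [item for item in items if "expectedOutput" in item]  # skip items with no reference
    if not scored:
        raise ValueError("no items with an expectedOutput to score")
    base = [baseline(item["input"]) == item["expectedOutput"] for item in scored]
    cand = [candidate(item["input"]) == item["expectedOutput"] for item in scored]
    return {
        "baseline_accuracy": sum(base) / len(scored),
        "candidate_accuracy": sum(cand) / len(scored),
        # A regression is a test case the baseline passed but the candidate fails.
        "regressions": [item.get("name", str(i))
                        for i, (item, b, c) in enumerate(zip(scored, base, cand)) if b and not c],
    }


def v1(inp: dict[str, Any]) -> dict[str, Any]:
    return {"reply": inp["text"].upper()}


def v2(inp: dict[str, Any]) -> dict[str, Any]:
    return {"reply": inp["text"].upper().rstrip("!")}


items = [
    {"name": "greeting", "input": {"text": "hi"}, "expectedOutput": {"reply": "HI"}},
    {"name": "punctuation", "input": {"text": "ok!"}, "expectedOutput": {"reply": "OK!"}},
    {"name": "no reference", "input": {"text": "x"}},
]
print(compare_versions(v1, v2, items))
# {'baseline_accuracy': 1.0, 'candidate_accuracy': 0.5, 'regressions': ['punctuation']}
```

Because the same items are reused across versions, the accuracy numbers are directly comparable over time.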
How Datasets Work
Two-Step Creation
Datasets follow a two-step creation pattern:
- Create dataset metadata: name, description, and association with a prompt
- Bulk insert items: upload test case items via file or inline JSON
Quick Example
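The two-step pattern can be illustrated with an in-memory stand-in. This is a sketch only: `InMemoryDatasetStore`, `create_dataset`, `bulk_insert_items`, the `ds_` id format, and the `pg_123` prompt group id are all hypothetical and do not represent the product's real API or CLI. It shows metadata being created first, then many items inserted at once from inline JSON.

```python
import itertools
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryDatasetStore:
    """Hypothetical stand-in for the dataset service, used only to show the flow."""
    datasets: dict[str, dict[str, Any]] = field(default_factory=dict)
    _ids: Any = field(default_factory=lambda: itertools.count(1))

    def create_dataset(self, name: str, description: str, prompt_group_id: str) -> str:
        # Step 1: create the dataset metadata and link it to a prompt.
        dataset_id = f"ds_{next(self._ids)}"
        self.datasets[dataset_id] = {
            "name": name,
            "description": description,
            "promptGroupId": prompt_group_id,
            "items": [],
        }
        return dataset_id

    def bulk_insert_items(self, dataset_id: str, items_json: str) -> int:
        # Step 2: insert many test cases at once from an inline JSON array.
        items = json.loads(items_json)
        if not isinstance(items, list):
            raise ValueError("expected a JSON array of items")
        for item in items:  # validate everything before inserting anything
            if not isinstance(item, dict) or "input" not in item:
                raise ValueError("every item needs an 'input' object")
        self.datasets[dataset_id]["items"].extend(items)
        return len(items)


store = InMemoryDatasetStore()
ds_id = store.create_dataset("support-faq", "FAQ answers", prompt_group_id="pg_123")
inserted = store.bulk_insert_items(ds_id, json.dumps([
    {"input": {"question": "How do I reset my password?"},
     "expectedOutput": {"answer": "Use the 'Forgot password' link."},
     "labels": ["regression"]},
    {"input": {"question": ""}, "name": "empty question", "labels": ["edge-case"]},
]))
print(ds_id, inserted)  # ds_1 2
```

Validating the whole batch before extending the item list keeps a malformed upload from leaving a half-inserted dataset.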
Dataset Types
Golden Datasets
High-quality, human-verified test cases that serve as the benchmark:
- Carefully curated inputs
- Expert-written expected outputs
- Used for critical quality gates
Synthetic Datasets
Generated test cases for broader coverage:
- Created from templates or rules
- Useful for stress testing
- May not have expected outputs
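One common way to generate synthetic items from a template is to take every combination of a few variable values. The sketch below assumes that approach (the function `expand_template` and the variable names are hypothetical); as the list above notes, the generated items carry no expected outputs.

```python
import itertools
from typing import Any


def expand_template(variables: dict[str, list[Any]], labels: list[str]) -> list[dict[str, Any]]:
    """Build one synthetic item per combination of variable values (no expected outputs)."""
    keys = list(variables)
    return [
        {"input": dict(zip(keys, combo)), "labels": list(labels)}
        for combo in itertools.product(*(variables[k] for k in keys))
    ]


items = expand_template(
    {"language": ["en", "de", "ja"], "length": ["short", "very long"]},
    labels=["synthetic"],
)
print(len(items))  # 6
print(items[0])    # {'input': {'language': 'en', 'length': 'short'}, 'labels': ['synthetic']}
```

The number of items grows multiplicatively with each variable, which is useful for stress testing but can quickly produce more cases than needed.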
Production Datasets
Real-world data captured from production:
- Actual user inputs
- Representative of real usage
- May require anonymization
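Before captured production inputs become dataset items, sensitive values may need to be redacted. The sketch below uses simple regular expressions for email addresses and phone-like numbers (the patterns and the `anonymize` helper are hypothetical). Regex redaction is a rough heuristic: it can miss identifiers and may over-redact long digit runs, so it is not a complete anonymization method.

```python
import re
from typing import Any

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(value: Any) -> Any:
    """Recursively replace email addresses, then phone-like numbers, in JSON string values."""
    if isinstance(value, str):
        return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", value))
    if isinstance(value, dict):
        return {k: anonymize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [anonymize(v) for v in value]
    return value  # numbers, booleans, and null pass through unchanged


captured = {"question": "Email me at jane.doe@example.com or call +1 415 555 0100"}
print(anonymize(captured))
# {'question': 'Email me at [EMAIL] or call [PHONE]'}
```

Emails are replaced first so that digits inside an address are not partially matched as a phone number.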
What’s Next?
Creating Datasets
Learn how to build datasets with items, imports, and the CLI
Best Practices
Guidelines for effective dataset design and management