Datasets
Datasets are collections of test cases used to evaluate and optimize your prompts. They provide the ground truth against which your prompts are measured, enabling systematic quality assurance and continuous improvement.
What is a Dataset?
A dataset contains:
- Input variables: Values to substitute into prompts (these map to your prompt's {{variables}})
- Expected outputs: (Optional) Reference responses for comparison scoring
- Metadata: Additional context for filtering and organization
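To make this structure concrete, here is a minimal Python sketch of one way to model a test case and filter a dataset by metadata. The `DatasetCase` class, its field names, and the `filter_by_metadata` helper are hypothetical illustrations, not this product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetCase:
    # Values substituted into the prompt's {{variables}}, keyed by variable name
    inputs: dict[str, str]
    # Optional reference response used for comparison scoring
    expected_output: Optional[str] = None
    # Free-form context used for filtering and organization (hypothetical keys)
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_by_metadata(cases: list[DatasetCase], **criteria: Any) -> list[DatasetCase]:
    """Return the cases whose metadata matches every key/value pair in criteria."""
    return [
        case
        for case in cases
        if all(case.metadata.get(key) == value for key, value in criteria.items())
    ]


dataset = [
    DatasetCase(
        inputs={"product": "headphones", "tone": "friendly"},
        expected_output="Meet your new favorite headphones!",
        metadata={"category": "audio", "difficulty": "easy"},
    ),
    DatasetCase(
        # No expected_output: this case can be run but not comparison-scored
        inputs={"product": "laptop", "tone": "formal"},
        metadata={"category": "computing", "difficulty": "hard"},
    ),
]

audio_cases = filter_by_metadata(dataset, category="audio")
print(len(audio_cases))  # 1
```

Keeping expected outputs optional lets the same structure hold both fully labeled cases and input-only cases.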
Use Cases
Quality Testing
Verify that prompts produce expected results across diverse inputs. Catch issues before deployment.
Regression Testing
Ensure changes don’t break existing behavior. Run the same tests after each prompt update.
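A regression check can be as simple as comparing per-case results between two prompt versions. This sketch assumes you already have pass/fail results keyed by a test-case ID; the IDs and the `find_regressions` helper are hypothetical. It flags every case that passed on the previous version but fails on the new one.

```python
def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Return IDs of cases that passed on the baseline but not on the candidate.

    A case missing from the candidate results is treated as a failure.
    """
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )


# Hypothetical results from running the same dataset against two prompt versions
v1_results = {"case-1": True, "case-2": True, "case-3": False}
v2_results = {"case-1": True, "case-2": False, "case-3": True}

print(find_regressions(v1_results, v2_results))  # ['case-2']
```

Note that `case-3` improved but is not reported: this check only looks for cases that got worse.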
Optimization Training
Provide training data for the optimization engine. Better datasets lead to better-optimized prompts.
Benchmarking
Compare different prompt versions objectively. Track improvement over time with consistent test cases.
How Datasets Work
Quick Example
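The sketch below walks through the basic flow: each test case's input variables are substituted into the prompt's {{variables}}, the rendered prompt is sent to a model, and the response is compared with the expected output. It is a self-contained illustration under stated assumptions: `fake_model` is a hypothetical stand-in for a real model call, and exact string match is used as a simple substitute for whatever scoring method your evaluation actually uses.

```python
import re
from typing import Callable, Optional

# Matches {{name}} placeholders, allowing optional spaces inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Substitute {{name}} placeholders; raises KeyError if a variable is missing."""
    return PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


def evaluate(template: str, dataset: list[dict], model: Callable[[str], str]) -> float:
    """Return the exact-match pass rate over cases that have an expected output."""
    scored = 0
    passed = 0
    for case in dataset:
        expected: Optional[str] = case.get("expected_output")
        if expected is None:
            continue  # no reference output, so this case is not scored here
        output = model(render(template, case["inputs"]))
        scored += 1
        if output.strip() == expected.strip():
            passed += 1
    return passed / scored if scored else 0.0


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: fixed answers per prompt
    answers = {
        "Translate to French: Hello": "Bonjour",
        "Translate to French: Thank you": "Merci beaucoup",
    }
    return answers.get(prompt, "")


template = "Translate to French: {{text}}"
dataset = [
    {"inputs": {"text": "Hello"}, "expected_output": "Bonjour"},
    {"inputs": {"text": "Thank you"}, "expected_output": "Merci"},
    {"inputs": {"text": "Goodbye"}},  # no expected output, so not scored
]

print(evaluate(template, dataset, fake_model))  # 0.5
```

Exact match is strict: "Merci beaucoup" fails against "Merci" even though it is a reasonable translation, which is why real evaluations often use softer comparison scoring.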
Dataset Types
Golden Datasets
High-quality, human-verified test cases that serve as the benchmark:
- Carefully curated inputs
- Expert-written expected outputs
- Used for critical quality gates
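A quality gate built on a golden dataset typically blocks a change unless enough golden cases pass. A minimal sketch, assuming pass/fail results are already available; the `passes_quality_gate` helper and the 95% default threshold are hypothetical choices, not values defined by this product.

```python
def passes_quality_gate(results: list[bool], threshold: float = 0.95) -> bool:
    """Return True if the fraction of passing golden cases meets the threshold."""
    if not results:
        raise ValueError("quality gate needs at least one result")
    pass_rate = sum(results) / len(results)
    return pass_rate >= threshold


print(passes_quality_gate([True] * 20))                 # True  (100% pass rate)
print(passes_quality_gate([True] * 18 + [False] * 2))   # False (90% pass rate)
```

Raising on an empty result list avoids silently passing a gate when no golden cases were actually run.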
Synthetic Datasets
Generated test cases for broader coverage:
- Created from templates or rules
- Useful for stress testing
- May not have expected outputs
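One simple rule-based approach is to take every combination of values for each prompt variable. The sketch below assumes a hypothetical customer-support prompt with {{product}} and {{issue}} variables, and adds a couple of edge-case values for stress testing. The generated cases have no expected outputs.

```python
from itertools import product

# Hypothetical values for each prompt variable
products = ["router", "smart speaker", "printer"]
issues = [
    "won't turn on",
    "keeps disconnecting",
    "",             # edge case: empty input
    "help " * 500,  # edge case: very long input
]

# One case per (product, issue) combination: 3 x 4 = 12 cases
synthetic_cases = [
    {
        "inputs": {"product": p, "issue": i},
        "metadata": {"source": "synthetic"},
        # No expected_output: synthetic cases often lack a reference answer
    }
    for p, i in product(products, issues)
]

print(len(synthetic_cases))  # 12
```

Full combinations grow multiplicatively with each added variable, so larger templates usually need sampling rather than exhaustive generation.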
Production Datasets
Real-world data captured from production:
- Actual user inputs
- Representative of real usage
- May require anonymization
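As a rough illustration of anonymization, the sketch below masks email addresses and phone-number-like strings with regular expressions before production inputs are stored as test cases. The patterns and the `anonymize` helper are deliberately simple, hypothetical examples; they will miss many kinds of personal data and are not a substitute for a proper PII-detection process.

```python
import re

# Deliberately simple patterns; real PII detection needs far broader coverage
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(text: str) -> str:
    """Replace email addresses and phone-number-like strings with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(anonymize("Contact jane.doe@example.com or +1 (555) 123-4567."))
# Contact [EMAIL] or [PHONE].
```

Replacing values with typed placeholders, rather than deleting them, keeps the input's shape realistic for testing.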