Prompt Optimization
MutagenT’s optimization engine automatically improves prompts using AI-driven mutation and evaluation cycles. Instead of manually tweaking prompts, let the system find better variations for you.

How It Works
The optimization process follows an evolutionary approach:

The Four Steps
- Analyze - Evaluate the current prompt against the dataset to establish a baseline score
- Mutate - Use AI to generate prompt variations (rewording, restructuring, adding/removing content)
- Test - Evaluate each variation against the same dataset using the evaluation criteria
- Select - Keep the best performing version as the new baseline
Optimization stops when any of the following occurs:
- Target score is reached
- Max iterations hit
- No improvement found (patience exceeded)
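The analyze-mutate-test-select loop and its stopping conditions can be sketched as plain Python. This is an illustrative sketch, not MutagenT's actual implementation: `evaluate` and `mutate` are placeholder callables standing in for the evaluation criteria and the AI mutation step.

```python
def optimize(prompt, evaluate, mutate, target=0.9, max_iterations=10, patience=3):
    """One evolutionary optimization run: analyze, then mutate/test/select cycles."""
    best_prompt, best_score = prompt, evaluate(prompt)  # Analyze: baseline score
    no_improvement = 0
    for _ in range(max_iterations):
        if best_score >= target:
            break                              # target score reached
        candidate = mutate(best_prompt)        # Mutate: generate a variation
        score = evaluate(candidate)            # Test: score against the same dataset
        if score > best_score:                 # Select: keep the best version
            best_prompt, best_score = candidate, score
            no_improvement = 0
        else:
            no_improvement += 1
            if no_improvement >= patience:
                break                          # patience exceeded
    return best_prompt, best_score
```

With a toy length-based scorer, the loop keeps mutating until the score clears the target or a stopping condition fires.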
Key Concepts
Evaluation Criteria
The metrics defined in your evaluation that score each prompt variant. Optimization improves scores across selected criteria.
Target Score
The goal score to achieve. Optimization stops early when this threshold is reached.
Patience
Number of iterations without improvement before stopping early. Prevents wasted compute.
Iterations
Maximum number of mutation-evaluation cycles. More iterations generally yield better results, though with diminishing returns.
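The three concepts above interact as stopping conditions. As a minimal sketch (the function name and return values are illustrative, not MutagenT's API), a single check can decide whether a run should end:

```python
def should_stop(iteration, best_score, no_improvement, *, target, max_iterations, patience):
    """Return the reason to stop, or None if optimization should continue."""
    if best_score >= target:
        return "target score reached"    # goal achieved, stop early
    if iteration >= max_iterations:
        return "max iterations hit"      # cycle budget exhausted
    if no_improvement >= patience:
        return "patience exceeded"       # no gains recently, stop early
    return None
```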
Key Features
Optimization Jobs
Configure, run, and manage optimization jobs with full lifecycle control.
Real-time Streaming
Watch optimization progress in real-time via WebSocket updates.
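A streaming client would typically parse each WebSocket message into a progress summary. The message fields below (`iteration`, `score`, `best_score`) are an assumed shape for illustration; consult the Streaming Updates page for the actual payload format.

```python
import json

def handle_update(raw):
    """Parse one streamed progress message (hypothetical shape) into a log line."""
    msg = json.loads(raw)
    return (f"iteration {msg['iteration']}: "
            f"score={msg['score']:.3f} best={msg['best_score']:.3f}")

# Example payload as it might arrive over the WebSocket:
example = '{"iteration": 3, "score": 0.78, "best_score": 0.81}'
```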
Quick Start
Prerequisites for Optimization
Before running optimization, ensure you have:

Evaluation Criteria
An evaluation definition with criteria (metrics, weights, thresholds) linked to the prompt
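To make the metrics/weights/thresholds idea concrete, here is a minimal sketch of how such criteria combine into a single score. The dictionary shape is illustrative only, not MutagenT's evaluation schema:

```python
def weighted_score(scores, criteria):
    """Combine per-metric scores into one value using the criteria weights."""
    total_weight = sum(c["weight"] for c in criteria.values())
    return sum(scores[name] * c["weight"] for name, c in criteria.items()) / total_weight

def passes_thresholds(scores, criteria):
    """True only if every metric clears its minimum threshold."""
    return all(scores[name] >= c["threshold"] for name, c in criteria.items())

# Hypothetical criteria: accuracy weighted twice as heavily as tone.
criteria = {
    "accuracy": {"weight": 2, "threshold": 0.8},
    "tone":     {"weight": 1, "threshold": 0.5},
}
```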
Optimization Strategies
Choose the right approach based on your needs:

Conservative
Small, incremental changes. Lower risk of breaking existing behavior. Good for production prompts. Best for:
- Prompts already in production
- Risk-sensitive applications
- Fine-tuning existing prompts
Balanced
Moderate changes with balanced exploration. A good default for most use cases. Best for:
- New prompt development
- General improvement
- Unknown optimization potential
Aggressive
Larger, more experimental changes. May find significantly better prompts but requires more iterations. Best for:
- Prompts with low baseline scores
- Exploring new approaches
- When conservative optimization plateaus
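The guidance above can be encoded as a simple selection heuristic. Everything here is illustrative: the strategy names match this page, but the function and its rules are a sketch, not part of MutagenT.

```python
def pick_strategy(baseline_score, in_production):
    """Heuristic mapping of the strategy guidance above to a strategy name."""
    if in_production:
        return "conservative"   # small, low-risk changes for live prompts
    if baseline_score < 0.5:
        return "aggressive"     # low baseline leaves room for big experiments
    return "balanced"           # sensible default otherwise
```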
When to Optimize
After creating a new prompt baseline
Once you have a working prompt, dataset, and evaluation criteria, run optimization to improve it before going to production.
When evaluation scores drop
If your prompt’s scores decline (due to model changes, new edge cases, etc.), optimization can help recover.
Before major deployments
As part of your release process, optimize prompts to ensure they’re performing at their best.
Periodically as maintenance
Schedule regular optimization runs to prevent gradual degradation and capture improvement opportunities.
Optimization vs Manual Tuning
| Aspect | Manual Tuning | Automated Optimization |
|---|---|---|
| Time | Hours to days | Minutes to hours |
| Consistency | Subjective | Objective, metric-driven |
| Coverage | Limited variations tried | Many variations explored |
| Reproducibility | Hard to reproduce | Fully tracked and reproducible |
| Expertise required | High | Low (once dataset is ready) |
What’s Next?
Optimization Jobs
Learn to configure, run, and manage optimization jobs
Streaming Updates
Get real-time progress updates via WebSocket