
Prompt Optimization

MutagenT’s optimization engine automatically improves prompts using AI-driven mutation and evaluation cycles. Instead of manually tweaking prompts, let the system find better variations for you.

How It Works

The optimization process follows an evolutionary approach:

The Four Steps

  1. Analyze - Evaluate the current prompt against the dataset to establish a baseline score
  2. Mutate - Use AI to generate prompt variations (rewording, restructuring, adding/removing content)
  3. Test - Evaluate each variation against the same dataset
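  4. Select - Keep the best-performing version as the new baseline
The cycle repeats until any of these conditions is met:
  • The target score is reached
  • The maximum number of iterations is reached
  • A cycle produces no variation that beats the current baseline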
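The four steps and the stopping conditions can be sketched as a simple loop. This is a minimal, self-contained illustration under stated assumptions, not MutagenT's implementation. `optimizeLoop`, `Evaluate`, and `Mutate` are hypothetical names. The toy evaluator and mutator stand in for the engine's LLM-backed scoring against a dataset and its AI-generated rewrites.

```typescript
// Hypothetical stand-ins for the engine's LLM-backed steps (not SDK functions).
type Evaluate = (prompt: string) => number;
type Mutate = (prompt: string, count: number) => string[];

type StopReason = "target_reached" | "max_iterations" | "no_improvement";

interface LoopResult {
  prompt: string;
  score: number;
  iterations: number;
  stopReason: StopReason;
}

function optimizeLoop(
  initial: string,
  evaluate: Evaluate,
  mutate: Mutate,
  maxIterations: number,
  targetScore: number,
  variantsPerIteration = 3,
): LoopResult {
  // 1. Analyze: score the starting prompt to establish a baseline.
  let best = initial;
  let bestScore = evaluate(initial);
  if (bestScore >= targetScore) {
    return { prompt: best, score: bestScore, iterations: 0, stopReason: "target_reached" };
  }

  for (let i = 1; i <= maxIterations; i++) {
    // 2. Mutate: generate variations of the current best prompt.
    const candidates = mutate(best, variantsPerIteration);
    let improved = false;
    for (const candidate of candidates) {
      // 3. Test: score each variation with the same evaluator.
      const score = evaluate(candidate);
      // 4. Select: a variation replaces the baseline only if it scores higher.
      if (score > bestScore) {
        best = candidate;
        bestScore = score;
        improved = true;
      }
    }
    if (bestScore >= targetScore) {
      return { prompt: best, score: bestScore, iterations: i, stopReason: "target_reached" };
    }
    if (!improved) {
      return { prompt: best, score: bestScore, iterations: i, stopReason: "no_improvement" };
    }
  }
  return { prompt: best, score: bestScore, iterations: maxIterations, stopReason: "max_iterations" };
}

// Toy usage: the "dataset" is a list of required instructions, and the
// score is the fraction of them that the prompt contains.
const required = ["Be concise.", "Cite sources.", "Answer in JSON."];
const toyEvaluate: Evaluate = (p) =>
  required.filter((r) => p.indexOf(r) !== -1).length / required.length;
const toyMutate: Mutate = (p, count) =>
  required.filter((r) => p.indexOf(r) === -1).slice(0, count).map((r) => `${p} ${r}`);

const result = optimizeLoop("You are a helpful assistant.", toyEvaluate, toyMutate, 10, 0.9);
console.log(result.stopReason, result.iterations); // target_reached 3
```

A variation is kept only when it strictly beats the baseline. As a result, a cycle with no winner ends the run, which matches the "no improvement" stop condition.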

Key Concepts

Mutation Strength

Controls how much the variations differ from the original prompt. Low values produce minor tweaks; high values produce major rewrites. The examples on this page use values from 0.2 to 0.8.

Evaluation Metrics

The criteria used to score prompts. Optimization aims to raise scores on the metrics you select.
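This page does not say how scores from several metrics are combined. One common approach, assumed here purely for illustration, is a weighted average. The metric names and weights below are made up. With weights 2, 1, and 1, the example works out to (0.9·2 + 0.8·1 + 1.0·1) / 4 = 0.9.

```typescript
// Hypothetical illustration: combine per-metric scores (each in [0, 1])
// into one overall score with a weighted average. MutagenT's actual
// aggregation may differ; metric names and weights are invented.

type MetricScores = { [metric: string]: number };
type MetricWeights = { [metric: string]: number };

function aggregateScore(scores: MetricScores, weights: MetricWeights): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const metric of Object.keys(weights)) {
    const score = scores[metric];
    if (score === undefined) {
      throw new Error(`Missing score for metric "${metric}"`);
    }
    weightedSum += score * weights[metric];
    totalWeight += weights[metric];
  }
  if (totalWeight <= 0) {
    throw new Error("Weights must sum to a positive number");
  }
  return weightedSum / totalWeight;
}

// Accuracy counts double; relevance and format compliance count once each.
const overall = aggregateScore(
  { accuracy: 0.9, relevance: 0.8, formatCompliance: 1.0 },
  { accuracy: 2, relevance: 1, formatCompliance: 1 },
);
console.log(overall.toFixed(2)); // 0.90
```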

Target Score

The goal score to achieve. Optimization stops when this is reached.

Iterations

The maximum number of mutation-evaluation cycles. More iterations generally produce better results, with diminishing returns after a point.
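The three concepts above correspond to the `config` fields used in the examples on this page: `maxIterations`, `targetScore`, and `mutationStrength`. The sketch below is a client-side sanity check for those fields, written before submitting a job. The `OptimizationConfig` interface and `validateConfig` function are hypothetical. The 0 to 1 ranges are inferred from the example values (0.2 to 0.8 and 0.9) rather than from an API specification.

```typescript
// Hypothetical shape of the config fields shown on this page; all optional
// because the strategy examples omit some of them.
interface OptimizationConfig {
  maxIterations?: number;
  targetScore?: number;
  mutationStrength?: number;
}

// Assumed valid range for scores and strengths: finite numbers in [0, 1].
function inUnitRange(x: number): boolean {
  return isFinite(x) && x >= 0 && x <= 1;
}

// Returns a list of human-readable problems; an empty list means "looks valid".
function validateConfig(config: OptimizationConfig): string[] {
  const errors: string[] = [];
  const { maxIterations, targetScore, mutationStrength } = config;
  if (
    maxIterations !== undefined &&
    (!isFinite(maxIterations) || Math.floor(maxIterations) !== maxIterations || maxIterations < 1)
  ) {
    errors.push("maxIterations must be a positive integer");
  }
  if (targetScore !== undefined && !inUnitRange(targetScore)) {
    errors.push("targetScore must be between 0 and 1");
  }
  if (mutationStrength !== undefined && !inUnitRange(mutationStrength)) {
    errors.push("mutationStrength must be between 0 and 1");
  }
  return errors;
}

const ok = validateConfig({ maxIterations: 10, targetScore: 0.9, mutationStrength: 0.5 });
const bad = validateConfig({ maxIterations: 0, mutationStrength: 1.5 });
console.log(ok.length, bad.length); // 0 2
```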


Quick Start

```typescript
import { Mutagent } from '@mutagent/sdk';

const client = new Mutagent({
  bearerAuth: process.env.MUTAGENT_API_KEY,
});

// Start optimization
const job = await client.optimization.postApiPromptByIdOptimize({
  id: 123,  // Prompt ID
  datasetId: 456,
  config: {
    maxIterations: 10,
    targetScore: 0.9,
    mutationStrength: 0.5,
  },
});

console.log('Job started:', job.jobId);
```

Optimization Strategies

Choose the right strategy based on your needs:

Conservative

Small, incremental changes. Lower risk of breaking existing behavior. Good for production prompts that are already working well.

```typescript
const job = await client.optimization.postApiPromptByIdOptimize({
  id: promptId,
  datasetId,
  config: {
    mutationStrength: 0.2,      // Small changes
    maxIterations: 20,          // More iterations for gradual improvement
  },
});
```

Best for:
  • Prompts already in production
  • Risk-sensitive applications
  • Fine-tuning existing prompts

Balanced

Moderate changes with balanced exploration. Good default for most use cases.

```typescript
const job = await client.optimization.postApiPromptByIdOptimize({
  id: promptId,
  datasetId,
  config: {
    mutationStrength: 0.5,      // Moderate changes
    maxIterations: 10,          // Standard iteration count
  },
});
```

Best for:
  • New prompt development
  • General improvement
  • Unknown optimization potential

Aggressive

Larger, more experimental changes. May find significantly better prompts but could also produce inconsistent results.

```typescript
const job = await client.optimization.postApiPromptByIdOptimize({
  id: promptId,
  datasetId,
  config: {
    mutationStrength: 0.8,      // Large changes
    maxIterations: 10,          // Fewer iterations (each is more impactful)
  },
});
```

Best for:
  • Prompts with low baseline scores
  • Exploring new approaches
  • When conservative optimization plateaus
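The three strategies can be kept as reusable presets, with a small helper that applies the "Best for" guidance. This is a hypothetical sketch. `STRATEGY_PRESETS` and `chooseStrategy` are illustrative names, not SDK exports. The `LOW_BASELINE` cutoff of 0.5 is an assumed number, since the page gives none. The resulting object is shaped like the `config` passed to `postApiPromptByIdOptimize` in the examples above.

```typescript
type Strategy = "conservative" | "balanced" | "aggressive";

interface StrategyConfig {
  mutationStrength: number;
  maxIterations: number;
}

// Values copied from the Conservative, Balanced, and Aggressive examples.
const STRATEGY_PRESETS: Record<Strategy, StrategyConfig> = {
  conservative: { mutationStrength: 0.2, maxIterations: 20 },
  balanced: { mutationStrength: 0.5, maxIterations: 10 },
  aggressive: { mutationStrength: 0.8, maxIterations: 10 },
};

// Hypothetical cutoff for a "low baseline score"; tune for your metrics.
const LOW_BASELINE = 0.5;

interface PromptSituation {
  inProduction: boolean;
  baselineScore: number;
  conservativePlateaued: boolean;
}

// Checks run in this order:
// 1. A plateau after conservative runs -> aggressive.
// 2. A production prompt -> conservative.
// 3. A low baseline -> aggressive.
// 4. Anything else -> balanced.
function chooseStrategy(s: PromptSituation): Strategy {
  if (s.conservativePlateaued) return "aggressive";
  if (s.inProduction) return "conservative";
  if (s.baselineScore < LOW_BASELINE) return "aggressive";
  return "balanced";
}

const strategy = chooseStrategy({ inProduction: true, baselineScore: 0.82, conservativePlateaued: false });
// Merge the preset with any extra fields, such as a target score.
const config = { ...STRATEGY_PRESETS[strategy], targetScore: 0.9 };
console.log(strategy, config.mutationStrength, config.maxIterations); // conservative 0.2 20
```

Putting the plateau check first reflects the guidance that aggressive settings are the next step once conservative optimization stops improving a prompt.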

When to Optimize

  • Once you have a working prompt and dataset, run optimization to improve it before going to production.
  • If your prompt’s scores decline (due to model changes, new edge cases, etc.), optimization can help recover them.
  • As part of your release process, optimize prompts to ensure they’re performing at their best.
  • Schedule regular optimization runs to counter gradual degradation and pick up new improvement opportunities.
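To decide when a scheduled check should trigger a new optimization run, you can compare recent evaluation scores with the score recorded after the last optimization. This is a minimal sketch under assumptions. `shouldReoptimize` is a hypothetical helper, and the default tolerance of 0.05 is an arbitrary illustrative value.

```typescript
// Hypothetical helper: returns true when the mean of recent scores has
// dropped below the post-optimization baseline by more than `tolerance`.
function shouldReoptimize(baselineScore: number, recentScores: number[], tolerance = 0.05): boolean {
  if (recentScores.length === 0) return false; // nothing to compare yet
  const mean = recentScores.reduce((sum, s) => sum + s, 0) / recentScores.length;
  return baselineScore - mean > tolerance;
}

// Mean 0.82 is 0.08 below the 0.9 baseline, which exceeds the 0.05 tolerance.
const decline = shouldReoptimize(0.9, [0.82, 0.8, 0.84]);
// Mean 0.9 matches the baseline, so no re-run is needed.
const stable = shouldReoptimize(0.9, [0.89, 0.91, 0.9]);
console.log(decline, stable); // true false
```

Averaging over several recent runs keeps a single noisy evaluation from triggering a re-run.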

Optimization vs Manual Tuning

| Aspect | Manual Tuning | Automated Optimization |
|---|---|---|
| Time | Hours to days | Minutes to hours |
| Consistency | Subjective | Objective, metric-driven |
| Coverage | Limited variations tried | Many variations explored |
| Reproducibility | Hard to reproduce | Fully tracked and reproducible |
| Expertise required | High | Low (once dataset is ready) |

Prerequisites for Optimization

Before running optimization, ensure you have:
  1. A Prompt - The prompt you want to optimize, with defined variables
  2. A Dataset - Test cases with inputs and expected outputs (20+ items recommended)
  3. Configured Provider - An LLM provider set up for mutation and evaluation
  4. Clear Goals - Know what “better” means: which metrics matter most
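The prompt and dataset prerequisites can be checked before a job is started. This is a hypothetical pre-flight sketch. The `DatasetItem` shape, the `{{variable}}` placeholder syntax, and the `preflight` function are assumptions for illustration, not MutagenT SDK types. The 20-item threshold comes from the recommendation above.

```typescript
// Assumed dataset item shape: named input variables plus an expected output.
interface DatasetItem {
  input: { [variable: string]: string };
  expectedOutput: string;
}

const MIN_RECOMMENDED_ITEMS = 20;

// Collects unique variable names written as {{name}} (assumed syntax).
function extractVariables(template: string): string[] {
  const names: string[] = [];
  const pattern = /\{\{\s*(\w+)\s*\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    if (names.indexOf(match[1]) === -1) names.push(match[1]);
  }
  return names;
}

// Returns a list of problems; an empty list means the basic prerequisites look met.
function preflight(template: string, dataset: DatasetItem[]): string[] {
  const problems: string[] = [];
  if (dataset.length < MIN_RECOMMENDED_ITEMS) {
    problems.push(`dataset has ${dataset.length} items; ${MIN_RECOMMENDED_ITEMS}+ recommended`);
  }
  const variables = extractVariables(template);
  dataset.forEach((item, i) => {
    for (const v of variables) {
      if (!(v in item.input)) problems.push(`item ${i} is missing variable "${v}"`);
    }
    if (item.expectedOutput.trim() === "") problems.push(`item ${i} has no expected output`);
  });
  return problems;
}

const template = "Summarize {{document}} for {{audience}}.";
const smallDataset: DatasetItem[] = [
  { input: { document: "Q3 report", audience: "executives" }, expectedOutput: "A short summary." },
  { input: { document: "Release notes" }, expectedOutput: "" },
];
const problems = preflight(template, smallDataset);
problems.forEach((p) => console.log(p));
```

For this two-item dataset, the check reports three problems: the dataset is below the recommended size, the second item lacks `audience`, and the second item has no expected output.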
