Creating Datasets

Build comprehensive datasets to evaluate and optimize your prompts. Datasets are scoped to prompts and follow a two-step creation pattern: create the metadata, then bulk insert items.

CLI: One-Step Creation

The CLI combines both steps into a single command. It creates the dataset metadata and uploads items from a file or inline JSON in one call.
# Upload from a JSON array file
mutagent prompts dataset add <prompt-id> \
  --file dataset.json \
  --name "Customer Support Test Cases"
The --file and -d (inline JSON) flags are mutually exclusive; use one or the other. If --name is omitted, the CLI generates a timestamped name automatically.

SDK: Two-Step Creation

With the SDK, you first create the dataset metadata, then bulk insert items:
import { Mutagent } from '@mutagent/sdk';

const client = new Mutagent({ apiKey: process.env.MUTAGENT_API_KEY });

// Step 1: Create dataset metadata (scoped to a prompt)
const dataset = await client.prompt.createDatasetForPrompt({
  id: 123,  // prompt ID
  name: 'Customer Support Test Cases',
  description: 'Common support questions and expected answers for Q1 2026',
});

console.log('Dataset ID:', dataset.id);

// Step 2: Bulk insert items
await client.prompt.bulkCreateDatasetItems({
  id: dataset.id,
  items: [
    {
      input: {
        customer_name: 'John',
        question: 'How do I cancel my subscription?',
      },
      expectedOutput: {
        response: 'To cancel, go to Account > Subscription > Cancel.',
      },
      name: 'Cancel subscription - happy path',
      labels: ['billing', 'happy-path'],
      metadata: {
        source: 'support_tickets',
        difficulty: 'easy',
      },
    },
    {
      input: {
        customer_name: 'Sarah',
        question: 'Can I get a refund for last month?',
      },
      expectedOutput: {
        response: 'Refunds are available within 30 days of purchase.',
      },
      name: 'Refund request',
      labels: ['billing', 'edge-case'],
    },
  ],
});
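If your test cases start out as raw Q&A pairs, you can build the items array for step 2 programmatically. This is a sketch: the `QAPair` shape and the `toDatasetItems` helper are illustrative, not part of the SDK; only the resulting item fields (input, expectedOutput, name, labels) follow the structure the bulk insert expects.

```typescript
// Illustrative helper (not part of the SDK): map raw Q&A pairs to
// dataset items shaped like the bulkCreateDatasetItems payload above.
interface QAPair {
  question: string;
  answer: string;
  tags?: string[];
}

function toDatasetItems(qaPairs: QAPair[]) {
  return qaPairs.map((pair, i) => ({
    input: { question: pair.question },
    expectedOutput: { response: pair.answer },
    name: `QA case ${i + 1}`,
    labels: pair.tags ?? [],
  }));
}

const items = toDatasetItems([
  {
    question: 'How do I cancel?',
    answer: 'Go to Account > Subscription > Cancel.',
    tags: ['billing'],
  },
]);
console.log(items[0].name); // "QA case 1"
```

Pass the resulting array as `items` in the bulk insert call.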

API: Two-Step Creation

# Step 1: Create dataset metadata
curl -X POST https://api.mutagent.io/api/prompt/123/datasets \
  -H "x-api-key: mt_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Customer Support Test Cases",
    "description": "Common support questions and expected answers"
  }'

# Step 2: Bulk insert items (use the dataset ID from step 1)
curl -X POST https://api.mutagent.io/api/prompts/datasets/456/items/bulk \
  -H "x-api-key: mt_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "input": {"question": "How do I cancel?"},
        "expectedOutput": {"response": "Go to Account > Subscription > Cancel."},
        "name": "Cancel subscription",
        "labels": ["billing"]
      }
    ]
  }'

Supported File Formats

JSON Array

A JSON file containing an array of item objects:
[
  {
    "input": {
      "customer_name": "Alice",
      "question": "How do I upgrade my account?"
    },
    "expectedOutput": {
      "response": "Visit Account Settings and click Upgrade Plan."
    },
    "name": "Upgrade account",
    "labels": ["account"],
    "metadata": {
      "priority": "high"
    }
  },
  {
    "input": {
      "customer_name": "Bob",
      "question": "Can I get a refund?"
    },
    "expectedOutput": {
      "response": "Refunds are available within 30 days of purchase."
    },
    "name": "Refund request",
    "labels": ["billing"]
  }
]

JSONL (Newline-Delimited JSON)

One JSON object per line — useful for large datasets:
{"input":{"question":"How do I upgrade?"},"expectedOutput":{"response":"Visit Account Settings."},"name":"Upgrade"}
{"input":{"question":"Can I get a refund?"},"expectedOutput":{"response":"Within 30 days."},"name":"Refund"}
{"input":{"question":"Reset password?"},"expectedOutput":{"response":"Click Forgot Password."},"name":"Password reset"}

CSV

Comma-separated values with a header row. The CLI passes CSV content directly to the API for parsing:
question,expected_answer,category
"How do I upgrade?","Visit Account Settings and click Upgrade Plan.",account
"Can I get a refund?","Refunds are available within 30 days.",billing
"Reset my password?","Click Forgot Password on the login page.",authentication

Dataset Item Structure

Each item in a dataset follows this structure:
{
  // Required
  input: Record<string, any>;           // Variable values matching prompt's inputSchema

  // Optional
  expectedOutput?: Record<string, any>;  // Expected LLM response (for scoring)
  name?: string;                         // Human-readable test case name
  userFeedback?: string;                 // Human feedback notes
  systemFeedback?: string;               // Automated feedback
  labels?: string[];                     // ML-style labels for categorization
  metadata?: Record<string, any>;        // Arbitrary metadata
}
Always include expectedOutput when you want to use the dataset for evaluation and optimization. Without expected outputs, only reference-free metrics (like coherence) can be used.
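The server validates uploads, but a quick client-side check can catch malformed items before a large bulk insert. A minimal sketch covering only the documented fields (the `validateItem` helper is illustrative, not part of the SDK):

```typescript
// Mirror of the documented item structure.
type DatasetItem = {
  input: Record<string, unknown>;
  expectedOutput?: Record<string, unknown>;
  name?: string;
  userFeedback?: string;
  systemFeedback?: string;
  labels?: string[];
  metadata?: Record<string, unknown>;
};

// Return a list of problems; an empty list means the item looks well-formed.
function validateItem(item: Partial<DatasetItem>): string[] {
  const errors: string[] = [];
  if (item.input == null || typeof item.input !== 'object' || Array.isArray(item.input)) {
    errors.push('input is required and must be an object');
  }
  if (
    item.labels !== undefined &&
    (!Array.isArray(item.labels) || item.labels.some((l) => typeof l !== 'string'))
  ) {
    errors.push('labels must be an array of strings');
  }
  return errors;
}

console.log(validateItem({ input: { question: 'How do I cancel?' } })); // []
```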

List Datasets

View all datasets associated with a prompt:
# List datasets for a prompt
mutagent prompts dataset list <prompt-id>

# Machine-readable output
mutagent prompts dataset list <prompt-id> --json

Remove a Dataset

mutagent prompts dataset delete <prompt-id> <dataset-id>

Clone Existing Dataset

Create a copy of an existing dataset, optionally targeting a different prompt:
curl -X POST https://api.mutagent.io/api/prompts/datasets/456/clone \
  -H "x-api-key: mt_xxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "targetPromptId": 789,
    "newName": "Support Cases v2"
  }'

Export Dataset

Export a dataset with all items for backup or analysis:
curl https://api.mutagent.io/api/prompts/datasets/456/export \
  -H "x-api-key: mt_xxxx" \
  -o dataset-export.json
The export includes:
  • Dataset metadata
  • Prompt group information (promptGroupId, latest version, total versions)
  • All items with their input/output data

Best Practices for Creation

Begin with 10-20 high-quality items and expand based on evaluation results.
Name both the dataset and individual items descriptively:
{
  "name": "Angry customer refund request",
  "input": {"question": "I want my money back NOW!"},
  "expectedOutput": {"response": "I understand your frustration..."}
}
Avoid generic names like “test-1” or “item-42”.
Datasets used with the optimizer require expectedOutput to measure quality. Always include expected outputs for datasets you plan to use in evaluation or optimization.
Labels help filter and organize test cases:
{
  "labels": ["edge-case", "billing", "regression-test"]
}
Ensure all items use the same variable names and data types as your prompt’s inputSchema.
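To catch schema drift before upload, you can compare each item's input keys against the prompt's expected variable names. In this sketch the schema is represented as a plain list of variable names, which is an assumption for illustration; adapt it to however your prompt's inputSchema is actually defined:

```typescript
// Report items whose input keys don't match the prompt's expected variables.
// `expectedVars` (a flat list of names) is an assumed simplification of the
// prompt's inputSchema.
function checkInputKeys(
  expectedVars: string[],
  items: { input: Record<string, unknown> }[],
): string[] {
  const expected = new Set(expectedVars);
  const problems: string[] = [];
  items.forEach((item, i) => {
    for (const key of Object.keys(item.input)) {
      if (!expected.has(key)) problems.push(`item ${i}: unexpected variable "${key}"`);
    }
    for (const v of expectedVars) {
      if (!(v in item.input)) problems.push(`item ${i}: missing variable "${v}"`);
    }
  });
  return problems;
}

const problems = checkInputKeys(
  ['customer_name', 'question'],
  [{ input: { question: 'Can I get a refund?' } }],
);
console.log(problems); // ['item 0: missing variable "customer_name"']
```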
Datasets are associated with a prompt’s promptGroupId. This means a dataset works across all versions of the same prompt, so you can test new versions against the same dataset.