Evaluations SDK
Create and manage prompt evaluations. Evaluations are definition entities that specify how to evaluate a prompt against a dataset. Results are produced by the optimization loop or the dashboard; there is no standalone "run evaluation" API in the SDK.
List Evaluations
Retrieve a paginated list of evaluations. Filter by promptId, promptGroupId, datasetId, name, or createdBy.
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| promptId | number | Filter by prompt ID |
| promptGroupId | string | Filter by prompt group UUID |
| datasetId | number | Filter by dataset ID |
| name | string | Filter by exact evaluation name |
| createdBy | string | Filter by creator email |
| limit | number | Results per page (1-100) |
| offset | number | Number of results to skip |
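As an illustration of how these filters combine, they can be assembled into a query string. The `ListEvaluationsFilters` type and `buildListQuery` helper below are hypothetical sketches based on the parameter table above, not SDK exports:

```typescript
// Hypothetical: model the listEvaluations filters and build a query string.
interface ListEvaluationsFilters {
  promptId?: number;
  promptGroupId?: string;
  datasetId?: number;
  name?: string;
  createdBy?: string;
  limit?: number; // 1-100
  offset?: number;
}

function buildListQuery(filters: ListEvaluationsFilters): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filters)) {
    // Skip unset filters so they are omitted from the request entirely.
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  return qs ? `?${qs}` : "";
}

console.log(buildListQuery({ promptId: 12, limit: 20, offset: 40 }));
// ?promptId=12&limit=20&offset=40
```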
Create Evaluation
Create a new evaluation definition linking a prompt to a dataset, along with its evaluation configuration.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| promptId | number | Yes | ID of the prompt to evaluate |
| datasetId | number | Yes | ID of the test dataset |
| name | string | Yes | Human-readable name (max 255 chars) |
| description | string | No | Evaluation purpose and methodology |
| evalConfig | object | No | Metrics, thresholds, and evaluation parameters |
| llmConfig | object | No | Model, temperature, and LLM execution settings |
| tags | string[] | No | Organization tags |
| metadata | object | No | Arbitrary metadata for tracking |
| createdBy | string | No | Creator email |
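For reference, the body above can be modeled as a type, with a client-side check of the required fields and the 255-character name limit. The type and `validateCreateBody` helper are illustrative sketches, not SDK exports:

```typescript
// Shape of the createEvaluation request body, per the table above.
interface CreateEvaluationBody {
  promptId: number;
  datasetId: number;
  name: string; // max 255 chars
  description?: string;
  evalConfig?: Record<string, unknown>;
  llmConfig?: Record<string, unknown>;
  tags?: string[];
  metadata?: Record<string, unknown>;
  createdBy?: string;
}

// Illustrative pre-flight validation of the required fields.
function validateCreateBody(body: CreateEvaluationBody): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(body.promptId)) errors.push("promptId must be an integer");
  if (!Number.isInteger(body.datasetId)) errors.push("datasetId must be an integer");
  if (!body.name) errors.push("name is required");
  else if (body.name.length > 255) errors.push("name exceeds 255 characters");
  return errors;
}

const body: CreateEvaluationBody = {
  promptId: 12,
  datasetId: 7,
  name: "Tone check v2",
  evalConfig: { metric: "exact_match", threshold: 0.8 },
};
console.log(validateCreateBody(body).length); // 0
```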
Get Evaluation
Retrieve a single evaluation definition by its ID.
Get Results
Retrieve the execution results for an evaluation. Results include the actual LLM output, pass/fail status, numeric score, and per-metric breakdowns.
Poll for Completion
Since evaluations run asynchronously (via the optimization loop or the dashboard), you can poll for results until they become available.
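A minimal polling sketch, assuming a client that exposes getEvaluationResult as above and a hypothetical `status` field on the result (match both to the real SDK response shape):

```typescript
// Assumed result shape for illustration; `status` is hypothetical.
interface EvaluationResult {
  status: "pending" | "completed" | "failed";
  score?: number;
  passed?: boolean;
}

// Minimal slice of the client this helper needs.
interface EvaluationsClient {
  getEvaluationResult(args: { id: number }): Promise<EvaluationResult>;
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the evaluation leaves the "pending" state or attempts run out.
async function pollForResult(
  client: EvaluationsClient,
  id: number,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<EvaluationResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await client.getEvaluationResult({ id });
    if (result.status !== "pending") return result;
    await sleep(intervalMs);
  }
  throw new Error(`Evaluation ${id} did not complete in time`);
}

// Demo with an in-memory stub that completes on the third call.
let calls = 0;
const stub: EvaluationsClient = {
  async getEvaluationResult() {
    calls++;
    return calls < 3
      ? { status: "pending" }
      : { status: "completed", score: 0.92, passed: true };
  },
};

pollForResult(stub, 42, { intervalMs: 1 }).then((r) => console.log(r.status));
// logs "completed"
```

With a real client, you would pass the SDK instance in place of the stub and keep the default interval.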
Method Reference
| Method | Description |
|---|---|
| promptEvaluations.listEvaluations({ ...filters }) | List evaluations with pagination and filters |
| promptEvaluations.createEvaluation({ ...body }) | Create an evaluation definition |
| promptEvaluations.getEvaluation({ id }) | Get an evaluation by ID |
| promptEvaluations.getEvaluationResult({ id }) | Get evaluation results |
REST API Reference
The SDK methods map to these HTTP endpoints:
| SDK Method | HTTP Endpoint |
|---|---|
| listEvaluations | GET /api/prompts/evaluations |
| createEvaluation | POST /api/prompts/evaluations |
| getEvaluation | GET /api/prompts/evaluations/:id |
| getEvaluationResult | GET /api/prompts/evaluations/:id/result |
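As a sketch of this mapping, the `:id` placeholder can be expanded into a concrete path. The `endpointFor` helper below is illustrative only, not part of the SDK:

```typescript
// Endpoint templates from the table above; `as const` keeps the keys typed.
const endpoints = {
  listEvaluations: "GET /api/prompts/evaluations",
  createEvaluation: "POST /api/prompts/evaluations",
  getEvaluation: "GET /api/prompts/evaluations/:id",
  getEvaluationResult: "GET /api/prompts/evaluations/:id/result",
} as const;

// Substitute a concrete ID into the :id placeholder, if one is given.
function endpointFor(method: keyof typeof endpoints, id?: number): string {
  const template: string = endpoints[method];
  return id === undefined ? template : template.replace(":id", String(id));
}

console.log(endpointFor("getEvaluationResult", 42));
// GET /api/prompts/evaluations/42/result
```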