Evaluations SDK
Create and manage prompt evaluations. Evaluations are definition entities that specify how to evaluate a prompt against a dataset. Results are produced by running the evaluation or via the optimization loop.
List Evaluations
Retrieve evaluations with optional filters.
Filter parameters
| Parameter | Type | Description |
|---|---|---|
| prompt_id | int | Filter by prompt ID |
| prompt_group_id | str | Filter by prompt group UUID |
| dataset_id | int | Filter by dataset ID |
| name | str | Filter by evaluation name |
| created_by | str | Filter by creator email |
| is_latest | bool | Filter by latest-version flag |
| limit | int | Results per page |
| offset | int | Number of results to skip |
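The filters can be combined with `limit` and `offset` to page through results. The sketch below assumes `client` is an already-constructed SDK client, that `list_evaluations` accepts the filters as keyword arguments, and that it returns an iterable of evaluations; none of this is confirmed here, and `list_latest_evaluations` is a hypothetical helper name.

```python
def list_latest_evaluations(client, prompt_id, page_size=50):
    """Collect every latest-version evaluation for one prompt, page by page."""
    results = []
    offset = 0
    while True:
        # Filter names come from the table above.
        # Assumption: the call returns an iterable of evaluations.
        page = list(
            client.prompt_evaluations.list_evaluations(
                prompt_id=prompt_id,
                is_latest=True,
                limit=page_size,
                offset=offset,
            )
        )
        results.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return results
```

If the SDK wraps pages in a response object instead, the `list(...)` call would need to read the items from that object.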
Create Evaluation
Create an evaluation definition linking a prompt to a dataset.
PromptIdDatasetIdName fields
| Field | Type | Required | Description |
|---|---|---|---|
| prompt_id | int | Yes | ID of the prompt to evaluate |
| name | str | Yes | Human-readable name (max 255 chars) |
| dataset_id | int | No | ID of the test dataset |
| description | str | No | Evaluation purpose and methodology |
| eval_config | Any | No | Metrics, thresholds, evaluation parameters |
| llm_config | Any | No | Model, temperature, LLM execution settings |
| tags | list[str] | No | Organization tags |
| metadata | Any | No | Arbitrary metadata |
| created_by | str | No | Creator email |
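The fields above can be checked before the request is sent. The sketch below builds the body as a plain dict, which is an assumption: the SDK may expect a model object instead. `build_evaluation_body` and `create_evaluation_for_prompt` are hypothetical helper names, and `client` is an already-constructed SDK client.

```python
# Optional field names taken from the table above.
OPTIONAL_FIELDS = {
    "dataset_id", "description", "eval_config", "llm_config",
    "tags", "metadata", "created_by",
}


def build_evaluation_body(prompt_id, name, **optional):
    """Validate the documented constraints and return the request body."""
    if not name or len(name) > 255:
        raise ValueError("name is required and must be at most 255 characters")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = {"prompt_id": prompt_id, "name": name}
    # Leave out optional fields that were not provided.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def create_evaluation_for_prompt(client, prompt_id, name, **optional):
    """Assumption: create_evaluation accepts a dict as its body."""
    body = build_evaluation_body(prompt_id, name, **optional)
    return client.prompt_evaluations.create_evaluation(body)
```

A call might look like `create_evaluation_for_prompt(client, 12, "Regression check", dataset_id=3, tags=["nightly"])`, where the IDs are placeholders.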
Get Evaluation
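A minimal sketch of fetching a single evaluation by ID, assuming `client` is an already-constructed SDK client. The ID is passed positionally to match the `get_evaluation(id_)` signature in the method reference; `fetch_evaluation` is a hypothetical helper name, and the shape of the returned object is not specified here.

```python
def fetch_evaluation(client, evaluation_id):
    """Return the evaluation definition for the given ID."""
    # Positional argument mirrors get_evaluation(id_) in the method reference.
    return client.prompt_evaluations.get_evaluation(evaluation_id)
```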
Update Evaluation
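A minimal sketch of renaming an evaluation, assuming `client` is an already-constructed SDK client. It assumes that `update_evaluation(id_, body)` accepts a plain dict and allows a partial body containing only the changed field; neither is confirmed here. `rename_evaluation` is a hypothetical helper name.

```python
def rename_evaluation(client, evaluation_id, new_name):
    """Change only the name of an existing evaluation."""
    if not new_name or len(new_name) > 255:
        raise ValueError("name is required and must be at most 255 characters")
    # Assumption: a partial body with just the changed field is accepted.
    body = {"name": new_name}
    return client.prompt_evaluations.update_evaluation(evaluation_id, body)
```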
Delete Evaluation
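A minimal sketch of deleting several evaluations in sequence, assuming `client` is an already-constructed SDK client. `delete_evaluations` is a hypothetical helper name; how the SDK reports a failed delete is not specified here, so any exception is simply allowed to propagate.

```python
def delete_evaluations(client, evaluation_ids):
    """Delete each evaluation and return the IDs that were processed."""
    deleted = []
    for evaluation_id in evaluation_ids:
        # An exception from the SDK stops the loop; earlier deletes stay applied.
        client.prompt_evaluations.delete_evaluation(evaluation_id)
        deleted.append(evaluation_id)
    return deleted
```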
Run Evaluation
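A minimal sketch of starting runs for a batch of evaluations, assuming `client` is an already-constructed SDK client. `run_evaluations` is a hypothetical helper name, and the content of each response is not specified here, so it is returned unchanged.

```python
def run_evaluations(client, evaluation_ids):
    """Trigger a run for each evaluation and return the responses keyed by ID."""
    responses = {}
    for evaluation_id in evaluation_ids:
        # run_evaluation only starts the run; results are fetched separately.
        responses[evaluation_id] = client.prompt_evaluations.run_evaluation(evaluation_id)
    return responses
```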
Trigger an evaluation run.
Get Results
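A minimal sketch of reading results for several evaluations at once, assuming `client` is an already-constructed SDK client. `fetch_results` is a hypothetical helper name; what `get_evaluation_result` returns while a run is still in progress is not specified here.

```python
def fetch_results(client, evaluation_ids):
    """Return the current result for each evaluation, keyed by ID."""
    return {
        evaluation_id: client.prompt_evaluations.get_evaluation_result(evaluation_id)
        for evaluation_id in evaluation_ids
    }
```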
Retrieve the execution results for an evaluation.
Get Evaluation History
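A minimal sketch of reading the run history, assuming `client` is an already-constructed SDK client and that `get_evaluation_history` returns an iterable of run records. The iterable return type and the ordering of the records are assumptions; `list_runs` is a hypothetical helper name.

```python
def list_runs(client, evaluation_id):
    """Return the run history for one evaluation as a list."""
    history = client.prompt_evaluations.get_evaluation_history(evaluation_id)
    # Assumption: the history is iterable; materialize it so it can be reused.
    return list(history)
```

`len(list_runs(client, evaluation_id))` then gives the number of recorded runs.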
Create Evaluation Version
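A minimal sketch of creating a new version of an existing evaluation, assuming `client` is an already-constructed SDK client. The call takes only the evaluation ID, as in `create_evaluation_version(id_)` from the method reference. `new_version` is a hypothetical helper name, and the content of the response is not specified here.

```python
def new_version(client, evaluation_id):
    """Create a new version of the evaluation and return the SDK response."""
    return client.prompt_evaluations.create_evaluation_version(evaluation_id)
```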
Poll for Completion
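A polling loop can be sketched as follows, assuming `client` is an already-constructed SDK client. Because the result's status field is not documented here, completion is decided by a caller-supplied `is_done` predicate. `wait_for_result`, `interval`, and `timeout` are hypothetical names.

```python
import time


def wait_for_result(client, evaluation_id, is_done, interval=2.0, timeout=300.0):
    """Trigger a run, then poll until is_done(result) is true or the timeout expires."""
    evals = client.prompt_evaluations
    evals.run_evaluation(evaluation_id)
    deadline = time.monotonic() + timeout
    while True:
        result = evals.get_evaluation_result(evaluation_id)
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"evaluation {evaluation_id} did not finish in {timeout}s")
        time.sleep(interval)
```

A predicate such as `lambda r: getattr(r, "status", None) == "completed"` would work only if the result exposes a `status` attribute with that value, which is an assumption.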
Since evaluations run asynchronously, poll for results.
Async version
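Whether the SDK ships a native async client is not stated here. The sketch below instead runs the synchronous calls in a worker thread with `asyncio.to_thread` (Python 3.9+), so the event loop stays free between polls. `wait_for_result_async` and the `is_done` predicate are hypothetical names, and `client` is an already-constructed SDK client.

```python
import asyncio


async def wait_for_result_async(client, evaluation_id, is_done, interval=2.0, timeout=300.0):
    """Trigger a run and poll for its result without blocking the event loop."""
    evals = client.prompt_evaluations
    # Assumption: the sync client is safe to call from a worker thread.
    await asyncio.to_thread(evals.run_evaluation, evaluation_id)

    async def poll():
        while True:
            result = await asyncio.to_thread(evals.get_evaluation_result, evaluation_id)
            if is_done(result):
                return result
            await asyncio.sleep(interval)

    # wait_for cancels the polling task and raises a timeout error on expiry.
    return await asyncio.wait_for(poll(), timeout=timeout)
```

Several evaluations can then be awaited together with `asyncio.gather`, one `wait_for_result_async` call per ID.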
Method Reference
| Method | Description | Namespace |
|---|---|---|
| list_evaluations(...) | List evaluations with filters | client.prompt_evaluations |
| create_evaluation(body) | Create evaluation definition | client.prompt_evaluations |
| get_evaluation(id_) | Get evaluation by ID | client.prompt_evaluations |
| update_evaluation(id_, body) | Update evaluation | client.prompt_evaluations |
| delete_evaluation(id_) | Delete evaluation | client.prompt_evaluations |
| run_evaluation(id_) | Trigger evaluation run | client.prompt_evaluations |
| get_evaluation_result(id_) | Get evaluation results | client.prompt_evaluations |
| get_evaluation_history(id_) | Get evaluation run history | client.prompt_evaluations |
| create_evaluation_version(id_) | Create new evaluation version | client.prompt_evaluations |
| get_evaluation_results_aggregated(...) | Get results aggregated by version | client.prompt_evaluations |