@matjanos matjanos commented Nov 27, 2025

Currently, Genkit's evaluation system runs inference sequentially for all test cases in bulkRunAction(). For large datasets (e.g., 150+ test cases), evaluation is extremely slow because each flow or model execution must complete before the next one starts. This PR makes the following changes:

  • Parallelized inference using the existing batchSize to run samples
    concurrently (capped at 100) while preserving ordering, per-sample error
    capture, and progress logging.
  • Evaluator actions now execute in parallel to match inference concurrency.
  • eval:flow continues to use --batchSize to control concurrency; eval:run
    behavior is unchanged. Example: genkit eval:flow myFlow data.json
    --batchSize 5 now runs both inference and evaluation in parallel batches.
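
For readers unfamiliar with the pattern, here is a minimal sketch of batched concurrency with preserved ordering and per-sample error capture. It is not the code from this PR: `runInBatches`, `SampleResult`, and `MAX_BATCH_SIZE` are hypothetical names chosen for illustration, and the only details taken from the description above are the batch-size-driven concurrency and the cap of 100.

```typescript
// Hypothetical result shape; these names are illustrative, not Genkit's API.
type SampleResult<O> =
  | { ok: true; index: number; output: O }
  | { ok: false; index: number; error: string };

// Cap mentioned in the PR description.
const MAX_BATCH_SIZE = 100;

// Runs `runOne` over `inputs` in batches of at most `batchSize` concurrent calls.
// Results come back in input order, and a failing sample is recorded
// instead of aborting the whole run.
async function runInBatches<I, O>(
  inputs: I[],
  batchSize: number,
  runOne: (input: I) => Promise<O>
): Promise<SampleResult<O>[]> {
  const size = Math.max(1, Math.min(Math.floor(batchSize), MAX_BATCH_SIZE));
  const results: SampleResult<O>[] = [];

  for (let start = 0; start < inputs.length; start += size) {
    const batch = inputs.slice(start, start + size);

    // allSettled keeps positions aligned with `batch`, so ordering is preserved
    // even when samples finish out of order or reject.
    const settled = await Promise.allSettled(
      batch.map(async (input) => runOne(input))
    );

    settled.forEach((s, offset) => {
      const index = start + offset;
      results.push(
        s.status === 'fulfilled'
          ? { ok: true, index, output: s.value }
          : { ok: false, index, error: String(s.reason) }
      );
    });

    // Simple progress logging after each batch.
    const done = Math.min(start + size, inputs.length);
    console.log(`Processed ${done}/${inputs.length} samples`);
  }
  return results;
}
```

Each batch waits for its slowest sample before the next batch starts, which keeps the logic simple at the cost of some idle time. A sliding-window pool would keep the concurrency level constant, but that is a different design from the batch approach described here.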

google-cla bot commented Nov 27, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@matjanos matjanos marked this pull request as ready for review November 27, 2025 16:45
@github-actions github-actions bot added the docs Improvements or additions to documentation label Nov 27, 2025
Labels

docs (Improvements or additions to documentation), js, tooling
