feat(evaluation): add DatasetClient and dataset management service provider by jariy17 · Pull Request #491 · aws/bedrock-agentcore-sdk-python

jariy17 · 2026-05-21T14:38:24Z

Summary

Adds DatasetClient, a high-level wrapper for Bedrock AgentCore dataset management operations (create/get/list/delete datasets and versions, upload/download via presigned URLs, polling helpers).
Adds DatasetManagementServiceProvider (formerly ServiceDatasetProvider) that loads evaluation scenarios from a managed dataset, alongside the existing FileDatasetProvider which now also supports JSONL files.
Refactors Scenario classes so each owns its schema_type, and moves _parse_scenario to module level for reuse between file- and service-backed providers.
Streams JSONL downloads, extracts region from agent ARN (no separate BEDROCK_TEST_REGION), and consolidates test fixtures.
Unit and integration tests for the new client and provider, plus a runner integ test that exercises a real agent.
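
The region extraction mentioned in the summary can be illustrated with a small helper. This is a minimal sketch, not the SDK's code: the function name `region_from_arn` is hypothetical, and it relies only on the general ARN layout `arn:partition:service:region:account-id:resource`.

```python
def region_from_arn(arn: str) -> str:
    """Return the region component of an ARN (hypothetical helper).

    ARNs have the general shape arn:partition:service:region:account-id:resource,
    so the region is the fourth colon-separated field.
    """
    # Limit the split: the resource part may itself contain colons.
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    region = parts[3]
    if not region:
        raise ValueError(f"ARN has no region component: {arn!r}")
    return region
```

With a helper like this, a test that already knows the agent runtime ARN does not need a second environment variable for the region.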

Test plan

uv run pytest tests/bedrock_agentcore/evaluation passes
uv run pytest tests_integ/evaluation passes against a configured AWS account
Lint/format checks pass (ruff, line-length)
Manual: create dataset → upload JSONL → run evaluation via DatasetManagementServiceProvider end-to-end

Add Dataset Management SDK support with:
- DatasetClient: pass-through client for all 11 dataset APIs (create/get/list/update/delete datasets, versions, examples) with 6 _and_wait helpers for async operations
- ServiceDatasetProvider: fetches datasets from the service and returns SDK Dataset objects compatible with OnDemandEvaluationDatasetRunner and BatchEvaluationRunner
- Unit tests (20 tests) and integration tests (17 tests, verified against live AWS)

Switch ServiceDatasetProvider from list_dataset_examples pagination to downloading the JSONL file via the presigned downloadUrl from GetDataset. Single HTTP request is simpler and faster for large datasets.

- helpers.py: get_or_create_agent_runtime(), make_agent_invoker() with retry logic and warmup for cold start handling
- test_runners_with_service_dataset.py: OnDemandRunner + ServiceDatasetProvider end-to-end test (skipped until a working deployed agent is available; the current account has a 30s init timeout that prevents cold starts)

Update test_runners_with_service_dataset.py to use env-var config:
- INTEG_AGENT_RUNTIME_ARN: skips if not set
- BEDROCK_TEST_REGION: region matching the agent
- Verified end-to-end: ServiceDatasetProvider → OnDemandRunner → real agent invocation → COMPLETED

…ION needed

- ServiceDatasetProvider: accept client in __init__ (eliminates region_name)
- ServiceDatasetProvider: validate schemaType against supported runner schemas
- ServiceDatasetProvider: proper error message on download failure
- Remove helpers.py (not needed)
- Add unit tests for unsupported schema and download failure cases

- ServiceDatasetProvider: import DatasetClient at top, default in __init__
- ScenarioExecutor: add schema_type field, override in Predefined/Simulated
- ServiceDatasetProvider: collect supported schemas dynamically from executors
- delete_dataset_and_wait: add DELETE_FAILED as failed status

…dule level
- Add schema_type field to Scenario base, PredefinedScenario, SimulatedScenario
- Remove schema_type from ScenarioExecutor (doesn't belong there)
- Move _parse_scenario from FileDatasetProvider to module-level function
- SUPPORTED_SCHEMA_TYPES derived from Scenario classes directly

- Add timeout=60 to requests.get() for presigned URL download (#1)
- Use r.content.decode("utf-8") instead of r.text for explicit encoding (#3)
- Replace model_fields introspection with plain constant set (#8)
- Guard __getattr__ against recursion when _cp_client not initialized (#5)
- Remove dead _mock_client function from tests (#11)
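
The __getattr__ recursion guard from item #5 addresses a general Python pitfall with delegating wrappers. The sketch below shows the pattern on a hypothetical `PassThroughClient`; only the attribute name `_cp_client` comes from the commit message, and the real DatasetClient may implement the guard differently.

```python
class PassThroughClient:
    """Hypothetical wrapper that forwards unknown attributes to an inner client."""

    def __init__(self, inner):
        self._cp_client = inner

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails. If
        # _cp_client itself is missing (for example the object was created via
        # __new__ or copy without running __init__), reading self._cp_client
        # below would call __getattr__ again and recurse until RecursionError.
        if name == "_cp_client":
            raise AttributeError(name)
        return getattr(self._cp_client, name)
```

With the guard in place, a half-initialized instance fails with an ordinary AttributeError instead of a RecursionError, which is much easier to diagnose.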

- Stream JSONL via iter_lines() instead of loading entire file into memory (#2)
- Consolidate repetitive test mock setup with pytest fixtures (#12)
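
As an illustration of the streaming change, the following sketch downloads a JSONL file with `requests` and parses it line by line. The function names `parse_jsonl` and `stream_jsonl` are hypothetical, and placing the `requests` import inside the function is a choice of this sketch, not something the PR states about the SDK.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_jsonl(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-blank line."""
    for raw in lines:
        text = raw.decode("utf-8").strip()  # explicit UTF-8, no charset guessing
        if text:
            yield json.loads(text)


def stream_jsonl(url: str, timeout: float = 60) -> Iterator[Dict[str, Any]]:
    """Download a JSONL file and yield records without buffering the whole body."""
    import requests  # third-party; imported here so parse_jsonl works without it

    # stream=True defers the body download; iter_lines() then reads it in chunks.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        yield from parse_jsonl(response.iter_lines())
```

Because `stream_jsonl` is a generator, the HTTP connection stays open until the caller has consumed all records or discards the generator.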

…entServiceProvider
Addresses PR review feedback: the name makes explicit which service the provider loads datasets from (Dataset Management Service).

Dispatch on file extension: paths ending in .jsonl are read line-by-line (one scenario per line). All other paths keep the existing {"scenarios": [...]} JSON shape. Adds 8 unit tests covering predefined/simulated/mixed JSONL content, blank-line tolerance, malformed lines, and extension dispatch.
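
The extension dispatch described above can be sketched as follows. The function name `load_raw_scenarios` is hypothetical, and raising ValueError with the line number on a malformed line is an assumption of this sketch; the commit message says malformed lines are tested but not how they are reported.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_raw_scenarios(path: str) -> List[Dict[str, Any]]:
    """Load scenario dicts from a .jsonl file or a {"scenarios": [...]} JSON file."""
    file_path = Path(path)
    if file_path.suffix == ".jsonl":
        scenarios = []
        with file_path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    scenarios.append(json.loads(line))
                except json.JSONDecodeError as exc:
                    # Assumed behavior: fail loudly and point at the bad line.
                    raise ValueError(
                        f"{path}:{line_number}: malformed JSONL line: {exc}"
                    ) from exc
        return scenarios
    # Every other extension keeps the original single-document JSON shape.
    with file_path.open(encoding="utf-8") as handle:
        return json.load(handle)["scenarios"]
```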

…d_wait
A version-specific delete (datasetVersion provided) does not remove the dataset itself; it transitions the dataset to UPDATING and back to ACTIVE. The previous waiter polled for ResourceNotFoundException and timed out. Branch on whether datasetVersion is passed:
- Without datasetVersion: poll until ResourceNotFoundException (failed status: DELETE_FAILED)
- With datasetVersion: poll until ACTIVE (failed status: UPDATE_FAILED), return dataset dict

Add unit tests for both version-delete paths (success + UPDATE_FAILED) and an integ test that creates two versions, deletes the oldest via delete_dataset_and_wait, and verifies the dataset stays ACTIVE.
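
The two waiter branches can be sketched with a generic polling loop. Everything named here is hypothetical: `wait_after_delete`, the `get_dataset` callable, the `"status"` key, and `DatasetNotFound` (a stand-in for the service's ResourceNotFoundException). It shows the branching logic only, not the SDK's delete_dataset_and_wait.

```python
import time
from typing import Any, Callable, Dict, Optional


class DatasetNotFound(Exception):
    """Stand-in for the service's ResourceNotFoundException."""


def wait_after_delete(
    get_dataset: Callable[[], Dict[str, Any]],
    version_delete: bool,
    poll_interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll until a delete settles; the success condition depends on its scope."""
    failed_status = "UPDATE_FAILED" if version_delete else "DELETE_FAILED"
    for _ in range(max_attempts):
        try:
            dataset = get_dataset()
        except DatasetNotFound:
            if version_delete:
                raise  # the dataset should survive a version-only delete
            return None  # full delete finished: the dataset is gone
        status = dataset.get("status")
        if status == failed_status:
            raise RuntimeError(f"delete failed with status {status}")
        if version_delete and status == "ACTIVE":
            return dataset  # version removed, dataset back to ACTIVE
        sleep(poll_interval)
    raise TimeoutError("dataset did not reach the expected state in time")
```

Injecting `sleep` keeps unit tests fast, since they can pass a no-op instead of waiting between polls.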

github-actions · 2026-05-21T14:38:44Z

✅ No Breaking Changes Detected

No public API breaking changes found in this PR.

Hweinstock

LGTM!

jariy17 added 18 commits May 21, 2026 10:25

refactor(evaluation): use downloadUrl for ServiceDatasetProvider

71d0f02


refactor: extract region from agent ARN, no separate BEDROCK_TEST_REG…

43b70d9

…ION needed

style: fix lint and formatting

625190d

style: fix remaining line-too-long issues

a7c784d

fix: remove unused variable assignment (F841)

94860c1

fix: remove unused _DATASET_FAILED_STATUSES constant

ff0c85b

fix(evaluation): stream JSONL download and consolidate test fixtures

85fc638


refactor(evaluation): rename ServiceDatasetProvider to DatasetManagem…

f057e6b


fix: Removed delete_dataset_version api from allowlist

51f1055

jariy17 requested a review from a team May 21, 2026 14:38

jariy17 had a problem deploying to auto-approve May 21, 2026 14:38 — with GitHub Actions Failure

jariy17 temporarily deployed to auto-approve May 21, 2026 14:38 — with GitHub Actions Inactive

Hweinstock approved these changes May 21, 2026


Comment thread src/bedrock_agentcore/evaluation/runner/dataset_providers.py

jariy17 had a problem deploying to auto-approve May 22, 2026 13:03 — with GitHub Actions Failure

jariy17 enabled auto-merge (squash) May 22, 2026 13:41

Merge branch 'main' into feat/dataset-management-sdk

f6cbde3

jariy17 temporarily deployed to auto-approve May 22, 2026 13:41 — with GitHub Actions Inactive

jariy17 had a problem deploying to auto-approve May 22, 2026 13:41 — with GitHub Actions Failure

jariy17 temporarily deployed to auto-approve May 22, 2026 13:41 — with GitHub Actions Inactive

jariy17 merged commit 29287c2 into main May 22, 2026
34 of 36 checks passed

pratapmn mentioned this pull request May 24, 2026

[Bug] 1.11.0 fails to import on fresh install: bedrock_agentcore.evaluation triggers ModuleNotFoundError: No module named 'requests' #496

Open

jariy17 merged 19 commits into main from feat/dataset-management-sdk
