feat(evaluation): add DatasetClient and dataset management service provider by jariy17 · Pull Request #490 · aws/bedrock-agentcore-sdk-python

jariy17 · 2026-05-21T14:26:18Z

Summary

Adds DatasetClient, a high-level wrapper for Bedrock AgentCore dataset management operations (create/get/list/delete datasets and versions, upload/download via presigned URLs, polling helpers).
Adds DatasetManagementServiceProvider (formerly ServiceDatasetProvider), which loads evaluation scenarios from a managed dataset, alongside the existing FileDatasetProvider, which now also supports JSONL files.
Refactors Scenario classes so each owns its schema_type, and moves _parse_scenario to module level for reuse between file- and service-backed providers.
Streams JSONL downloads, extracts region from agent ARN (no separate BEDROCK_TEST_REGION), and consolidates test fixtures.
Unit and integration tests for the new client and provider, plus a runner integ test that exercises a real agent.
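
For reviewers, the "extracts region from agent ARN" item in the list above can be pictured roughly as follows. This is a minimal sketch and not the code in this PR: the function name `region_from_arn` and the example ARN are hypothetical. It relies only on the general ARN layout `arn:partition:service:region:account-id:resource`.

```python
def region_from_arn(arn: str) -> str:
    """Return the region component of an AWS ARN.

    ARNs follow arn:partition:service:region:account-id:resource, so the
    region is the fourth colon-separated field.
    """
    # Limit the split so colons inside the resource part are left alone.
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    region = parts[3]
    if not region:
        raise ValueError(f"ARN has no region component: {arn!r}")
    return region


# Hypothetical runtime ARN, for illustration only.
example_arn = "arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/my-agent-abc123"
print(region_from_arn(example_arn))
```

Deriving the region this way means the integration test needs only the agent ARN, which is why a separate region variable becomes unnecessary.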

Test plan

uv run pytest tests/bedrock_agentcore/evaluation passes
uv run pytest tests_integ/evaluation passes against a configured AWS account
Lint/format checks pass (ruff, line-length)
Manual: create dataset → upload JSONL → run evaluation via DatasetManagementServiceProvider end-to-end

Add Dataset Management SDK support with:
- DatasetClient: pass-through client for all 11 dataset APIs (create/get/list/update/delete datasets, versions, examples) with 6 _and_wait helpers for async operations
- ServiceDatasetProvider: fetches datasets from the service and returns SDK Dataset objects compatible with OnDemandEvaluationDatasetRunner and BatchEvaluationRunner
- Unit tests (20 tests) and integration tests (17 tests, verified against live AWS)

Switch ServiceDatasetProvider from list_dataset_examples pagination to downloading the JSONL file via the presigned downloadUrl from GetDataset. Single HTTP request is simpler and faster for large datasets.

- helpers.py: get_or_create_agent_runtime(), make_agent_invoker() with retry logic and warmup for cold start handling
- test_runners_with_service_dataset.py: OnDemandRunner + ServiceDatasetProvider end-to-end test (skipped until a working deployed agent is available; the current account has a 30s init timeout that prevents cold starts)

Update test_runners_with_service_dataset.py to use env-var config:
- INTEG_AGENT_RUNTIME_ARN: skips if not set
- BEDROCK_TEST_REGION: region matching the agent
- Verified end-to-end: ServiceDatasetProvider → OnDemandRunner → real agent invocation → COMPLETED

…ION needed

- ServiceDatasetProvider: accept client in __init__ (eliminates region_name)
- ServiceDatasetProvider: validate schemaType against supported runner schemas
- ServiceDatasetProvider: proper error message on download failure
- Remove helpers.py (not needed)
- Add unit tests for unsupported schema and download failure cases

- ServiceDatasetProvider: import DatasetClient at top, default in __init__
- ScenarioExecutor: add schema_type field, override in Predefined/Simulated
- ServiceDatasetProvider: collect supported schemas dynamically from executors
- delete_dataset_and_wait: add DELETE_FAILED as failed status

…dule level
- Add schema_type field to Scenario base, PredefinedScenario, SimulatedScenario
- Remove schema_type from ScenarioExecutor (doesn't belong there)
- Move _parse_scenario from FileDatasetProvider to module-level function
- SUPPORTED_SCHEMA_TYPES derived from Scenario classes directly

- Add timeout=60 to requests.get() for presigned URL download (#1)
- Use r.content.decode("utf-8") instead of r.text for explicit encoding (#3)
- Replace model_fields introspection with plain constant set (#8)
- Guard __getattr__ against recursion when _cp_client not initialized (#5)
- Remove dead _mock_client function from tests (aws#11)
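
The __getattr__ guard mentioned above addresses a common pitfall with delegating wrappers. The sketch below illustrates the pattern only; the class name DelegatingClient is hypothetical, and only the attribute name _cp_client comes from this PR.

```python
from types import SimpleNamespace


class DelegatingClient:
    """Forwards unknown attributes to an underlying client object."""

    def __init__(self, inner):
        self._cp_client = inner

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails. If _cp_client is
        # itself missing (for example when the instance was created without
        # running __init__, as copy or pickle may do), reading
        # self._cp_client below would call __getattr__ again without end.
        if name == "_cp_client":
            raise AttributeError(name)
        return getattr(self._cp_client, name)


# Normal use: unknown attributes are served by the wrapped object.
wrapped = DelegatingClient(SimpleNamespace(list_datasets=lambda: []))
print(wrapped.list_datasets())

# Uninitialised instance: lookup fails cleanly instead of recursing.
bare = DelegatingClient.__new__(DelegatingClient)
print(hasattr(bare, "list_datasets"))
```

Without the guard, the second case would end in a RecursionError rather than a plain AttributeError.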

- Stream JSONL via iter_lines() instead of loading entire file into memory (#2)
- Consolidate repetitive test mock setup with pytest fixtures (aws#12)
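
As a rough illustration of the streaming change, the sketch below parses JSONL one line at a time from any iterable of byte strings, which is what requests' Response.iter_lines() yields by default. The function name iter_jsonl_records and the error handling are assumptions, not the provider's actual code.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl_records(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line.

    Only the current line is held in memory, so the input can be a
    streamed HTTP body of any size.
    """
    for line_number, raw in enumerate(lines, start=1):
        text = raw.decode("utf-8").strip()  # explicit encoding
        if not text:
            continue  # skip blank lines
        try:
            yield json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError(
                f"Invalid JSON on line {line_number}: {exc.msg}"
            ) from exc


# With requests, the iterable would come from a streamed response, e.g.:
#   with requests.get(url, stream=True, timeout=60) as resp:
#       resp.raise_for_status()
#       for record in iter_jsonl_records(resp.iter_lines()):
#           ...
sample = [b'{"id": 1}', b"", b'{"id": 2}']
print(list(iter_jsonl_records(sample)))
```

Because the parser takes a plain iterable, it can be unit-tested without any network mocking.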

…entServiceProvider

Addresses PR review feedback: the name makes explicit which service the provider loads datasets from (Dataset Management Service).

Dispatch on file extension: paths ending in .jsonl are read line-by-line (one scenario per line). All other paths keep the existing {"scenarios": [...]} JSON shape. Adds 8 unit tests covering predefined/simulated/mixed JSONL content, blank-line tolerance, malformed lines, and extension dispatch.
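
The extension dispatch described above might look roughly like this. It is a sketch under assumptions: the function name load_raw_scenarios is hypothetical, the suffix check is case-sensitive here, and raising ValueError on a malformed line is one plausible behaviour rather than the confirmed one.

```python
import json
from pathlib import Path


def load_raw_scenarios(path: str) -> list:
    """Return raw scenario dicts from a .jsonl file or a JSON document."""
    file_path = Path(path)
    if file_path.suffix == ".jsonl":
        scenarios = []
        with file_path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    scenarios.append(json.loads(line))
                except json.JSONDecodeError as exc:
                    raise ValueError(
                        f"{path}:{line_number}: malformed JSONL line"
                    ) from exc
        return scenarios
    # Any other extension keeps the existing {"scenarios": [...]} shape.
    with file_path.open(encoding="utf-8") as handle:
        return json.load(handle)["scenarios"]
```

Each returned dict would then go through the shared scenario parser, so both file shapes produce the same objects.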

…d_wait

A version-specific delete (datasetVersion provided) does not remove the dataset itself; it transitions the dataset to UPDATING and back to ACTIVE. The previous waiter polled for ResourceNotFoundException and timed out. Branch on whether datasetVersion is passed:
- Without datasetVersion: poll until ResourceNotFoundException (failure status: DELETE_FAILED)
- With datasetVersion: poll until ACTIVE (failure status: UPDATE_FAILED), return dataset dict

Add unit tests for both version-delete paths (success + UPDATE_FAILED) and an integ test that creates two versions, deletes the oldest via delete_dataset_and_wait, and verifies the dataset stays ACTIVE.
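
The two waiter branches can be sketched as below. This is not the DatasetClient implementation: the function name wait_after_delete, the ResourceNotFound stand-in exception, the "status" key, and the polling parameters are all hypothetical. A real waiter may also need to handle the window before the dataset leaves ACTIVE after a version delete.

```python
import time


class ResourceNotFound(Exception):
    """Stand-in for the service's ResourceNotFoundException."""


def wait_after_delete(get_dataset, dataset_id, dataset_version=None,
                      poll_interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until a delete finishes; branch on whether a version was given."""
    version_delete = dataset_version is not None
    failed_status = "UPDATE_FAILED" if version_delete else "DELETE_FAILED"
    for _ in range(max_attempts):
        try:
            dataset = get_dataset(dataset_id)
        except ResourceNotFound:
            if not version_delete:
                return None  # whole dataset is gone: success
            raise  # a version delete should leave the dataset in place
        status = dataset.get("status")
        if status == failed_status:
            raise RuntimeError(f"Dataset {dataset_id} entered {status}")
        if version_delete and status == "ACTIVE":
            return dataset  # dataset survived the version delete
        sleep(poll_interval)
    raise TimeoutError(f"Dataset {dataset_id} did not settle in time")


def scripted_getter(events):
    """Test helper: replay a fixed sequence of responses or exceptions."""
    pending = iter(events)

    def getter(_dataset_id):
        event = next(pending)
        if isinstance(event, Exception):
            raise event
        return event

    return getter
```

Passing `sleep` as a parameter keeps unit tests fast, since a no-op can replace the real delay.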

codecov-commenter · 2026-05-21T14:27:51Z

Codecov Report

❌ Patch coverage is 97.11538% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@c311682).

Files with missing lines	Patch %	Lines
src/bedrock_agentcore/evaluation/dataset_client.py	96.36%	1 Missing and 1 partial ⚠️
...k_agentcore/evaluation/runner/dataset_providers.py	97.77%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #490   +/-   ##
=======================================
  Coverage        ?   89.49%           
=======================================
  Files           ?       84           
  Lines           ?     7732           
  Branches        ?     1157           
=======================================
  Hits            ?     6920           
  Misses          ?      515           
  Partials        ?      297

Flag	Coverage Δ
unittests	`89.49% <97.11%> (?)`

jariy17 · 2026-05-21T14:38:11Z

Re-opening as an in-repo PR to fix the fork-PR token permission issue (breaking-change check failed only on the comment-post step due to GitHub's read-only token policy for fork PRs).

jariy17 added 18 commits May 21, 2026 10:25

refactor(evaluation): use downloadUrl for ServiceDatasetProvider

71d0f02

refactor: extract region from agent ARN, no separate BEDROCK_TEST_REG…

43b70d9

…ION needed

style: fix lint and formatting

625190d

style: fix remaining line-too-long issues

a7c784d

fix: remove unused variable assignment (F841)

94860c1

fix: remove unused _DATASET_FAILED_STATUSES constant

ff0c85b

fix(evaluation): stream JSONL download and consolidate test fixtures

85fc638

refactor(evaluation): rename ServiceDatasetProvider to DatasetManagem…

f057e6b

…entServiceProvider

Addresses PR review feedback: the name makes explicit which service the provider loads datasets from (Dataset Management Service).

fix: Removed delete_dataset_version api from allowlist

51f1055

jariy17 requested a review from a team May 21, 2026 14:26

jariy17 closed this May 21, 2026

jariy17 wants to merge 18 commits into aws:main from jariy17:feat/dataset-management-sdk
