Add waterdata.get_samples_summary for per-location sample inventory by thodson-usgs · Pull Request #262 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-05-05T18:00:09Z

Closes #261.

Summary

Adds waterdata.get_samples_summary(monitoringLocationIdentifier=...) — a wrapper around the Samples database /summary/{monitoringLocationIdentifier} endpoint. The endpoint returns one row per (characteristic group, characteristic, user-supplied characteristic) combination with result and activity counts plus first / most recent activity dates, which makes it convenient for taking inventory of what discrete-sample data exists at a site before pulling the underlying observations with get_samples.

This mirrors the R package's summarize_waterdata_samples (read_waterdata_samples.R) feature requested in #261.

The Samples summary endpoint accepts only a single monitoring location per request, so the parameter is typed as str (not str | list[str]).

Live API example

```python
from dataretrieval.waterdata import get_samples_summary

df, md = get_samples_summary(monitoringLocationIdentifier="USGS-04183500")

print(md.url)
# https://api.waterdata.usgs.gov/samples-data/summary/USGS-04183500?mimeType=text%2Fcsv

print(df.columns.tolist())
# ['monitoringLocationIdentifier', 'characteristicGroup', 'characteristic',
#  'characteristicUserSupplied', 'resultCount', 'activityCount',
#  'firstActivity', 'mostRecentActivity']

print(len(df))
# 110
```
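
To illustrate the "take inventory first" workflow described in the summary, here is a minimal sketch that totals `resultCount` per `characteristicGroup`. It uses only the standard library and a few hand-written rows shaped like the columns listed above. The rows, the placeholder site ID `USGS-00000000`, and the helper name `inventory_by_group` are all made up for illustration and are not real output from the service.

```python
import csv
import io

# Hypothetical rows shaped like the summary columns; every value is invented.
SAMPLE_CSV = """monitoringLocationIdentifier,characteristicGroup,characteristic,characteristicUserSupplied,resultCount,activityCount,firstActivity,mostRecentActivity
USGS-00000000,Nutrient,Nitrate,Nitrate as N,120,118,1990-01-15,2024-06-01
USGS-00000000,Nutrient,Phosphorus,Phosphorus as P,80,80,1995-03-02,2023-11-20
USGS-00000000,Physical,Temperature,"Temperature, water",300,290,1985-07-09,2024-08-30
"""


def inventory_by_group(csv_text):
    """Return (characteristicGroup, total resultCount) pairs, largest first."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = row["characteristicGroup"]
        totals[group] = totals.get(group, 0) + int(row["resultCount"])
    # Sort by total so the most heavily sampled groups come first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


totals = inventory_by_group(SAMPLE_CSV)
print(totals)
```

With the DataFrame returned by `get_samples_summary`, the same grouping would presumably be done with the DataFrame's own group-and-sum operations. Either way, the totals indicate which characteristic groups are worth requesting with `get_samples`.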

Test plan

- New test_mock_get_samples_summary covers the happy path against a recorded response (tests/data/samples_summary.txt): URL composition, column names, single-location filter.
- Live verification: get_samples_summary(monitoringLocationIdentifier="USGS-04183500") returns 110 rows with the expected schema.
- Full tests/waterdata_test.py suite (27 tests) passes.

Wraps the Samples database /summary/{monitoringLocationIdentifier} endpoint, mirroring the R package's summarize_waterdata_samples. Returns per-characteristic result and activity counts plus first / most recent activity dates for a single monitoring location, which is useful for taking inventory of what discrete-sample data exists at a site before pulling observations with get_samples.

The Samples summary endpoint accepts only a single monitoring location per request, so the function takes a string (not a list).

Closes DOI-USGS#261.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- URL-encode the path-segment monitoringLocationIdentifier so values containing /, ?, # or whitespace cannot break URL composition.
- Log the resolved request URL via PreparedRequest, matching get_samples.
- Loosen the test column assertion from exact-list to subset so a non-breaking server-side column addition does not flake the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
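
The path-segment escaping in the first bullet can be sketched with the standard library alone. This is an illustration and not the code in `api.py`: the helper name `build_summary_url` is hypothetical, and the base URL is taken from the example URL printed in the PR description.

```python
from urllib.parse import quote, urlencode

# Base URL as it appears in the example request in this PR.
BASE_URL = "https://api.waterdata.usgs.gov/samples-data"


def build_summary_url(monitoring_location_identifier):
    """Compose a summary URL with the identifier escaped as one path segment."""
    # quote() leaves "/" unescaped by default; safe="" escapes it as well,
    # so the identifier can never introduce extra path segments.
    segment = quote(monitoring_location_identifier, safe="")
    query = urlencode({"mimeType": "text/csv"})
    return f"{BASE_URL}/summary/{segment}?{query}"


print(build_summary_url("USGS-04183500"))
print(build_summary_url("A/B?c #d"))
```

An ordinary identifier passes through unchanged, while `/`, `?`, `#` and spaces are percent-encoded, so they stay inside the path segment instead of being read as URL structure.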

Copilot

Pull request overview

This PR adds a new public waterdata.get_samples_summary() helper to the waterdata module so users can inspect the discrete-sample inventory available for a single monitoring location before requesting full sample records. It fits the module’s role as the Python wrapper around modern USGS Water Data APIs and mirrors the corresponding R-package capability requested in #261.

Changes:

- Added waterdata.get_samples_summary(monitoringLocationIdentifier=...) to wrap the Samples /summary/{monitoringLocationIdentifier} CSV endpoint.
- Exported the new helper from dataretrieval.waterdata and added a recorded-response unit test plus fixture data.
- Documented the new API addition in NEWS.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `dataretrieval/waterdata/api.py` | Adds the new Samples summary API wrapper and docstring. |
| `dataretrieval/waterdata/__init__.py` | Re-exports the new helper as part of the public `waterdata` API. |
| `tests/waterdata_test.py` | Adds a mock-based test for URL composition, metadata, and returned columns. |
| `tests/data/samples_summary.txt` | Provides recorded CSV fixture data for the new test. |
| `NEWS.md` | Announces the new `get_samples_summary` capability for users. |

…ng for this endpoint

Adapted the wording from R's summarize_waterdata_samples (in the develop branch of DOI-USGS/dataRetrieval) to match the Python module's docstring style. Picked up the variety-of-agencies example IDs from the R doc.

Two claims from the R doc were corrected rather than copied:

- The R doc says "Location identifiers should be separated with commas" with a multi-ID example. That contradicts the function's own one-site check and is wrong for the summary service (which accepts exactly one ID). Dropped.
- The R doc says "Location numbers without an agency prefix are assumed to have the prefix USGS." That's not true for this endpoint at the API level: bare IDs return an empty result with a different column shape. Documented the actual behavior instead.

Also switched the example to USGS-04074950 (the site used by the R doc's example) so the two repos line up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Reject non-str monitoringLocationIdentifier with a TypeError that explains the constraint, instead of letting urllib.parse.quote raise a low-level TypeError. This matches R's summarize_waterdata_samples, which guards with `if (length(monitoringLocationIdentifier) > 1) stop(...)`.
- Restore characteristicUserSupplied in the column-subset assertion; /simplify's "loosen exact-list to subset" was applied too aggressively and dropped a real schema column that disambiguates grouping.
- Add a regression test that a list input raises the new TypeError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
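
The guard and the subset-style column check described in this commit might look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: the helper names `check_identifier` and `has_required_columns` are hypothetical, the error wording is only modeled on the message quoted later in this thread, and the required-column set is assumed to be the full column list from the PR description.

```python
# Assumed required columns: the schema printed in the PR description.
REQUIRED_COLUMNS = {
    "monitoringLocationIdentifier",
    "characteristicGroup",
    "characteristic",
    "characteristicUserSupplied",
    "resultCount",
    "activityCount",
    "firstActivity",
    "mostRecentActivity",
}


def check_identifier(monitoring_location_identifier):
    """Reject anything that is not a single string before building the URL."""
    if not isinstance(monitoring_location_identifier, str):
        raise TypeError(
            "monitoringLocationIdentifier must be a str: the summary endpoint "
            "accepts exactly one monitoring location per request, got "
            f"{type(monitoring_location_identifier).__name__}"
        )
    return monitoring_location_identifier


def has_required_columns(columns):
    """Subset check: extra server-side columns are allowed, missing ones are not."""
    return REQUIRED_COLUMNS <= set(columns)


print(check_identifier("USGS-04183500"))
print(has_required_columns(sorted(REQUIRED_COLUMNS) + ["someNewColumn"]))
```

A subset check tolerates a new column appearing in the response, yet it still fails if a column that the grouping depends on, such as `characteristicUserSupplied`, goes missing.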

ldecicco-USGS

Looks good. I'm not entirely sure I follow the "quote" discussion, but it seems to imply it's the easiest way to test for 1 monitoring location id.

thodson-usgs · 2026-05-05T19:35:58Z

@ldecicco-USGS Thanks for the review! Quick clarification on the quote discussion since the Copilot phrasing was a bit indirect: urllib.parse.quote(monitoringLocationIdentifier, safe='') is just URL-path-segment escaping — it percent-encodes any /, ?, #, or whitespace so user input can't break URL composition. It doesn't validate the shape of the input. The single-site enforcement is the separate isinstance(..., str) guard a few lines above, which raises TypeError("...accepts exactly one monitoring location per request...") for a list (the symptom Copilot was actually pointing at). The two are independent — quote is for URL safety, the type check is for the API constraint.

thodson-usgs and others added 2 commits May 5, 2026 12:59

thodson-usgs requested a review from Copilot May 5, 2026 18:03

Copilot started reviewing on behalf of thodson-usgs May 5, 2026 18:04

Copilot AI reviewed May 5, 2026

Comment thread dataretrieval/waterdata/api.py

Comment thread tests/waterdata_test.py

thodson-usgs and others added 2 commits May 5, 2026 13:10

thodson-usgs requested a review from ldecicco-USGS May 5, 2026 18:16

ldecicco-USGS approved these changes May 5, 2026

thodson-usgs merged commit 6df40f5 into DOI-USGS:main May 5, 2026
8 checks passed

thodson-usgs mentioned this pull request May 5, 2026

Fix summarize_waterdata_samples doc claims that contradict the service DOI-USGS/dataRetrieval#889

Merged

thodson-usgs merged 4 commits into DOI-USGS:main from thodson-usgs:add-samples-summary
