Add waterdata.get_samples_summary for per-location sample inventory #262
thodson-usgs merged 4 commits into DOI-USGS:main
Wraps the Samples database /summary/{monitoringLocationIdentifier}
endpoint, mirroring the R package's summarize_waterdata_samples. Returns
per-characteristic result and activity counts plus first / most recent
activity dates for a single monitoring location — useful for taking
inventory of what discrete-sample data exists at a site before pulling
observations with get_samples.
The Samples summary endpoint accepts only a single monitoring location
per request, so the function takes a string (not a list).
Closes DOI-USGS#261.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- URL-encode the path-segment monitoringLocationIdentifier so values containing /, ?, # or whitespace cannot break URL composition.
- Log the resolved request URL via PreparedRequest, matching get_samples.
- Loosen the test column assertion from exact-list to subset so a non-breaking server-side column addition does not flake the test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
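The URL-encoding step in this commit can be illustrated with a minimal sketch. This is not the code from the PR: `BASE_URL` and `build_summary_url` are hypothetical names chosen for illustration, and the host is a placeholder. The only call relied on is the standard library's `urllib.parse.quote`, with `safe=""` so that `/` is percent-encoded along with `?`, `#` and spaces.

```python
from urllib.parse import quote

# Placeholder base URL (assumption); the real service URL is not given in this PR text.
BASE_URL = "https://example.invalid/samples-data"


def build_summary_url(monitoring_location_identifier: str) -> str:
    """Compose a /summary/{id} URL with the identifier encoded as one path segment."""
    # safe="" overrides quote's default of leaving "/" unescaped, so an
    # identifier containing "/" cannot introduce an extra path segment.
    encoded = quote(monitoring_location_identifier, safe="")
    return f"{BASE_URL}/summary/{encoded}"


if __name__ == "__main__":
    print(build_summary_url("USGS-04074950"))
    print(build_summary_url("AB/1?x#y z"))
```

An ordinary identifier such as USGS-04074950 passes through unchanged, because letters, digits and hyphens are never escaped by `quote`.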
Pull request overview
This PR adds a new public waterdata.get_samples_summary() helper to the waterdata module so users can inspect the discrete-sample inventory available for a single monitoring location before requesting full sample records. It fits the module’s role as the Python wrapper around modern USGS Water Data APIs and mirrors the corresponding R-package capability requested in #261.
Changes:
- Added waterdata.get_samples_summary(monitoringLocationIdentifier=...) to wrap the Samples /summary/{monitoringLocationIdentifier} CSV endpoint.
- Exported the new helper from dataretrieval.waterdata and added a recorded-response unit test plus fixture data.
- Documented the new API addition in NEWS.md.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| dataretrieval/waterdata/api.py | Adds the new Samples summary API wrapper and docstring. |
| dataretrieval/waterdata/__init__.py | Re-exports the new helper as part of the public waterdata API. |
| tests/waterdata_test.py | Adds a mock-based test for URL composition, metadata, and returned columns. |
| tests/data/samples_summary.txt | Provides recorded CSV fixture data for the new test. |
| NEWS.md | Announces the new get_samples_summary capability for users. |
…ng for this endpoint
Adapted the wording from R's summarize_waterdata_samples (in the develop branch of DOI-USGS/dataRetrieval) to match the Python module's docstring style. Picked up the variety-of-agencies example IDs from the R doc.
Two claims from the R doc were corrected rather than copied:
- The R doc says "Location identifiers should be separated with commas" with a multi-ID example. That contradicts the function's own one-site check and is wrong for the summary service (which accepts exactly one ID). Dropped.
- The R doc says "Location numbers without an agency prefix are assumed to have the prefix USGS." That's not true for this endpoint at the API level: bare IDs return an empty result with a different column shape. Documented the actual behavior instead.
Also switched the example to USGS-04074950 (the site used by the R doc's example) so the two repos line up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reject non-str monitoringLocationIdentifier with a TypeError that explains the constraint, instead of letting urllib.parse.quote raise a low-level TypeError. This matches R's summarize_waterdata_samples, which guards with `if (length(monitoringLocationIdentifier) > 1) stop(...)`.
- Restore characteristicUserSupplied in the column-subset assertion; /simplify's "loosen exact-list to subset" was applied too aggressively and dropped a real schema column that disambiguates grouping.
- Add a regression test that a list input raises the new TypeError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
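The type guard described in this commit might look roughly like the sketch below. The helper name `validate_location_id` and the error wording are assumptions made for illustration, not the PR's actual code. The point is to check the type before the value reaches `urllib.parse.quote`, so the caller sees a message about the one-location constraint rather than an error raised from inside the encoding helper.

```python
def validate_location_id(monitoring_location_identifier):
    """Return the identifier unchanged if it is a single string, else raise TypeError.

    Hypothetical helper: the name and message are illustrative only.
    """
    if not isinstance(monitoring_location_identifier, str):
        raise TypeError(
            "monitoringLocationIdentifier must be a single string; the summary "
            "endpoint accepts exactly one monitoring location per request, got "
            f"{type(monitoring_location_identifier).__name__}."
        )
    return monitoring_location_identifier


if __name__ == "__main__":
    print(validate_location_id("USGS-04074950"))
    try:
        validate_location_id(["USGS-04074950", "USGS-04183500"])
    except TypeError as err:
        print(f"rejected: {err}")
```

A regression test for the guard then only needs to pass a list and assert that `TypeError` is raised.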
ldecicco-USGS left a comment
Looks good. I'm not entirely sure I follow the "quote" discussion, but it seems to imply it's the easiest way to test for one monitoring location ID.
@ldecicco-USGS Thanks for the review! Quick clarification on the …
Closes #261.
Summary
Adds waterdata.get_samples_summary(monitoringLocationIdentifier=...), a wrapper around the Samples database /summary/{monitoringLocationIdentifier} endpoint. The endpoint returns one row per (characteristic group, characteristic, user-supplied characteristic) combination with result and activity counts plus first / most recent activity dates, which makes it convenient for taking inventory of what discrete-sample data exists at a site before pulling the underlying observations with get_samples.
This mirrors the R package's summarize_waterdata_samples (read_waterdata_samples.R) feature requested in #261.
The Samples summary endpoint accepts only a single monitoring location per request, so the parameter is typed as str (not str | list[str]).
Live API example
Test plan
- test_mock_get_samples_summary covers the happy path against a recorded response (tests/data/samples_summary.txt): URL composition, column names, single-location filter.
- get_samples_summary(monitoringLocationIdentifier="USGS-04183500") returns 110 rows with the expected schema.
- tests/waterdata_test.py suite (27 tests) passes.
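The subset-style column check used by the mock test can be sketched as follows. Everything in the sample data is made up for illustration: the CSV text is not the recorded fixture, and apart from characteristicUserSupplied (named in this PR) the column names are assumptions inferred from the description of the endpoint's output. The helper `missing_columns` is likewise hypothetical.

```python
import csv
import io

# Illustrative stand-in for a recorded response; NOT the real fixture contents.
SAMPLE_CSV = (
    "characteristicGroup,characteristic,characteristicUserSupplied,"
    "resultCount,activityCount,extraServerColumn\n"
    "Nutrient,Nitrate,Nitrate as N,12,10,ignored\n"
)

# Columns the test requires. Only characteristicUserSupplied is confirmed by
# the PR text; the other names are assumptions.
EXPECTED_COLUMNS = {
    "characteristicGroup",
    "characteristic",
    "characteristicUserSupplied",
    "resultCount",
    "activityCount",
}


def missing_columns(csv_text: str, expected: set) -> set:
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return expected - set(reader.fieldnames or [])


if __name__ == "__main__":
    # An empty set means every required column is present; extra columns
    # added by the server are tolerated.
    print(missing_columns(SAMPLE_CSV, EXPECTED_COLUMNS))
```

Compared with asserting an exact column list, this check tolerates server-side additions while still failing if a required column such as characteristicUserSupplied disappears.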