
Add waterdata.get_combined_metadata for combined location + time-series inventory #264

Open
thodson-usgs wants to merge 4 commits into DOI-USGS:main from thodson-usgs:add-combined-metadata

@thodson-usgs
Collaborator

Closes #263.

Summary

Wraps the Water Data API's combined-metadata collection, which joins the monitoring-locations catalog with the time-series-metadata catalog and returns one row per (location, parameter, statistic) inventory entry. Each row carries every column from both source endpoints, so any location attribute (state, HUC, site type, drainage area, well-construction depth, …) can be combined with any time-series attribute (parameter code, statistic, data type, period of record, …) in a single query. That makes this the most flexible "what data is available" endpoint in the API.

Mirrors R's read_waterdata_combined_meta.

Implementation note

The function is a thin parameter declaration plus a service / output_id pair ("combined-metadata", "combined_meta_id"), then a single call to the existing get_ogc_data(args, output_id, service). The lower-level helpers (_switch_arg_id, _switch_properties_id, _construct_api_requests, _walk_pages) are all already service-agnostic — they derive the wire-format id field and URL path from the service name — so no infrastructure changes were required.
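The shape described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `get_ogc_data` is replaced by a local stand-in that makes no network call, the function name `get_combined_metadata_sketch` is hypothetical, and only three of the many filter parameters are shown.

```python
from __future__ import annotations

from typing import Any


def get_ogc_data(args: dict[str, Any], output_id: str, service: str) -> tuple[list, dict]:
    """Stand-in for the library helper; the real one builds and sends the requests."""
    # Drop unset filters so only user-supplied values reach the query.
    query = {name: value for name, value in args.items() if value is not None}
    return [], {"service": service, "output_id": output_id, "query": query}


def get_combined_metadata_sketch(
    monitoring_location_id: str | list[str] | None = None,
    parameter_code: str | list[str] | None = None,
    site_type: str | list[str] | None = None,
) -> tuple[list, dict]:
    """Thin wrapper: declare parameters, name the service, delegate."""
    service = "combined-metadata"
    output_id = "combined_meta_id"
    args = {
        "monitoring_location_id": monitoring_location_id,
        "parameter_code": parameter_code,
        "site_type": site_type,
    }
    return get_ogc_data(args, output_id, service)


_, info = get_combined_metadata_sketch(monitoring_location_id="USGS-05407000")
print(info["service"], info["query"])
```

Because the service name and output id are the only collection-specific values, a wrapper for another collection would differ only in those two strings and in its parameter list.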

The parameter list mirrors the live combined-metadata queryables endpoint (https://api.waterdata.usgs.gov/ogcapi/v0/collections/combined-metadata/queryables?f=json). The R signature includes computation_period_identifier, but it is not in this collection's queryables list (it is in time-series-metadata queryables), so it was omitted here.
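One way to spot arguments such as computation_period_identifier is to compare a function signature against the queryables document. The sketch below assumes the queryables response is a JSON Schema object whose `properties` keys are the queryable names; the sample document and the `r_style_signature` function are hypothetical and heavily abbreviated, not the live response or the real signature. Pagination or output arguments on the real function would also show up as "missing" and would need to be excluded by hand.

```python
from __future__ import annotations

import inspect
from typing import Callable


def params_missing_from_queryables(func: Callable, queryables: dict) -> set[str]:
    """Return parameter names of `func` that the collection does not list as queryable."""
    allowed = set(queryables.get("properties", {}))
    declared = set(inspect.signature(func).parameters)
    return declared - allowed


# Hypothetical, abbreviated queryables document (not the live response).
sample_queryables = {
    "type": "object",
    "properties": {
        "monitoring_location_id": {"type": "string"},
        "parameter_code": {"type": "string"},
    },
}


# Hypothetical signature that still carries the R-only argument.
def r_style_signature(
    monitoring_location_id=None,
    parameter_code=None,
    computation_period_identifier=None,
):
    ...


print(params_missing_from_queryables(r_style_signature, sample_queryables))
```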

Live API examples

from dataretrieval.waterdata import get_combined_metadata

# All time series and field measurements at a single site (R example)
df, md = get_combined_metadata(monitoring_location_id="USGS-05407000")
# 14 rows; columns include monitoring_location_id, parameter_code, data_type,
# begin, end, drainage_area, etc.

# Multi-site (forces POST path), filtered by parameter
df, md = get_combined_metadata(
    monitoring_location_id=["USGS-07069000", "USGS-07064000", "USGS-07068000"],
    parameter_code="00060",
)
# 12 rows across the 3 sites, all parameter_code == "00060"

# Inventory across multiple HUCs, restricted to streams and springs
df, md = get_combined_metadata(
    hydrologic_unit_code=["11010008", "11010009"],
    site_type=["Stream", "Spring"],
)

Test plan

  • test_get_combined_metadata — single-site live query; asserts presence of merged columns from both source catalogs (monitoring_location_id, parameter_code, data_type, drainage_area).
  • test_get_combined_metadata_multi_site_post — multi-site live query that forces the POST path; asserts the result respects the multi-value filter and the parameter filter.
  • All 30 tests/waterdata_test.py tests pass.
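The merged-column assertion in the first test could be expressed with a small helper like the one below. This is a sketch, not the test code in this PR: `missing_columns` and `EXPECTED_COLUMNS` are hypothetical names, and the column list is the one named in the test plan above.

```python
from __future__ import annotations

from typing import Iterable

# Columns the test plan says must be present in the merged result.
EXPECTED_COLUMNS = (
    "monitoring_location_id",
    "parameter_code",
    "data_type",
    "drainage_area",
)


def missing_columns(
    columns: Iterable[str], expected: Iterable[str] = EXPECTED_COLUMNS
) -> list[str]:
    """Return the expected column names that are absent, in their declared order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Works with any iterable of names, for example a DataFrame's `columns`.
print(missing_columns(["monitoring_location_id", "parameter_code", "begin", "end"]))
```

In a live test this would be used as `assert missing_columns(df.columns) == []`, which reports every absent column at once instead of failing on the first.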

thodson-usgs and others added 2 commits May 5, 2026 14:54
…es inventory

Wraps the Water Data API's combined-metadata collection, which joins the
monitoring-locations catalog with the time-series-metadata catalog and
returns one row per (location, parameter, statistic) inventory entry.
Each row carries every column from both source endpoints, so any
location attribute (state, HUC, site type, drainage area, well depth,
...) can be combined with any time-series attribute (parameter code,
statistic, data type, period of record, ...) in a single query.

Mirrors R's read_waterdata_combined_meta. Implementation re-uses the
existing get_ogc_data infrastructure: the function is a thin parameter
declaration plus a service / output_id pair (combined-metadata,
combined_meta_id), since _switch_arg_id, _switch_properties_id,
_construct_api_requests, and _walk_pages are all already
service-agnostic.

Closes DOI-USGS#263.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- thresholds: int | None -> int | list[int] | None to match the
  docstring's "numeric or list of numbers" promise.
- Replace the backslash-line-continued multi-parameter docstring group
  with a short numpydoc-valid entry that documents the most-used
  location filters (state_name, county_name, hydrologic_unit_code,
  site_type, site_type_code) and points the reader at
  get_monitoring_locations for the long tail. The previous form was
  not valid numpydoc syntax, and a single 800+ char one-line group
  fails ruff E501.
- Drop a WHAT-narrating comment from test_get_combined_metadata; the
  assertions speak for themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Adds a new high-level Water Data API wrapper, waterdata.get_combined_metadata(...), exposing the Water Data API’s combined-metadata OGC collection to retrieve a merged inventory of monitoring-location and time-series metadata in a single query.

Changes:

  • Introduced get_combined_metadata in dataretrieval.waterdata.api (thin wrapper around existing get_ogc_data for the combined-metadata service).
  • Exported the new function from dataretrieval.waterdata for public use.
  • Added two live/integration tests and documented the addition in NEWS.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| dataretrieval/waterdata/api.py | Adds get_combined_metadata wrapper function and its docstring/signature. |
| dataretrieval/waterdata/__init__.py | Re-exports get_combined_metadata in the module public API. |
| tests/waterdata_test.py | Adds live tests for single-site and multi-site (POST path) combined metadata queries. |
| NEWS.md | Adds release-note entry describing the new function. |

Comment thread tests/waterdata_test.py
Comment thread dataretrieval/waterdata/api.py Outdated

Ported three additional examples from R's read_waterdata_combined_meta
that aren't redundant with the ones we already had:

- Groundwater well — surfaces water-level and aquifer columns that the
  surface-water example shows as nulls.
- State + county — common area-of-interest workflow.
- Two-step "inventory then fetch" chain — get_combined_metadata to find
  what's available in a HUC, then get_continuous to pull the actual
  observations at every site found.

Also corrected the data_type description: the live API returns
"Continuous values" and "Daily values" (with the word "values"), not
"Continuous" / "Daily" as the docstring previously claimed. Verified
against api.waterdata.usgs.gov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Widen thresholds: int | list[int] | None -> float | list[float] | None
  to match the docstring's "numeric or list of numbers" promise. The
  Water Data API treats threshold values as floats, so the previous
  int-only annotation was misleading downstream type-checked callers.
- In test_get_combined_metadata_multi_site_post, swap the unused `md`
  binding for `_` to match the convention used by the other live
  waterdata tests in this file (`df, _ = get_*(...)`). The companion
  test_get_combined_metadata still binds `md` because it asserts on
  metadata attributes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
