
Python API Guide

The Riverscapes Data Exchange exposes a GraphQL API that can be queried from any language. For Python users, the Riverscapes Consortium provides a reference client library called pydex in the data-exchange-scripts repository. It handles authentication, pagination, and file downloads, so you can focus on your analysis.

:::tip When to use the Python API vs rscli
Use rscli for downloading a handful of projects interactively. Use the Python API when you need to search across hundreds or thousands of projects, download files selectively, or integrate with a larger data pipeline.
:::

Setup

Install pydex

pydex is not published to PyPI. Install it directly from GitHub:

pip install git+https://github.com/Riverscapes/data-exchange-scripts.git

Or add it to a pyproject.toml using uv:

[tool.uv.sources]
pydex = { git = "https://github.com/Riverscapes/data-exchange-scripts.git", branch = "main" }

Then add pydex to your dependencies and run uv sync.

Authentication

Interactive (browser-based)

The default authentication opens a browser window. This is the simplest approach for running scripts locally:

from pydex.classes.RiverscapesAPI import RiverscapesAPI

with RiverscapesAPI(stage='production') as api:
    # A browser window will open to log in
    # The token is automatically refreshed in the background
    print("Authenticated successfully")

Always use the with statement. It handles token refresh and cleans up background threads when done.

For non-interactive use, such as scheduled jobs or servers without a browser, set RS_CLIENT_ID and RS_CLIENT_SECRET as environment variables. Contact support@riverscapes.freshdesk.com to obtain machine credentials for your organization.
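Before starting a long unattended job, it can help to fail fast when the credentials are missing. The sketch below only checks that the two variables named above are set. The helper name `missing_credentials` is hypothetical and is not part of pydex.

```python
import os

REQUIRED_VARS = ("RS_CLIENT_ID", "RS_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment; pass a dict to test the check.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment
print(missing_credentials({"RS_CLIENT_ID": "example-id"}))
```

In a real script, calling `missing_credentials()` with no argument and exiting when the list is non-empty avoids discovering the problem halfway through a batch.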

Staging environment

To work against the staging environment instead of production:

with RiverscapesAPI(stage='staging') as api:
    ...

Searching for Projects

from pydex.classes.RiverscapesAPI import RiverscapesAPI
from pydex.classes.riverscapes_helpers import RiverscapesSearchParams

with RiverscapesAPI(stage='production') as api:
    params = RiverscapesSearchParams({
        'projectTypeId': 'vbet',    # project type machine code
        'tags': ['2025CONUS'],      # all tags must match (AND)
        'excludeArchived': True,    # default; set False to include archived projects
    })

    for project, stats, total, progress_bar in api.search(params, progress_bar=True):
        print(f"{project.id} {project.name} HUC: {project.huc}")

api.search() is a generator: results are streamed one at a time. It automatically works around Elasticsearch's 10,000-result pagination limit by splitting requests into date windows.
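The date-window idea can be illustrated independently of pydex. The following is a minimal sketch under stated assumptions, not pydex's actual implementation: it assumes a hypothetical `count_in_window(start, end)` function that reports how many results fall in a half-open date range, and it halves any window that exceeds the limit.

```python
from datetime import date, timedelta

RESULT_LIMIT = 10_000  # the pagination limit mentioned above


def split_windows(start, end, count_in_window, limit=RESULT_LIMIT):
    """Yield half-open (start, end) date windows holding at most `limit` results.

    A one-day window is yielded even when it is over the limit, because it
    cannot be split further at day resolution. Empty windows are skipped.
    """
    n = count_in_window(start, end)
    if n == 0:
        return
    if n <= limit or (end - start) <= timedelta(days=1):
        yield (start, end)
        return
    mid = start + (end - start) // 2  # date arithmetic keeps whole days only
    yield from split_windows(start, mid, count_in_window, limit)
    yield from split_windows(mid, end, count_in_window, limit)


def fake_count(start, end):
    """Stand-in for a real count query: pretend 100 projects exist per day."""
    return 100 * (end - start).days


windows = list(split_windows(date(2025, 1, 1), date(2026, 1, 1), fake_count, limit=1000))
print(f"{len(windows)} windows, first: {windows[0]}")
```

Each yielded window could then be paged through normally, since it is known to stay under the limit.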

Search parameters

params = RiverscapesSearchParams({
    'projectTypeId': 'rs_metric_engine',    # project type ID
    'tags': ['2025CONUS', 'BLM'],           # AND: all tags must be present
    'meta': {                               # AND: all key/value pairs must match
        'Runner': 'Cybercastor',
        'HUC': '1406',                      # prefix match supported
    },
    'createdOn': {
        'from': '2025-01-01',
        'to': '2025-12-31',
    },
    'updatedOn': {'from': '2025-06-01'},
    'excludeArchived': True,
    'bbox': [-120.0, 37.0, -114.0, 42.0],   # [minLng, minLat, maxLng, maxLat]
    'ownedBy': {
        'id': 'a52b8094-7a1d-4171-955c-ad30ae935296',
        'type': 'ORGANIZATION',             # or 'USER'
    },
})
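Coordinate order is easy to get wrong in the `bbox` parameter. The helper below is a hypothetical pre-flight check (not part of pydex) that assumes decimal degrees and the [minLng, minLat, maxLng, maxLat] order shown in the comment above. It also assumes the box does not cross the antimeridian.

```python
def validate_bbox(bbox):
    """Return bbox as a list of four floats, or raise ValueError if it looks wrong."""
    if len(bbox) != 4:
        raise ValueError("bbox needs exactly four numbers")
    min_lng, min_lat, max_lng, max_lat = (float(v) for v in bbox)
    if not (-180.0 <= min_lng <= 180.0 and -180.0 <= max_lng <= 180.0):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90.0 <= min_lat <= 90.0 and -90.0 <= max_lat <= 90.0):
        raise ValueError("latitudes must be within [-90, 90]; are lng/lat swapped?")
    # Assumption: boxes crossing the antimeridian (min_lng > max_lng) are rejected
    if min_lng >= max_lng or min_lat >= max_lat:
        raise ValueError("minimum values must be smaller than maximum values")
    return [min_lng, min_lat, max_lng, max_lat]


bbox = validate_bbox([-120.0, 37.0, -114.0, 42.0])
```

The validated list can then be passed as the `'bbox'` value when building the search parameters.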

Quick count

To count matching projects without downloading anything:

total, stats = api.search_count(params)
print(f"Found {total} projects")

Parallel processing

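To process many projects concurrently, use the async variant, which runs your callback in a thread pool. Threads help most when the callback is I/O-bound (API calls, downloads); in standard CPython builds, pure-Python CPU-bound work is still serialized by the global interpreter lock.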

def process_project(project, stats, total, progress_bar):
    # This runs in a thread; keep it thread-safe
    print(f"Processing {project.id}")

api.process_search_results_async(
    callback=process_project,
    search_params=params,
    max_workers=4,
    progress_bar=True,
)
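Because the callback runs on several threads at once, any shared state it touches needs protection. The sketch below shows one common pattern, a lock around a shared counter. So that it runs without pydex, it drives the callback with the standard library's `ThreadPoolExecutor` and stand-in project objects. The callback signature matches the one shown above, while `fake_projects` and `counts_by_type` are illustrative names.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

counts_by_type = Counter()
counts_lock = threading.Lock()


def process_project(project, stats, total, progress_bar):
    """Tally projects per type; the lock makes the update safe across threads."""
    with counts_lock:
        counts_by_type[project.project_type] += 1


# Stand-in objects carrying only the attributes used here; a real run would
# receive RiverscapesProject instances from the API instead.
fake_projects = [
    SimpleNamespace(id=str(i), project_type="vbet" if i % 2 else "brat")
    for i in range(100)
]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(process_project, p, None, len(fake_projects), None)
        for p in fake_projects
    ]
    for future in futures:
        future.result()  # re-raise any exception from a worker thread

print(dict(counts_by_type))
```

Keeping the locked section as small as possible (only the shared update, not the download or parsing work) lets the worker threads overlap their slow operations.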

Working with Project Objects

Each result yields a RiverscapesProject object:

project.id # Data Exchange GUID
project.name # project name
project.project_type # e.g. 'vbet', 'brat', 'rs_metric_engine'
project.tags # list of tag strings
project.huc # HUC code extracted from metadata (or None)
project.model_version # semver.VersionInfo object (or None)
project.archived # bool
project.visibility # 'PUBLIC', 'PRIVATE', or 'SECRET'
project.project_meta # dict of metadata key→value from project.rs.xml

Accessing metadata values:

# Case-sensitive
watershed = project.project_meta.get("Watershed")

# Case-insensitive (strips spaces/dashes/underscores from keys)
huc = project.project_meta_lwr.get("huc")
version = project.project_meta_lwr.get("modelversion")
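The case-insensitive lookup can be pictured as a normalization applied to every key. This is a sketch of the behaviour described in the comment above (lowercase, then strip spaces, dashes and underscores), not pydex's own code. The name `normalize_key` and the sample metadata are hypothetical.

```python
import re


def normalize_key(key):
    """Lowercase a metadata key and drop spaces, dashes and underscores."""
    return re.sub(r"[ \-_]", "", key.lower())


# Made-up metadata for illustration only
meta = {"Model Version": "3.1.0", "HUC": "17060304", "Hydrology-Source": "example"}
meta_lwr = {normalize_key(k): v for k, v in meta.items()}

print(meta_lwr.get("modelversion"))
```

Under this scheme, keys such as "Model Version", "model_version" and "ModelVersion" all collapse to the same lookup key, which is why the lowercase dictionary tolerates inconsistent naming between project types.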

Comparing model versions:

import semver

if project.model_version and project.model_version >= semver.VersionInfo.parse('4.0.0'):
print(f"New model: {project.id}")

Downloading Files

Download an entire project

api.download_files(
    project_id=project.id,
    download_dir='/local/data/my_project',
    force=False,  # skip files that haven't changed (ETag-based)
)

Download only matching files

Use a list of regex patterns to download only the files you need:

# Download only GeoPackages
api.download_files(
    project_id=project.id,
    download_dir='/local/data',
    re_filter=[r'.*\.gpkg$'],
)

# Download a specific named file
api.download_files(
    project_id=project.id,
    download_dir='/local/data',
    re_filter=[r'.*riverscapes_metrics\.gpkg'],
)

The project.rs.xml manifest is always downloaded regardless of the filter. Files that already exist locally and match the server ETag are skipped automatically.
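Patterns can be tried out locally before starting a large download. The sketch below applies a list of patterns to some made-up relative paths with `re.search` and keeps a path if any pattern matches. How pydex applies the patterns internally (for example `re.match` versus `re.search`, or any-versus-all) is an assumption here, so patterns that begin with `.*`, as in the examples above, are the safest choice.

```python
import re


def preview_filter(paths, patterns):
    """Return the paths matched by at least one of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [path for path in paths if any(c.search(path) for c in compiled)]


# Made-up file paths for illustration only
sample_paths = [
    "project.rs.xml",
    "outputs/vbet.gpkg",
    "outputs/vbet.gpkg-journal",
    "intermediates/hillshade.tif",
]

selected = preview_filter(sample_paths, [r".*\.gpkg$"])
print(selected)
```

Note the trailing `$`: without it, a pattern such as `.*\.gpkg` would also match sidecar names like `vbet.gpkg-journal`.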

Get a file listing without downloading

files = api.get_project_files(project.id)
for f in files:
    print(f['localPath'], f['size'])
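A listing like this can be used to estimate download size before fetching anything. The sketch assumes each entry is a dict with the `localPath` and `size` keys used above, and that `size` is a byte count (an assumption; the unit is not stated here). The sample list stands in for the return value of `api.get_project_files`, and `total_size` is a hypothetical helper.

```python
def total_size(files, suffix=""):
    """Sum the 'size' of entries whose 'localPath' ends with `suffix`."""
    return sum(f["size"] for f in files if f["localPath"].endswith(suffix))


# Made-up listing for illustration only
files = [
    {"localPath": "project.rs.xml", "size": 4_096},
    {"localPath": "outputs/vbet.gpkg", "size": 52_428_800},
    {"localPath": "intermediates/hillshade.tif", "size": 104_857_600},
]

gpkg_mib = total_size(files, ".gpkg") / (1024 * 1024)
print(f"GeoPackages: {gpkg_mib:.1f} MiB of {total_size(files) / (1024 * 1024):.1f} MiB total")
```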

Complete Example — Batch Download

The following script downloads specific files from all VBET projects for a given HUC4:

import os
from pydex.classes.RiverscapesAPI import RiverscapesAPI
from pydex.classes.riverscapes_helpers import RiverscapesSearchParams

DOWNLOAD_DIR = "/data/vbet_projects"
HUC4_PREFIX = "1706"

os.makedirs(DOWNLOAD_DIR, exist_ok=True)

with RiverscapesAPI(stage='production') as api:
    params = RiverscapesSearchParams({
        'projectTypeId': 'vbet',
        'meta': {'HUC': HUC4_PREFIX},
        'excludeArchived': True,
    })

    total, _ = api.search_count(params)
    print(f"Downloading files from {total} VBET projects for HUC4 {HUC4_PREFIX}")

    for project, _, total, progress_bar in api.search(params, progress_bar=True):
        if not project.huc:
            continue

        project_dir = os.path.join(DOWNLOAD_DIR, project.huc)
        os.makedirs(project_dir, exist_ok=True)

        api.download_files(
            project_id=project.id,
            download_dir=project_dir,
            re_filter=[r'.*vbet\.gpkg$'],  # only the main output GeoPackage
        )

print("Done")

Handling Multiple Versions

The Data Exchange may contain multiple versions of the same project type for a given watershed. To always get the most recent version, compare model_version:

import semver

# Group projects by HUC
by_huc = {}
for project, _, _, _ in api.search(params):
    if not project.huc or not project.model_version:
        continue
    huc = project.huc
    if huc not in by_huc:
        by_huc[huc] = project
    elif project.model_version > by_huc[huc].model_version:
        by_huc[huc] = project  # keep only the highest version

# Now by_huc contains one project per HUC (the latest version)
for huc, project in by_huc.items():
    print(f"{huc}: {project.id} v{project.model_version}")

Further Reading