Python API Guide

The Riverscapes Data Exchange exposes a GraphQL API that can be queried from any language. For Python users, the Riverscapes Consortium provides a reference client library called pydex in the data-exchange-scripts repository. It handles authentication, pagination, and file downloads, so you can focus on your analysis.

:::tip When to use the Python API vs rscli

Use rscli for downloading a handful of projects interactively. Use the Python API when you need to search across hundreds or thousands of projects, download files selectively, or integrate with a larger data pipeline.

:::

Setup

Install pydex

pydex is not published to PyPI. Install it directly from GitHub:

pip install git+https://github.com/Riverscapes/data-exchange-scripts.git

Or add it to a pyproject.toml using uv:

[tool.uv.sources]
pydex = { git = "https://github.com/Riverscapes/data-exchange-scripts.git", branch = "main" }

Then add pydex to the dependencies list in the [project] table and run uv sync.

Authentication

Interactive (browser-based)

The default authentication opens a browser window. This is the simplest approach for running scripts locally:

from pydex.classes.RiverscapesAPI import RiverscapesAPI

with RiverscapesAPI(stage='production') as api:
    # A browser window will open to log in
    # The token is automatically refreshed in the background
    print("Authenticated successfully")

Always create RiverscapesAPI in a with statement. The context manager keeps the token refreshed in the background and shuts down the background threads when the block exits.
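
To see why the with statement matters, here is a generic sketch of a context manager that owns a background refresh thread. TokenSession and refresh_interval are hypothetical names; this is not pydex's implementation, only an illustration of the pattern the paragraph describes: start the thread on entry, and always stop and join it on exit, even if the body raises.

```python
import threading
import time


class TokenSession:
    """Hypothetical stand-in showing the pattern; not pydex's RiverscapesAPI."""

    def __init__(self, refresh_interval: float = 60.0):
        self.refresh_interval = refresh_interval
        self.refresh_count = 0
        self._stop = threading.Event()
        self._thread = None

    def _refresh_loop(self) -> None:
        # Event.wait() returns False on timeout (time to refresh) and True once stop is set
        while not self._stop.wait(self.refresh_interval):
            self.refresh_count += 1  # a real client would fetch a new token here

    def __enter__(self):
        self._thread = threading.Thread(target=self._refresh_loop, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the with-body raises: stop the loop and wait for the thread
        self._stop.set()
        self._thread.join()
        return False  # propagate any exception from the with-body


with TokenSession(refresh_interval=0.01) as session:
    time.sleep(0.05)  # stand-in for real API work

print(session._thread.is_alive())  # False: the thread was joined on exit
```

Without the with statement, nothing guarantees that __exit__ runs, and the refresh thread would keep running until the interpreter exits.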

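Machine credentials (non-interactive)

For unattended runs where no browser is available, such as on a server or in CI, set RS_CLIENT_ID and RS_CLIENT_SECRET as environment variables. Contact support@riverscapes.freshdesk.com to obtain machine credentials for your organization.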
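
In unattended jobs it helps to fail fast with a clear message when a credential is missing. The sketch below uses a hypothetical require_machine_credentials helper that only checks the two environment variables named above; it does not pass them to pydex or describe how pydex reads them.

```python
import os


def require_machine_credentials() -> tuple:
    """Hypothetical helper: return (client_id, client_secret) or raise if either is unset."""
    names = ("RS_CLIENT_ID", "RS_CLIENT_SECRET")
    # Treat empty strings the same as missing variables
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return os.environ["RS_CLIENT_ID"], os.environ["RS_CLIENT_SECRET"]
```

Calling this at the top of a CI script surfaces a misconfigured job before any network request is made.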

Staging environment

To work against the staging environment instead of production:

with RiverscapesAPI(stage='staging') as api:
    ...

Searching for Projects

Basic search

from pydex.classes.RiverscapesAPI import RiverscapesAPI
from pydex.classes.riverscapes_helpers import RiverscapesSearchParams

with RiverscapesAPI(stage='production') as api:
    params = RiverscapesSearchParams({
        'projectTypeId': 'vbet',       # project type machine code
        'tags': ['2025CONUS'],         # all tags must match (AND)
        'excludeArchived': True,       # default; set False to include archived projects
    })

    for project, stats, total, progress_bar in api.search(params, progress_bar=True):
        print(f"{project.id}  {project.name}  HUC: {project.huc}")

api.search() is a generator: results are streamed one at a time as they are fetched. It automatically works around Elasticsearch's 10,000-result pagination limit by splitting the request into date windows.
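
The date-window idea can be sketched as a recursive bisection: count the hits in a window and, if there are more than the limit, split the window in half and recurse. count_in_window is a hypothetical callback standing in for a count query, and pydex's actual splitting strategy may differ; this only illustrates the general technique.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Window = Tuple[datetime, datetime]


def split_windows(
    start: datetime,
    end: datetime,
    count_in_window: Callable[[datetime, datetime], int],
    limit: int = 10_000,
) -> List[Window]:
    """Bisect the half-open range [start, end) until each window holds at most `limit` hits."""
    n = count_in_window(start, end)
    # Stop when the window is small enough, or cannot usefully be split any further
    if n <= limit or end - start <= timedelta(seconds=1):
        return [(start, end)] if n > 0 else []  # drop empty windows
    mid = start + (end - start) / 2
    return (split_windows(start, mid, count_in_window, limit)
            + split_windows(mid, end, count_in_window, limit))


# Illustrative data: one fake "project" created every hour during 2025 (8,760 items)
base = datetime(2025, 1, 1)
created = [base + timedelta(hours=h) for h in range(8760)]


def fake_count(s: datetime, e: datetime) -> int:
    return sum(1 for t in created if s <= t < e)


windows = split_windows(base, datetime(2026, 1, 1), fake_count, limit=1000)
```

Each returned window is small enough to page through normally. A window that can no longer be split (here, one second wide) is returned even if it still exceeds the limit.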

Search parameters

params = RiverscapesSearchParams({
    'projectTypeId': 'rs_metric_engine',    # project type ID
    'tags': ['2025CONUS', 'BLM'],           # AND — all tags must be present
    'meta': {                               # AND — all key/value pairs must match
        'Runner': 'Cybercastor',
        'HUC': '1406',                      # prefix match supported
    },
    'createdOn': {
        'from': '2025-01-01',
        'to':   '2025-12-31',
    },
    'updatedOn': {'from': '2025-06-01'},
    'excludeArchived': True,
    'bbox': [-120.0, 37.0, -114.0, 42.0],  # [minLng, minLat, maxLng, maxLat]
    'ownedBy': {
        'id':   'a52b8094-7a1d-4171-955c-ad30ae935296',
        'type': 'ORGANIZATION',             # or 'USER'
    },
})
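
Because bbox lists longitude first, a box with latitude and longitude swapped is an easy mistake. A small pre-flight check can catch it before the search runs. validate_bbox is a hypothetical helper, not part of pydex, and this guide does not say whether the API itself rejects a malformed box.

```python
from typing import List, Sequence


def validate_bbox(bbox: Sequence[float]) -> List[float]:
    """Check a [minLng, minLat, maxLng, maxLat] box (hypothetical helper, not part of pydex)."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly four numbers")
    min_lng, min_lat, max_lng, max_lat = bbox
    # Longitude comes first; a box crossing the antimeridian is also rejected here
    if not -180 <= min_lng < max_lng <= 180:
        raise ValueError(f"bad longitudes: minLng={min_lng}, maxLng={max_lng}")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError(f"bad latitudes: minLat={min_lat}, maxLat={max_lat}")
    return list(bbox)


validate_bbox([-120.0, 37.0, -114.0, 42.0])      # OK: the box from the example above
# validate_bbox([37.0, -120.0, 42.0, -114.0])    # ValueError: latitude and longitude swapped
```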

Quick count

To count matching projects without downloading anything:

total, stats = api.search_count(params)
print(f"Found {total} projects")

Parallel processing

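To process many projects concurrently, use process_search_results_async, which runs your callback in a thread pool. This suits I/O-bound work such as downloads and API calls; because of Python's global interpreter lock (GIL), threads give little speed-up for CPU-bound pure-Python code.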

def process_project(project, stats, total, progress_bar):
    # This runs in a thread — keep it thread-safe
    print(f"Processing {project.id}")

api.process_search_results_async(
    callback=process_project,
    search_params=params,
    max_workers=4,
    progress_bar=True,
)
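
The internals of process_search_results_async are not shown here. As a generic illustration of the "keep it thread-safe" advice, this sketch uses the standard library's ThreadPoolExecutor and updates all state shared between threads under a lock. The project IDs and callback body are placeholders, and the callback signature is simplified compared with pydex's (project, stats, total, progress_bar).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

results = {}             # shared across worker threads
tally = {"done": 0}
lock = threading.Lock()


def process_project(project_id: str) -> None:
    """Hypothetical callback body; real work (e.g. a download) would go here."""
    summary = f"processed {project_id}"
    # tally["done"] += 1 is a read-modify-write, so guard all shared state with one lock
    with lock:
        results[project_id] = summary
        tally["done"] += 1


project_ids = [f"project-{i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # list() waits for every task and re-raises a worker's exception, if any
    list(pool.map(process_project, project_ids))

print(tally["done"])  # 20
```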

Working with Project Objects

Each search result includes a RiverscapesProject object (the first item of each tuple yielded by api.search()):

project.id              # Data Exchange GUID
project.name            # project name
project.project_type    # e.g. 'vbet', 'brat', 'rs_metric_engine'
project.tags            # list of tag strings
project.huc             # HUC code extracted from metadata (or None)
project.model_version   # semver.VersionInfo object (or None)
project.archived        # bool
project.visibility      # 'PUBLIC', 'PRIVATE', or 'SECRET'
project.project_meta    # dict of metadata key→value from project.rs.xml

Accessing metadata values:

# Case-sensitive
watershed = project.project_meta.get("Watershed")

# Case-insensitive (strips spaces/dashes/underscores from keys)
huc = project.project_meta_lwr.get("huc")
version = project.project_meta_lwr.get("modelversion")
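
A sketch of the key normalization behind project_meta_lwr, assuming the rule is exactly "remove spaces, dashes and underscores, then lowercase" as described above. normalize_key and lowercase_meta are hypothetical helpers, pydex's own implementation may differ, and the metadata values are illustrative.

```python
import re


def normalize_key(key: str) -> str:
    """Assumed rule: drop spaces, dashes and underscores, then lowercase."""
    return re.sub(r"[ _-]", "", key).lower()


def lowercase_meta(meta: dict) -> dict:
    # If two keys normalize to the same string, the later one wins
    return {normalize_key(k): v for k, v in meta.items()}


meta = {"Model Version": "4.1.0", "HUC": "17060101", "Watershed": "Example Creek"}
meta_lwr = lowercase_meta(meta)
print(meta_lwr["modelversion"], meta_lwr["huc"])  # 4.1.0 17060101
```

Under this rule "Model Version", "model-version" and "model_version" all map to modelversion, which is why the lookups above use the collapsed lowercase form.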

Comparing model versions:

import semver

if project.model_version and project.model_version >= semver.VersionInfo.parse('4.0.0'):
    print(f"New model: {project.id}")
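
The reason to compare parsed versions rather than strings: plain string comparison orders versions character by character. This dependency-free sketch shows the pitfall with a hypothetical version_key helper; semver additionally handles pre-release and build tags, which this sketch does not.

```python
def version_key(version: str) -> tuple:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    # No pre-release/build handling: int('0-beta') would raise ValueError
    return tuple(int(part) for part in version.split("."))


print("10.0.0" >= "4.0.0")                            # False: strings compare character by character
print(version_key("10.0.0") >= version_key("4.0.0"))  # True: (10, 0, 0) >= (4, 0, 0)
```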

Downloading Files

Download an entire project

api.download_files(
    project_id=project.id,
    download_dir='/local/data/my_project',
    force=False,   # skip files that haven't changed (ETag-based)
)
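
How pydex performs the ETag check is not documented here. For a single-part upload to Amazon S3, an object's ETag is the hex MD5 digest of its contents, so a comparable check can be sketched as follows, assuming the Data Exchange's file ETags follow that S3 convention. local_md5 and needs_download are hypothetical helpers.

```python
import hashlib
import os


def local_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large GeoPackages don't fill memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def needs_download(path: str, server_etag: str, force: bool = False) -> bool:
    """Mirror of the force=False behaviour: download only if missing or changed."""
    if force or not os.path.exists(path):
        return True
    # ETags are usually quoted, e.g. '"5d41..."'; strip the quotes before comparing
    return local_md5(path) != server_etag.strip('"')
```

Under this assumption, files uploaded in multiple parts (whose S3 ETags end in -N and are not plain MD5 digests) would always look changed.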

Download only matching files

Use a list of regex patterns to download only the files you need:

# Download only GeoPackages
api.download_files(
    project_id=project.id,
    download_dir='/local/data',
    re_filter=[r'.*\.gpkg$'],
)

# Download a specific named file
api.download_files(
    project_id=project.id,
    download_dir='/local/data',
    re_filter=[r'.*riverscapes_metrics\.gpkg$'],
)

The project.rs.xml manifest is always downloaded regardless of the filter. Files that already exist locally and match the server ETag are skipped automatically.
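
To reason about which files a given re_filter selects, here is a sketch of the behaviour described above. It assumes each pattern is applied with re.match to a project-relative path and that the manifest sits at the project root; pydex may use re.search or a different path form, so check your patterns against a real file listing. select_files is hypothetical.

```python
import re
from typing import Iterable, List

MANIFEST = "project.rs.xml"  # assumed to sit at the project root


def select_files(paths: Iterable[str], re_filter: List[str]) -> List[str]:
    """Hypothetical re-creation of the filter: keep the manifest plus any regex match."""
    patterns = [re.compile(p) for p in re_filter]
    # re.match anchors at the start of the path, which is why the patterns begin with .*
    return [p for p in paths if p == MANIFEST or any(rx.match(p) for rx in patterns)]


paths = [
    "project.rs.xml",
    "outputs/vbet.gpkg",
    "outputs/vbet_metrics.csv",
    "inputs/dem.tif",
    "analysis/riverscapes_metrics.gpkg",
]
print(select_files(paths, [r".*\.gpkg$"]))
# ['project.rs.xml', 'outputs/vbet.gpkg', 'analysis/riverscapes_metrics.gpkg']
```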

Get a file listing without downloading

files = api.get_project_files(project.id)
for f in files:
    print(f['localPath'], f['size'])
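
Combining the listing with a regex lets you estimate how much a filtered download will transfer before you start it. This sketch assumes size is reported in bytes and uses made-up entries shaped like the listing above; planned_download_bytes is a hypothetical helper.

```python
import re
from typing import Dict, List


def planned_download_bytes(files: List[Dict], pattern: str) -> int:
    """Sum the sizes of listed files whose localPath matches `pattern`."""
    rx = re.compile(pattern)
    return sum(f["size"] for f in files if rx.match(f["localPath"]))


# Illustrative entries only; a real list would come from api.get_project_files(project.id)
files = [
    {"localPath": "project.rs.xml", "size": 12_000},
    {"localPath": "outputs/vbet.gpkg", "size": 250_000_000},
    {"localPath": "inputs/dem.tif", "size": 900_000_000},
]
print(planned_download_bytes(files, r".*\.gpkg$") / 1e6, "MB")  # 250.0 MB
```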

Complete Example — Batch Download

The following script downloads the main VBET GeoPackage from every VBET project whose HUC starts with a given HUC4 code:

import os
from pydex.classes.RiverscapesAPI import RiverscapesAPI
from pydex.classes.riverscapes_helpers import RiverscapesSearchParams

DOWNLOAD_DIR = "/data/vbet_projects"
HUC4_PREFIX = "1706"

os.makedirs(DOWNLOAD_DIR, exist_ok=True)

with RiverscapesAPI(stage='production') as api:
    params = RiverscapesSearchParams({
        'projectTypeId': 'vbet',
        'meta': {'HUC': HUC4_PREFIX},
        'excludeArchived': True,
    })

    total, _ = api.search_count(params)
    print(f"Downloading files from {total} VBET projects for HUC4 {HUC4_PREFIX}")

    for project, _, total, progress_bar in api.search(params, progress_bar=True):
        if not project.huc:
            continue

        project_dir = os.path.join(DOWNLOAD_DIR, project.huc)
        os.makedirs(project_dir, exist_ok=True)

        api.download_files(
            project_id=project.id,
            download_dir=project_dir,
            re_filter=[r'.*vbet\.gpkg$'],  # only the main output GeoPackage
        )

print("Done")
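
In the script above, each project downloads into a folder named only by its HUC. If the Data Exchange holds more than one VBET project for the same HUC (see Handling Multiple Versions), their identically named files would land in the same folder and overwrite each other. One way to avoid that, sketched here with a hypothetical project_dir_for helper, is to nest by project ID as well; inside the loop you would pass project.huc and project.id.

```python
import os


def project_dir_for(base_dir: str, huc: str, project_id: str) -> str:
    """Hypothetical layout: <base>/<huc>/<project id>, one folder per project."""
    return os.path.join(base_dir, huc, project_id)


# Two projects for the same HUC now land in separate folders
a = project_dir_for("/data/vbet_projects", "17060101", "project-a")
b = project_dir_for("/data/vbet_projects", "17060101", "project-b")
print(a != b)  # True
```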

Handling Multiple Versions

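The Data Exchange may contain several projects of the same type for one watershed, produced by different model versions. To keep only the most recent one per HUC, compare model_version. Note that the snippet below skips projects that lack either a HUC or a model version: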

import semver

# Group projects by HUC
by_huc = {}
for project, _, _, _ in api.search(params):
    if not project.huc or not project.model_version:
        continue
    huc = project.huc
    if huc not in by_huc:
        by_huc[huc] = project
    elif project.model_version > by_huc[huc].model_version:
        by_huc[huc] = project  # keep only the highest version

# Now by_huc contains one project per HUC (the latest version)
for huc, project in by_huc.items():
    print(f"{huc}: {project.id} v{project.model_version}")