Brain Path Specification

Universal address for multimodal cognitive data

This document specifies the structure, semantics, and usage of the brain path.

What is a brain path?

A brain path is a canonical address for human brain data (including imaging and cognitive recordings). The goal is an addressing scheme that enables consistent, dynamic referencing across diverse modalities and datasets, and that ultimately supports a large data lakehouse for human brain data that can be queried and accessed in a standard way.

A brain path has two layers:

  • Canonical layer (brain: scheme): the value-add address. A controlled vocabulary, universal subject identifiers, and coordinate selectors, all independent of where the bytes physically live.
  • Raw layer (native URIs): the original provider bytes, addressed by their own locator (https:, s3:, file:). Raw data has no canonical structure, so it keeps the provider layout.

The catalog maps each canonical brain: path to its raw source(s), so provenance (.raw) returns a native URI.

A canonical address has the form

brain:///{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

when resolved against the local catalog (an empty authority, as in file:/// URIs). To resolve the same address from a remote federation node, pair the scheme with a transport and name the node in the authority:

brain+https://{node}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
brain+s3://{node}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

This follows the git+https convention, where a logical scheme is paired with a concrete transport. The path after the authority is identical in every form, and the subject is always its first segment. There is no derived namespace segment: the brain scheme already means canonical.

For example:

  • Canonical (local): brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz
  • Canonical (all subjects, local): brain:///*/:fmri/:MNI152/:bold/:rest/@*
  • Canonical (remote node): brain+https://omnirest.xcit.org/hcp-100307/:fmri/:MNI152/:bold/:rest/@*
  • Raw (native URI): https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/sub-102:...

Scheme and transport

| Form | Resolves against |
|---|---|
| brain:///... | the local catalog (empty authority) |
| brain+https://{node}/... | a remote node over HTTPS |
| brain+s3://{node}/... | a remote node over S3 |
| brain+file://{node}/... | a node-local file store |

The bare brain: scheme is the logical identity; the +transport suffix only says how to reach a resolver. The same canonical content is reachable over several transports without changing its identity.

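Parsing note: RFC 3986 parsers (e.g., Python urllib.parse) extract the {node} host from a brain+https:// authority. The WHATWG URL Standard treats brain+https as a non-special scheme: the authority is still parsed, but the host is kept as an opaque string without domain normalization, and some older browser engines did not parse the authority of non-special schemes at all. Browser-side code should therefore verify parser behavior or use a custom split. The primary consumer is Python, so this is acceptable.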
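
The split described in the parsing note can be sketched with the standard library. This is a minimal illustration, not a reference resolver; the function name split_brain_uri and the returned dictionary keys are assumptions made for this example.

```python
from urllib.parse import urlsplit


def split_brain_uri(uri: str) -> dict:
    """Split a canonical address into transport, node, and path.

    Hypothetical helper: the name and return shape are not part of the spec.
    """
    parts = urlsplit(uri)
    # "brain+https" -> logical scheme "brain", transport "https"
    logical, _, transport = parts.scheme.partition("+")
    if logical != "brain":
        raise ValueError(f"not a brain address: {uri!r}")
    return {
        "transport": transport or None,  # None: bare brain:, the local catalog
        "node": parts.netloc or None,    # None: empty authority (///)
        "path": parts.path,              # identical in every form
    }
```

For both the local and the remote form of the same address, the "path" value is the same string, which matches the rule that the transport does not change the identity of the content.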

Raw data

Raw bytes keep the provider’s own locator, stored as a native URI:

https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/...
s3://openneuro.org/ds002158/sub-102/...
file:///mnt/xcit-h2/ABIDE2-RawData/sub-102/...
  • A bare local path (/mnt/...) is normalized to file:///mnt/... so it is a valid URI.
  • An optional raw+https://... tag self-types a raw locator when it travels outside the catalog. Inside the catalog the namespace is already known, so raw is stored as its plain native URI.
  • Raw paths intentionally preserve the data provider layout. They do not enforce a vocabulary or specific structure.
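
The bare-path normalization above can be sketched as follows. This is an illustrative helper under stated assumptions: the name normalize_raw is hypothetical, only absolute POSIX-style paths are handled, and the optional raw+ tag is not covered.

```python
from urllib.parse import quote, urlsplit


def normalize_raw(locator: str) -> str:
    """Return a raw locator as a native URI (hypothetical helper)."""
    if urlsplit(locator).scheme:
        # Already a URI (https:, s3:, file:, ...): keep the provider layout as-is.
        return locator
    if not locator.startswith("/"):
        raise ValueError(f"expected a URI or an absolute path: {locator!r}")
    # Percent-encode unsafe characters but keep "/" as the path separator.
    return "file://" + quote(locator)
```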

Canonical structure

The canonical address is organized by universal subject identifiers:

brain:///{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

where:

  • {subjects}: universal subject identifier (may include a dataset identifier). See Subject identifiers. One or more subjects may be given, separated by commas (e.g., hcp-100307 or hcp-100307,hcp-100408); * selects all subjects.

  • the segments following {subjects} are controlled vocabulary terms that describe the data. They are separated by / and must be prefixed with : to indicate they are from the canonical vocabulary. The required segments are:

    • :modality: modality term, e.g., :fmri, :eeg
    • :space: reference/registration space term, e.g., :MNI152, :native
    • :dtype: data type, e.g., :bold, :voltage

And the optional segments are:

  • {:qualifiers}: optional list of qualifiers, e.g., :denoised, :rest, :task
  • @coords: coordinate selector (use @* to request all), e.g., @xyz=32,45,12;t=0:1200 for a voxel time series or @ch=Cz for an EEG channel

The brain path uses these special elements:

| Element | Meaning | Notes |
|---|---|---|
| brain: | Canonical scheme | Derived/processed data. The address is a valid URI. |
| +{transport} | Transport for a remote node | brain+https://, brain+s3://, brain+file://. Bare brain:/// is the local catalog. |
| //{node}/ | Authority (node) | Federation node that resolves the path; empty (///) means local. |
| : | Controlled vocabulary term | Segment must be valid in the canonical vocabulary (or resolvable to it). |
| ~ | Unresolved term | Placeholder for unknowns; queryable and later resolvable during ingestion. Replaces ?, which is reserved as the URI query delimiter. |
| @ | Coordinate selector | Explicit selector for spatial/time/stream coordinates. See Coordinates. |

Path segments

| Segment | Required | Type | Description |
|---|---|---|---|
| {subjects} | required | string | Canonical subject ID (e.g., hcp-100307) |
| :modality | required | vocab | Modality (e.g., :fmri, :t1w, :eeg) |
| :space | required | vocab | Space (e.g., :MNI152, :native) |
| :dtype | required | vocab | Representation (e.g., :bold, :intensity, :voltage) |
| {:qualifiers} | optional | vocab list | Additional canonical qualifiers (task, processing, etc.) |
| @coords | optional | selector | Coordinate/stream selector (defaults to @*) |
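
As an illustration of the segment layout, the path portion of an address (everything after the authority) could be split as in the sketch below. The BrainPath class and parse_path function are hypothetical names, and vocabulary membership is not checked here; only the : and ~ sigils are.

```python
from dataclasses import dataclass


@dataclass
class BrainPath:
    subjects: list[str]
    modality: str
    space: str
    dtype: str
    qualifiers: list[str]
    coords: str  # selector body without the leading "@"


def parse_path(path: str) -> BrainPath:
    """Split '/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords'."""
    segments = path.strip("/").split("/")
    coords = "*"  # @coords is optional and defaults to @*
    if segments[-1].startswith("@"):
        coords = segments.pop()[1:]
    if len(segments) < 4:
        raise ValueError("need subjects, modality, space, and dtype")
    subjects, modality, space, dtype, *qualifiers = segments
    for term in (modality, space, dtype, *qualifiers):
        # ":" marks a canonical term, "~" an unresolved one.
        if term[:1] not in (":", "~"):
            raise ValueError(f"term lacks a ':' or '~' prefix: {term!r}")
    return BrainPath(subjects.split(","), modality, space, dtype, qualifiers, coords)
```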

Subject identifiers

A deterministic canonical subject id is preferred to enable consistent referencing across datasets:

"{dataset_prefix}-{clean_id}"
  • {dataset_prefix}: canonical dataset code (e.g., hcp)
  • {clean_id}: dataset subject identifier normalized into a stable form

Example: hcp-100307
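
A deterministic identifier of this form could be built as sketched below. The normalization rule is an assumption made for illustration only: this document does not define how {clean_id} is derived, so the example simply drops a BIDS-style "sub-" prefix and keeps letters and digits. The function name canonical_subject_id is hypothetical.

```python
import re


def canonical_subject_id(dataset_prefix: str, raw_id: str) -> str:
    """Build '{dataset_prefix}-{clean_id}' (hypothetical normalization)."""
    # Assumed rule, not defined by the spec: strip a leading "sub-" and
    # remove everything that is not a letter or digit.
    clean_id = re.sub(r"^sub-", "", raw_id.strip())
    clean_id = re.sub(r"[^A-Za-z0-9]", "", clean_id)
    if not clean_id:
        raise ValueError(f"empty subject id after normalization: {raw_id!r}")
    return f"{dataset_prefix.lower()}-{clean_id}"
```

Whatever rule is chosen, the important property is that the same dataset subject always maps to the same canonical ID, regardless of how the provider spelled it.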

Qualifiers

Qualifiers are optional additional segments that provide more specific information about the data.

brain:///{subjects}/:modality/:space/:dtype/:qual1/:qual2/.../@coords

Typical qualifier families:

  • acquisition/condition: :rest, :task, :eyes-open, :eyes-closed
  • processing: :denoised, :filtered, :source-localized
  • feature forms: :parcellated, :roi-mean, :embedding

Coordinates

Coordinates represent spatial, temporal, and stream indexing. They are expressed using @... syntax at the end of the path. The selector keys (xyz, t, ch) are modeled on the name=value dimensions of the W3C Media Fragments URI specification (which defines t and xywh, among others), and the ; separator keeps a multi-axis selector inside a single path segment. The interpretation of the coordinates depends on the modality, space, and dtype.

| Form | Meaning |
|---|---|
| @* | entire data |
| @xyz=-42,38,12 | spatial point. Interpretation of x,y,z is defined by :space (e.g., :MNI152 implies standard-space mm). |
| @xyz=-42,38,12;t=0:1200 | spatial point + time range. t indexes time (e.g., fMRI volume index, EEG sample). |
| @xyz=-42:40,30:50,10:20 | spatial bounding box (each axis given as lo:hi). |
| @t=0:1200 | time range only (e.g., a whole-brain time series). |
| @ch=Cz | named stream selector (channel, parcel, or variable). Modality-dependent; maps to a named axis. |
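
The selector forms above can be parsed with a few lines of Python. This is a syntactic sketch only: the function names are hypothetical, the output shape (a dict of axis name to a list of scalars or (lo, hi) tuples) is an assumption, and no modality- or space-dependent interpretation is attempted.

```python
def _scalar(text: str):
    """Convert one token to int or float when possible, else keep the name."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def _item(text: str):
    """Parse 'lo:hi' into a (lo, hi) tuple, or return a single scalar/name."""
    if ":" in text:
        lo, hi = text.split(":")
        return (_scalar(lo), _scalar(hi))
    return _scalar(text)


def parse_coords(selector: str) -> dict:
    """Parse '@xyz=-42,38,12;t=0:1200' into {'xyz': [...], 't': [...]}."""
    body = selector[1:] if selector.startswith("@") else selector
    if body == "*":
        return {}  # no constraints: the entire data
    axes = {}
    for part in body.split(";"):  # one part per axis
        key, sep, values = part.partition("=")
        if not (key and sep and values):
            raise ValueError(f"malformed selector part: {part!r}")
        axes[key] = [_item(v) for v in values.split(",")]
    return axes
```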

Canonical vs Raw

| Aspect | Canonical (brain:) | Raw (native URI) |
|---|---|---|
| Scheme | brain: / brain+{transport}: | provider-native (https:, s3:, file:) |
| Structure | fixed schema | dataset-defined |
| Subject ID | deterministic universal ID | dataset convention |
| Vocabulary | enforced | none |
| Coordinates | explicit @... selector | none (native implied) |
| Role | logical identity, location-independent | physical bytes, where the provider put them |

Examples

Canonical paths (local node):

brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@*
brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@xyz=32,45,12;t=0:1200
brain:///hcp-100307/:t1w/:MNI152/:intensity/@*
brain:///hcp-100307/:eeg/:MNI152/:voltage/:rest/:source-localized/@ch=Cz
brain:///hcp-100307/:multimodal/:MNI152/:embedding/:rest/@*
brain:///*/:fmri/:MNI152/:bold/:rest/@*

The same content resolved from a remote federation node, over different transports:

brain+https://omnirest.xcit.org/hcp-100307/:fmri/:MNI152/:bold/:rest/@*
brain+s3://omni-federation/hcp-100307/:fmri/:MNI152/:bold/:rest/@*

The raw sources those canonical paths resolve to (native URIs):

https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/sub-102:...
s3://openneuro.org/ds002158/sub-102/...
file:///mnt/xcit-h2/ABIDE2-RawData/sub-102/...

Query patterns (API examples)

Assume an API with:

  • dataset.query(pattern: str) -> list[path]
  • dataset.get(path: str) -> object
  • object.raw

All resting fMRI in standard space (local)

dataset.query("brain:///*/:fmri/:MNI152/:bold/:rest/@*")

All resting fMRI on a remote node

dataset.query("brain+https://omnirest.xcit.org/*/:fmri/:MNI152/:bold/:rest/@*")

Specific voxel time series

dataset.get("brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@xyz=-42,38,12;t=0:1200")

Trace back provenance to the raw source

dataset.get("brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@*").raw
# -> "file:///mnt/xcit-h2/.../sub-102/..." (a native URI)

Unresolved terms (~)

Unknown or not-yet-mapped terms are prefixed with ~. (The earlier ? sigil was retired because ? is reserved as the URI query delimiter.)

Query: all paths containing unknown terms

dataset.query("brain:///*/~*")

Query: a specific unknown term across datasets

dataset.query("brain:///*/~weirdmodality")

Query: fully resolved paths only

dataset.query("brain:///*/:*/:*/:*/@*")
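
On the client side, the ~ sigil makes unresolved terms easy to detect in any single address. The sketch below is illustrative; the function names are hypothetical, the example address in the comment is made up, and this is not how dataset.query is necessarily implemented.

```python
from urllib.parse import urlsplit


def unresolved_terms(address: str) -> list[str]:
    """Return the path segments that carry the ~ (unresolved) sigil."""
    path = urlsplit(address).path
    return [segment for segment in path.split("/") if segment.startswith("~")]


def is_fully_resolved(address: str) -> bool:
    """True when no segment of the address is marked unresolved."""
    return not unresolved_terms(address)


# Hypothetical address with one not-yet-mapped modality term:
# unresolved_terms("brain:///hcp-100307/~weirdmodality/:native/:voltage/@*")
```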

Validation

  1. Scheme and transport: a canonical address must parse as a URI with scheme brain or brain+{transport}. Bare brain:///... denotes the local catalog; brain+{transport}://{node}/... denotes a remote node over that transport.

  2. Raw locators: a raw source must be a valid URI in its native scheme (https:, s3:, file:). A bare path is normalized to file:///....

  3. Canonical vocabulary (:): segments prefixed with : must be members of (or resolvable to) the canonical vocabulary.

  4. Coordinates (@): coordinate selectors must be valid with respect to:

    • :modality (voxels, surface, streams)
    • :space (units, bounds)
    • :dtype (temporal indexing)

  5. Reserved characters: the path must not contain a literal ? or # (reserved as the URI query and fragment delimiters); unresolved terms use ~.

  6. Use @* when requesting the entire object/stream.
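
Some of these rules can be checked mechanically. The sketch below covers only the scheme, reserved-character, and vocabulary rules; coordinate validation is omitted because it depends on the modality, space, and dtype. The validate function is a hypothetical name, and VOCAB is a small assumed excerpt built from terms used in this document, standing in for the real canonical vocabulary.

```python
from urllib.parse import urlsplit

# Assumed excerpt for illustration; a real catalog would supply the full vocabulary.
VOCAB = {
    ":fmri", ":eeg", ":t1w", ":MNI152", ":native",
    ":bold", ":voltage", ":intensity", ":rest", ":task", ":denoised",
}


def validate(address: str) -> list[str]:
    """Return a list of rule violations (empty when none are found)."""
    errors = []
    # Reserved characters: ? and # are URI delimiters and must not appear.
    if "?" in address or "#" in address:
        errors.append("reserved character ? or # in address")
    parts = urlsplit(address)
    # Scheme and transport: brain or brain+{transport}.
    if parts.scheme.partition("+")[0] != "brain":
        errors.append("scheme must be brain or brain+{transport}")
    # Canonical vocabulary: every ':' segment after the subject must be known.
    segments = parts.path.strip("/").split("/")
    for segment in segments[1:]:
        if segment.startswith(":") and segment not in VOCAB:
            errors.append(f"unknown vocabulary term {segment}")
    return errors
```

Returning a list of messages rather than raising on the first failure lets an ingestion job report every problem with an address in one pass.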

Summary

  • brain:///... is the canonical, location-independent address; an empty authority is the local catalog, and brain+{transport}://{node}/... resolves it from a remote node.
  • Raw bytes keep their native URI (https:, s3:, file:); the catalog maps canonical paths to raw sources, and .raw returns the native URI.
  • : marks canonical terms, ~ marks unresolved terms, and @ provides coordinate-aware indexing with Media Fragments-style keys.