Brain Path Specification
Universal address for multimodal cognitive data
This document specifies the structure, semantics, and usage of the brain path.
What is a brain path?
A brain path is a canonical address to access human brain data (including imaging and cognitive recordings). The goal is to have an addressing schema that enables consistent and dynamic referencing across diverse modalities and datasets, and ultimately support implementing a large data lakehouse for human brain data that can be queried and accessed in a standard way.
A brain path has two layers:
- Canonical layer (`brain:` scheme): the value-add address. A controlled vocabulary, universal subject identifiers, and coordinate selectors, all independent of where the bytes physically live.
- Raw layer (native URIs): the original provider bytes, addressed by their own locator (`https:`, `s3:`, `file:`). Raw data has no canonical structure, so it keeps the provider layout.
The catalog maps each canonical brain: path to its raw source(s), so provenance (.raw) returns a native URI.
A canonical address has the form
brain:///{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
when resolved against the local catalog (empty authority, the file:// pattern). To resolve the same address from a remote federation node, pair the scheme with a transport and name the node in the authority:
brain+https://{node}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
brain+s3://{node}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
This follows the git+https convention, where a logical scheme is paired with a concrete transport. The path after the authority is identical in every form, and the subject is always its first segment. There is no derived namespace segment: the brain scheme already means canonical.
For example:
- Canonical (local): `brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz`
- Canonical (all subjects, local): `brain:///*/:fmri/:MNI152/:bold/:rest/@*`
- Canonical (remote node): `brain+https://omnirest.xcit.org/hcp-100307/:fmri/:MNI152/:bold/:rest/@*`
- Raw (native URI): `https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/sub-102:...`
Scheme and transport
| Form | Resolves against |
|---|---|
| `brain:///...` | the local catalog (empty authority) |
| `brain+https://{node}/...` | a remote node over HTTPS |
| `brain+s3://{node}/...` | a remote node over S3 |
| `brain+file://{node}/...` | a node-local file store |
The bare brain: scheme is the logical identity; the +transport suffix only says how to reach a resolver. The same canonical content is reachable over several transports without changing its identity.
Parsing note: RFC 3986 parsers (e.g. Python `urllib.parse`) extract the `{node}` host from a `brain+https://` authority. Browser URL parsers give special handling only to a fixed list of schemes, and some treat everything after a `+`-scheme as opaque, so browser-side code may need a custom split. The primary consumer is Python, so this is acceptable.
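As a minimal sketch of the split described in the parsing note, the standard-library `urlsplit` can separate the logical scheme, the transport, and the node. The function name `split_brain_uri` and the returned dictionary keys are illustrative assumptions, not part of the specification.

```python
from urllib.parse import urlsplit


def split_brain_uri(uri: str) -> dict:
    """Split a canonical address into transport, node, and path (hypothetical helper)."""
    parts = urlsplit(uri)
    # "brain+https" -> ("brain", "+", "https"); bare "brain" -> ("brain", "", "")
    logical, _, transport = parts.scheme.partition("+")
    if logical != "brain":
        raise ValueError(f"not a canonical brain address: {uri!r}")
    return {
        "transport": transport or None,  # None means the local catalog
        "node": parts.hostname,          # None when the authority is empty
        "path": parts.path,              # identical across all transports
    }


print(split_brain_uri("brain+https://omnirest.xcit.org/hcp-100307/:fmri/:MNI152/:bold/:rest/@*"))
print(split_brain_uri("brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz"))
```

`urlsplit` is used rather than `urlparse` so that a `;` inside the coordinate selector is never considered for path-parameter splitting.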
Raw data
Raw bytes keep the provider’s own locator, stored as a native URI:
https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/...
s3://openneuro.org/ds002158/sub-102/...
file:///mnt/xcit-h2/ABIDE2-RawData/sub-102/...
- A bare local path (`/mnt/...`) is normalized to `file:///mnt/...` so it is a valid URI.
- An optional `raw+https://...` tag self-types a raw locator when it travels outside the catalog. Inside the catalog the namespace is already known, so raw is stored as its plain native URI.
- Raw paths intentionally preserve the data provider layout. They do not enforce a vocabulary or specific structure.
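The normalization rules above could be sketched as follows. The helper name `normalize_raw` and the `NATIVE_SCHEMES` set are assumptions for illustration; the set simply lists the three schemes named in this section, and a real catalog may accept more.

```python
from urllib.parse import quote, urlsplit

# Assumption: only the schemes named in this section are accepted.
NATIVE_SCHEMES = {"https", "s3", "file"}


def normalize_raw(locator: str) -> str:
    """Return a raw locator as a plain native URI (hypothetical helper)."""
    # Drop the optional self-typing tag: "raw+https://..." -> "https://..."
    if locator.startswith("raw+"):
        locator = locator[len("raw+"):]
    # A bare absolute path becomes a file: URI with an empty authority.
    if locator.startswith("/"):
        return "file://" + quote(locator)
    scheme = urlsplit(locator).scheme
    if scheme not in NATIVE_SCHEMES:
        raise ValueError(f"unsupported raw scheme: {scheme!r}")
    return locator


print(normalize_raw("/mnt/xcit-h2/ABIDE2-RawData/sub-102/"))
```

Percent-encoding via `quote` keeps paths with spaces or other unsafe characters valid as URIs while leaving `/` untouched.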
Canonical structure
The canonical address is organized by universal subject identifiers:
brain:///{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
where:
- `{subjects}`: universal subject identifier (may include a dataset identifier). See Subject identifiers. This can be a single subject (e.g., `hcp-100307`) or several subjects separated by commas (e.g., `hcp-100307,hcp-100408`).

The segments following `{subjects}` are controlled vocabulary terms that describe the data. They are separated by `/` and must be prefixed with `:` to indicate they are from the canonical vocabulary. The required segments are:
- `:modality`: modality term, e.g., `:fmri`, `:eeg`
- `:space`: reference/registration space term, e.g., `:MNI152`, `:native`
- `:dtype`: data type, e.g., `:bold`, `:voltage`

And the optional segments are:
- `{:qualifiers}`: optional list of qualifiers, e.g., `:denoised`, `:rest`, `:task`
- `@coords`: coordinate selector (use `@*` to request all), e.g., `@xyz=32,45,12;t=0:1200` for a voxel time series or `@ch=Cz` for an EEG channel
The brain path uses these special elements:
| Element | Meaning | Notes |
|---|---|---|
| `brain:` | Canonical scheme | Derived/processed data. The address is a valid URI. |
| `+{transport}` | Transport for a remote node | `brain+https://`, `brain+s3://`, `brain+file://`. Bare `brain:///` is the local catalog. |
| `//{node}/` | Authority (node) | Federation node that resolves the path; empty (`///`) means local. |
| `:` | Controlled vocabulary term | Segment must be valid in the canonical vocabulary (or resolvable to it). |
| `~` | Unresolved term | Placeholder for unknowns; queryable and later resolvable during ingestion. Replaces `?`, which is reserved as the URI query delimiter. |
| `@` | Coordinate selector | Explicit selector for spatial/time/stream coordinates. See Coordinates. |
Path segments
| Segment | Required | Type | Description |
|---|---|---|---|
| `{subjects}` | ✅ | string | Canonical subject ID (e.g., `hcp-100307`) |
| `:modality` | ✅ | vocab | Modality (e.g., `:fmri`, `:t1w`, `:eeg`) |
| `:space` | ✅ | vocab | Space (e.g., `:MNI152`, `:native`) |
| `:dtype` | ✅ | vocab | Representation (e.g., `:bold`, `:intensity`, `:voltage`) |
| `{:qualifiers}` | optional | vocab list | Additional canonical qualifiers (task, processing, etc.) |
| `@coords` | optional | selector | Coordinate/stream selector (defaults to `@*`) |
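The segment table can be read as a small parsing recipe. The sketch below is one possible reading, not a reference parser: the `BrainPath` dataclass and `parse_brain_path` function are hypothetical names, and vocabulary terms keep their `:` or `~` prefix so resolved and unresolved terms stay distinguishable.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class BrainPath:
    subjects: list[str]
    modality: str
    space: str
    dtype: str
    qualifiers: list[str] = field(default_factory=list)
    coords: str = "*"  # "@*" is the default selector


def parse_brain_path(uri: str) -> BrainPath:
    """Split the path part of a canonical address into its segments."""
    segments = urlsplit(uri).path.strip("/").split("/")
    # The coordinate selector, if present, is always the last segment.
    coords = "*"
    if segments and segments[-1].startswith("@"):
        coords = segments.pop()[1:]
    if len(segments) < 4:
        raise ValueError("expected {subjects}/:modality/:space/:dtype")
    subjects, *terms = segments
    for term in terms:
        # ":" marks a vocabulary term, "~" an unresolved one.
        if term[:1] not in (":", "~"):
            raise ValueError(f"segment {term!r} lacks a ':' or '~' prefix")
    modality, space, dtype, *qualifiers = terms
    return BrainPath(subjects.split(","), modality, space, dtype, qualifiers, coords)


print(parse_brain_path(
    "brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@xyz=32,45,12;t=0:1200"
))
```

This sketch checks structure only; membership in the canonical vocabulary and coordinate validity would need the catalog.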
Subject identifiers
A deterministic canonical subject id is preferred to enable consistent referencing across datasets:
"{dataset_prefix}-{clean_id}"
- `{dataset_prefix}`: canonical dataset code (e.g., `hcp`)
- `{clean_id}`: dataset subject identifier normalized into a stable form
Example: hcp-100307
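A deterministic ID builder might look like the sketch below. The specification does not fix how `{clean_id}` is normalized, so the cleaning rule here (drop a leading `sub-`, keep only letters and digits, lowercase the prefix) is purely an assumption for illustration, as is the function name.

```python
import re


def canonical_subject_id(dataset_prefix: str, raw_id: str) -> str:
    """Build "{dataset_prefix}-{clean_id}" from a dataset-local subject label."""
    # Assumed cleaning rule: drop a leading "sub-", then keep only letters
    # and digits. The specification leaves the exact rule open.
    without_prefix = re.sub(r"^sub-", "", raw_id.strip())
    clean_id = re.sub(r"[^A-Za-z0-9]", "", without_prefix)
    if not clean_id:
        raise ValueError(f"no usable subject id in {raw_id!r}")
    return f"{dataset_prefix.lower()}-{clean_id}"


print(canonical_subject_id("hcp", "100307"))
```

Whatever rule is chosen, the point is that the same input always yields the same ID, so references stay stable across ingestion runs.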
Qualifiers
Qualifiers are optional additional segments that provide more specific information about the data.
brain:///{subjects}/:modality/:space/:dtype/:qual1/:qual2/.../@coords
Typical qualifier families:
- acquisition/condition: `:rest`, `:task`, `:eyes-open`, `:eyes-closed`
- processing: `:denoised`, `:filtered`, `:source-localized`
- feature forms: `:parcellated`, `:roi-mean`, `:embedding`
Coordinates
Coordinates represent spatial, temporal, and stream indexing. They are expressed using `@...` syntax at the end of the path. The selector is written as `key=value` pairs (`xyz`, `t`, `ch`) in the style of the W3C Media Fragments URI convention, and the `;` separator keeps a multi-axis selector inside a single path segment. The interpretation of the coordinates depends on the modality, space, and dtype.
| Form | Meaning |
|---|---|
| `@*` | entire data |
| `@xyz=-42,38,12` | spatial point. Interpretation of x,y,z is defined by `:space` (e.g., `:MNI152` implies standard-space mm). |
| `@xyz=-42,38,12;t=0:1200` | spatial point + time range. `t` indexes time (e.g., fMRI volume index, EEG sample). |
| `@xyz=-42:40,30:50,10:20` | spatial bounding box (each axis given as `lo:hi`). |
| `@t=0:1200` | time range only (e.g., a whole-brain time series). |
| `@ch=Cz` | named stream selector (channel, parcel, or variable). Modality-dependent; maps to a named axis. |
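The selector forms in the table can be parsed with plain string splitting. The following is a sketch under assumptions: `parse_coords` and `_num` are hypothetical names, the output shape (a dict of lists, with tuples for `lo:hi` ranges) is one arbitrary choice, and whether range endpoints are inclusive is left to the resolver.

```python
def parse_coords(selector: str) -> dict:
    """Parse an "@..." selector into {axis: [component, ...]}."""
    body = selector.lstrip("@")
    if body == "*":
        return {}  # an empty dict stands for "entire data"
    axes = {}
    for part in body.split(";"):       # ";" separates axes
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed selector part: {part!r}")
        components = []
        for item in value.split(","):  # "," separates components (x, y, z)
            lo, colon, hi = item.partition(":")
            if colon:                  # "lo:hi" is a range
                components.append((_num(lo), _num(hi)))
            else:
                components.append(_num(item))
        axes[key] = components
    return axes


def _num(text: str):
    """Return an int or float when possible, otherwise the original name."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # e.g. a channel name such as "Cz"


print(parse_coords("@xyz=-42,38,12;t=0:1200"))
print(parse_coords("@ch=Cz"))
```

Interpreting the parsed values (millimetres versus voxel indices, volumes versus samples) still depends on `:space`, `:modality`, and `:dtype`, as noted above.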
Canonical vs Raw
| Aspect | Canonical (`brain:`) | Raw (native URI) |
|---|---|---|
| Scheme | `brain:` / `brain+{transport}:` | provider-native (`https:`, `s3:`, `file:`) |
| Structure | fixed schema | dataset-defined |
| Subject ID | deterministic universal ID | dataset convention |
| Vocabulary | enforced | none |
| Coordinates | explicit `@...` selector | none (native implied) |
| Role | logical identity, location-independent | physical bytes, where the provider put them |
Examples
Canonical paths (local node):
brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@*
brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@xyz=32,45,12;t=0:1200
brain:///hcp-100307/:t1w/:MNI152/:intensity/@*
brain:///hcp-100307/:eeg/:MNI152/:voltage/:rest/:source-localized/@ch=Cz
brain:///hcp-100307/:multimodal/:MNI152/:embedding/:rest/@*
brain:///*/:fmri/:MNI152/:bold/:rest/@*
The same content resolved from a remote federation node, over different transports:
brain+https://omnirest.xcit.org/hcp-100307/:fmri/:MNI152/:bold/:rest/@*
brain+s3://omni-federation/hcp-100307/:fmri/:MNI152/:bold/:rest/@*
The raw sources those canonical paths resolve to (native URIs):
https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/sub-102:...
s3://openneuro.org/ds002158/sub-102/...
file:///mnt/xcit-h2/ABIDE2-RawData/sub-102/...
Query patterns (API examples)
Assume an API with:
- `dataset.query(pattern: str) -> list[path]`
- `dataset.get(path: str) -> object`
- `object.raw`

All resting fMRI in standard space (local)
dataset.query("brain:///*/:fmri/:MNI152/:bold/:rest/@*")
All resting fMRI on a remote node
dataset.query("brain+https://omnirest.xcit.org/*/:fmri/:MNI152/:bold/:rest/@*")
Specific voxel time series
dataset.get("brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@xyz=-42,38,12;t=0:1200")
Trace back provenance to the raw source
dataset.get("brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@*").raw
# -> "file:///mnt/xcit-h2/.../sub-102/..." (a native URI)
Unresolved terms (~)
Unknown or not-yet-mapped terms are prefixed with ~. (The earlier ? sigil was retired because ? is reserved as the URI query delimiter.)
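Detecting unresolved terms on the client side only requires looking for the `~` prefix on path segments. The helper names below are hypothetical, and the example path with `~weirdmodality` is made up for illustration.

```python
from urllib.parse import urlsplit


def unresolved_terms(uri: str) -> list[str]:
    """Return the '~'-prefixed segments of a brain path, without the sigil."""
    segments = urlsplit(uri).path.strip("/").split("/")
    return [seg[1:] for seg in segments if seg.startswith("~")]


def is_fully_resolved(uri: str) -> bool:
    """True when no segment carries the '~' sigil."""
    return not unresolved_terms(uri)


print(unresolved_terms("brain:///hcp-100307/~weirdmodality/:native/:voltage/@*"))
print(is_fully_resolved("brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@*"))
```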
Query: all paths containing unknown terms
dataset.query("brain:///*/~*")
Query: a specific unknown term across datasets
dataset.query("brain:///*/~weirdmodality")
Query: fully resolved paths only
dataset.query("brain:///*/:*/:*/:*/@*")
Validation
- Scheme and transport: a canonical address must parse as a URI with scheme `brain` or `brain+{transport}`. Bare `brain:///...` denotes the local catalog; `brain+{transport}://{node}/...` denotes a remote node over that transport.
- Raw locators: a raw source must be a valid URI in its native scheme (`https:`, `s3:`, `file:`). A bare path is normalized to `file:///...`.
- Canonical vocabulary (`:`): segments prefixed with `:` must be members of (or resolvable to) the canonical vocabulary.
- Coordinates (`@`): coordinate selectors must be valid with respect to:
  - `:modality` (voxels, surface, streams)
  - `:space` (units, bounds)
  - `:dtype` (temporal indexing)
- Reserved characters: the path must not contain a literal `?` or `#` (reserved as the URI query and fragment delimiters); unresolved terms use `~`.
- Use `@*` when requesting the entire object/stream.
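Several of these rules can be checked mechanically. The sketch below covers the scheme, reserved-character, and vocabulary rules for concrete addresses (not query patterns) and skips coordinate validation. The `VOCAB` set is a hypothetical stand-in for the real canonical vocabulary, and the case-insensitive comparison is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the canonical vocabulary.
VOCAB = {"fmri", "eeg", "t1w", "mni152", "native", "bold", "voltage",
         "intensity", "rest", "task", "denoised"}


def validate(uri: str) -> list[str]:
    """Return a list of problems; an empty list means the address passed."""
    problems = []
    # Check reserved characters on the raw string, before URI parsing
    # splits them off as query or fragment.
    if "?" in uri or "#" in uri:
        problems.append("contains reserved '?' or '#'")
    parts = urlsplit(uri)
    logical, _, transport = parts.scheme.partition("+")
    if logical != "brain":
        problems.append(f"scheme {parts.scheme!r} is not brain or brain+transport")
    if transport and not parts.netloc:
        problems.append("a transport requires a node in the authority")
    segments = parts.path.strip("/").split("/")
    if segments and segments[-1].startswith("@"):
        segments.pop()  # coordinates are not checked in this sketch
    terms = segments[1:]  # everything after {subjects}
    if len(terms) < 3:
        problems.append("missing :modality, :space, or :dtype")
    for term in terms:
        if term.startswith("~"):
            continue  # unresolved terms are allowed
        # Assumption: vocabulary lookup ignores case.
        if not term.startswith(":") or term[1:].lower() not in VOCAB:
            problems.append(f"{term!r} is not a canonical vocabulary term")
    return problems


print(validate("brain:///hcp-100307/:fmri/:MNI152/:bold/:rest/@*"))
```

Returning a list of problems rather than raising on the first one lets an ingestion pipeline report every issue with an address in a single pass.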
Summary
- `brain:///...` is the canonical, location-independent address; an empty authority is the local catalog, and `brain+{transport}://{node}/...` resolves it from a remote node.
- Raw bytes keep their native URI (`https:`, `s3:`, `file:`); the catalog maps canonical paths to raw sources, and `.raw` returns the native URI.
- `:` marks canonical terms, `~` marks unresolved terms, and `@` provides coordinate-aware indexing with Media Fragments-style keys.