Brain Path Specification

Universal address for multimodal cognitive data

This document specifies the structure, semantics, and usage of the brain path.

What is a brain path?

A brain path is a canonical address for human brain data. The goal is a query schema that enables consistent, dynamic referencing across diverse modalities and datasets, and ultimately supports a large data lakehouse for human brain data that can be queried and accessed in a standard way.

A brain path has two layers:

Raw layer (native URIs): the original data, addressed by its own native URI (https:, s3:, file:). Raw data has no canonical structure, so it keeps the provider layout, in whatever format and organization the provider chose. The raw layer is the source of truth for provenance, and it is what the .raw attribute of API objects points to.
Canonical layer (brain:// scheme): the logical identity of the data, organized by a controlled vocabulary and universal subject identifiers. The canonical layer is independent of where the bytes physically live. It provides a consistent way to reference data across datasets and modalities.

The query planner maps each brain:// path to its raw source(s) and the processing steps needed to derive the requested representation. It does this as an explicit staged pipeline (parse → normalize → resolve → expand → match → plan → bind → slice → return); see Query planning. The canonical layer is the primary interface for users and applications, while the raw layer is the source of truth for data access and provenance.

A canonical address has the form

brain://{catalog?}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

The {catalog} is the namespace that resolves the path, and it always occupies the authority position of the URI. In the bare-local form the authority is empty (brain:///...), which selects the default local catalog. To resolve the same address from a named or remote catalog, pair the scheme with a transport and put the catalog in the authority:

brain+https://{catalog}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
brain+s3://{catalog}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
brain+file://{catalog}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

This pairs a logical scheme (brain) with a concrete transport (e.g. https). The path after the authority is identical in every form, and {subjects} is always its first segment. See Catalog and subject.

Examples:

Raw (native URI): https://openneuro.org/datasets/ds002158/snapshots/1.0.2/files/sub-102/...
Canonical (local): brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz
Canonical (all subjects, local): brain:///*/:fmri/:mni152/:bold/:rest/@*
Canonical (remote catalog): brain+https://omnibrain.org/hcp-100307/:fmri/:mni152/:bold/:rest/@*

Scheme and transport

Form	Resolves against
`brain:///...`	the default local catalog (empty authority)
`brain+https://{catalog}/...`	the named catalog over HTTPS
`brain+s3://{catalog}/...`	the named catalog over S3
`brain+file://{catalog}/...`	the named catalog over a local file store

The catalog is always the authority; the transport only says how to reach it. The same canonical content is reachable over several transports without changing its identity.
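As a minimal sketch of that rule, a resolver could split an address into transport, catalog and path with one regular expression. It assumes the strict reading of the table above (bare brain: takes only an empty authority; a brain+{transport}: form must name a catalog) and a lowercase scheme; split_address, Address and KNOWN_TRANSPORTS are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed transports: the spec names https, s3 and file as examples.
KNOWN_TRANSPORTS = {"https", "s3", "file"}

# brain[+transport]://{catalog}{path}; the scheme is assumed to be lowercase.
_ADDRESS = re.compile(
    r"^brain(?:\+(?P<transport>[a-z0-9]+))?://(?P<catalog>[^/]*)(?P<path>/.*)$"
)


class Address(NamedTuple):
    transport: Optional[str]  # None for the bare local form
    catalog: Optional[str]    # None selects the default local catalog
    path: str                 # identical in every form; starts with /{subjects}


def split_address(uri: str) -> Address:
    """Split a canonical address into transport, catalog and path."""
    m = _ADDRESS.match(uri)
    if m is None:
        raise ValueError(f"not a brain address: {uri!r}")
    transport, catalog, path = m.group("transport", "catalog", "path")
    if transport is None:
        # Bare form: only the empty authority (brain:///...) is accepted here.
        if catalog:
            raise ValueError("bare brain: takes an empty authority; use brain+{transport}://")
        return Address(None, None, path)
    if transport not in KNOWN_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    if not catalog:
        raise ValueError("a transport form must name a catalog")
    return Address(transport, catalog, path)
```

With this split, the brain+https and brain+s3 forms of the same address yield the same path and differ only in transport and catalog, which is what keeps the identity fixed across transports.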

Raw data

Raw bytes keep the provider’s own locator, stored as a native URI:

https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/...
s3://openneuro.org/ds002158/sub-102/...
file:///mnt/xcit-h2/ABIDE2-RawData/sub-102/...

A bare local path (/mnt/...) is normalized to file:///mnt/... so it is a valid URI.
An optional raw+https://... tag self-types a raw locator when it travels outside the catalog. Inside the catalog the namespace is already known, so raw is stored as its plain native URI.
Raw paths intentionally preserve the data provider layout. They do not enforce a vocabulary or specific structure.
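The two normalization rules can be sketched as a single helper. This assumes a raw locator is either an absolute POSIX path or a string that already carries a scheme; normalize_raw is a hypothetical name, and percent-encoding the path with urllib.parse.quote is one reasonable choice rather than a documented requirement.

```python
from urllib.parse import quote, urlsplit


def normalize_raw(locator: str) -> str:
    """Return the plain native URI stored in the catalog for a raw locator."""
    # Drop the optional self-typing tag used outside the catalog (raw+https://...).
    if locator.startswith("raw+"):
        locator = locator[len("raw+"):]
    # A bare absolute local path becomes a file:// URI so it is a valid URI.
    if locator.startswith("/"):
        return "file://" + quote(locator)
    if not urlsplit(locator).scheme:
        raise ValueError(f"raw locator is neither a URI nor an absolute path: {locator!r}")
    # Any other native URI (https:, s3:, file:, ...) is kept as the provider wrote it.
    return locator
```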

Canonical structure

The canonical address is organized by universal subject identifiers:

brain://{catalog?}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

where:

{catalog?}: the resolving namespace, in the authority position; empty (brain:///...) selects the default local catalog. See Catalog and subject.
{subjects}: one or more universal subject identifiers (each including the dataset prefix), separated by commas, e.g., hcp-100307 or hcp-100307,hcp-100408; * selects all subjects. See Subject identifiers.
The segments following {subjects} are controlled vocabulary terms that describe the data. They are separated by / and prefixed with : to mark them as canonical vocabulary terms (unresolved terms use ! instead; see Unresolved terms). Vocabulary terms are case-insensitive and normalized to lowercase (:MNI152 and :mni152 denote the same term). The required segments are:
- :modality: modality term, e.g., :fmri, :eeg
- :space: reference/registration space term, e.g., :mni152, :native
- :dtype: data type, e.g., :bold, :voltage

And the optional segments are:

{:qualifiers}: optional list of qualifiers, e.g., :denoised, :rest, :task
@coords: coordinate selector (use @* to request all), e.g., @xyz=32,45,12;t=0:1200 for a voxel time series or @ch=Cz for an EEG channel
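A minimal parser for the part of the address after the authority follows, assuming the rules above are applied literally: the first segment holds comma-separated subject ids, the next three terms are :modality, :space and :dtype in that order, any further terms are qualifiers, and a trailing @ segment (default @*) is the coordinate selector. BrainPath and parse_path are hypothetical names, and keeping ! terms verbatim instead of lowercasing them is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BrainPath:
    subjects: list[str]
    modality: str
    space: str
    dtype: str
    qualifiers: list[str] = field(default_factory=list)
    coords: str = "@*"


def _term(segment: str) -> str:
    """Normalize one vocabulary segment; unresolved (!) terms are kept verbatim."""
    if segment.startswith(":"):
        return segment.lower()  # :MNI152 and :mni152 denote the same term
    if segment.startswith("!"):
        return segment
    raise ValueError(f"vocabulary segment must start with ':' or '!': {segment!r}")


def parse_path(path: str) -> BrainPath:
    """Parse the path part of a canonical address (everything after the authority)."""
    segments = path.strip("/").split("/")
    subjects = segments[0].split(",")
    rest = segments[1:]
    coords = "@*"  # default when no selector is given
    if rest and rest[-1].startswith("@"):
        coords = rest.pop()
    terms = [_term(s) for s in rest]
    if len(terms) < 3:
        raise ValueError("expected :modality/:space/:dtype after {subjects}")
    modality, space, dtype, *qualifiers = terms
    return BrainPath(subjects, modality, space, dtype, qualifiers, coords)
```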

Catalog and subject

The catalog and the subject answer two different questions:

{catalog} is where to resolve the path: the namespace/host that holds the index and the bytes. It is always the URI authority, empty for the default local catalog or named for a curated/remote collection (brain+https://omnibrain.org/...). A catalog is an orthogonal grouping; it does not identify provenance.
{subjects} is which data: the dataset-prefixed universal id (hcp-100307). The dataset prefix inside the id (hcp) uniquely identifies the source dataset and its provenance, regardless of which catalog serves it.

So the same subject can live in more than one catalog, and a single catalog can serve many datasets; the subject id stays stable across both.
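The separation can be illustrated with two small helpers, sketched under the assumption that a dataset prefix never contains a hyphen, so {dataset_prefix}-{clean_id} splits at the first -. logical_identity strips scheme, transport and catalog, so the same subject served by two catalogs compares equal; source_datasets reads provenance from the subject ids alone. Both names are hypothetical.

```python
import re

# The scheme, optional transport and authority: brain:// or brain+{transport}://{catalog}
_HEAD = re.compile(r"^brain(?:\+[a-z0-9]+)?://[^/]*")


def logical_identity(uri: str) -> str:
    """Drop scheme, transport and catalog; the remaining path identifies the data."""
    m = _HEAD.match(uri)
    if m is None:
        raise ValueError(f"not a brain address: {uri!r}")
    segments = uri[m.end():].split("/")
    # Vocabulary terms are case-insensitive, so compare them in lowercase.
    return "/".join(s.lower() if s.startswith(":") else s for s in segments)


def source_datasets(uri: str) -> set[str]:
    """Dataset prefixes of the subjects in an address, i.e. provenance, not catalog."""
    subjects = logical_identity(uri).split("/")[1]
    # Assumes the prefix itself never contains '-'.
    return {sid.split("-", 1)[0] for sid in subjects.split(",")}
```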

The brain path uses these special elements:

Element	Meaning	Notes
`brain:`	Canonical scheme	Derived/processed data. The address is a valid URI.
`+{transport}`	Transport for a named/remote catalog	`brain+https://`, `brain+s3://`, `brain+file://`. Bare `brain:///` is the default local catalog.
`//{catalog}/`	Catalog (authority)	Namespace/host that resolves the path; empty (`///`) means the default local catalog.
`:`	Controlled vocabulary term	Segment must be valid in the canonical vocabulary (or resolvable to it).
`!`	Unresolved term	Placeholder for unknown or not-yet-mapped terms; queryable and later resolvable during ingestion. See Unresolved terms.
`@`	Coordinate selector	Explicit selector for spatial/time/stream coordinates. See Coordinates.

Path segments

Segment	Required	Type	Description
`{catalog}`	optional	authority	Resolving namespace/host; empty selects the default local catalog
`{subjects}`	✅	string	Canonical subject ID (e.g., `hcp-100307`)
`:modality`	✅	vocab	Modality (e.g., `:fmri`, `:t1w`, `:eeg`)
`:space`	✅	vocab	Space (e.g., `:mni152`, `:native`)
`:dtype`	✅	vocab	Representation (e.g., `:bold`, `:intensity`, `:voltage`)
`{:qualifiers}`	optional	vocab list	Additional canonical qualifiers (task, processing, etc.)
`@coords`	optional	selector	Coordinate/stream selector (defaults to `@*`)
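Read in reverse, the table is a recipe for assembling an address. The sketch below lowercases canonical terms, fills in the @* default, and picks the bare local form when no catalog is given; format_address is a hypothetical name, and defaulting named catalogs to the https transport is an assumption.

```python
from typing import Optional, Sequence


def format_address(
    subjects: Sequence[str],
    modality: str,
    space: str,
    dtype: str,
    qualifiers: Sequence[str] = (),
    coords: str = "@*",
    catalog: Optional[str] = None,
    transport: str = "https",
) -> str:
    """Assemble a canonical address from its segments."""
    terms = [modality, space, dtype, *qualifiers]
    for t in terms:
        if not t.startswith((":", "!")):
            raise ValueError(f"term must start with ':' or '!': {t!r}")
    if not coords.startswith("@"):
        raise ValueError("coordinate selector must start with '@'")
    # Lowercase canonical (:) terms; unresolved (!) terms are left untouched.
    norm = [t.lower() if t.startswith(":") else t for t in terms]
    path = "/".join([",".join(subjects), *norm, coords])
    # No catalog means the bare local form with an empty authority.
    head = "brain://" if catalog is None else f"brain+{transport}://{catalog}"
    return f"{head}/{path}"
```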

Subject identifiers

A deterministic canonical subject id is preferred to enable consistent referencing across datasets:

"{dataset_prefix}-{clean_id}"

{dataset_prefix}: canonical dataset code (e.g., hcp)
{clean_id}: dataset subject identifier normalized into a stable form

Example: hcp-100307
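The spec leaves the exact normalization of {clean_id} open. Purely as an illustration, one deterministic rule is to lowercase the id, drop a BIDS-style sub- label, and keep only ASCII letters and digits; make_subject_id and this rule are assumptions, and dropping characters can make distinct raw ids collide, so a real rule would be chosen per dataset.

```python
import re


def make_subject_id(dataset_prefix: str, raw_id: str) -> str:
    """Build "{dataset_prefix}-{clean_id}" with a hypothetical normalization rule."""
    prefix = dataset_prefix.lower()
    if not re.fullmatch(r"[a-z0-9]+", prefix):
        raise ValueError(f"dataset prefix must be alphanumeric: {dataset_prefix!r}")
    clean = raw_id.strip().lower()
    # Assumed rule: drop a BIDS-style "sub-" label, then keep only [a-z0-9].
    if clean.startswith("sub-"):
        clean = clean[len("sub-"):]
    clean = re.sub(r"[^a-z0-9]", "", clean)
    if not clean:
        raise ValueError(f"subject id normalizes to nothing: {raw_id!r}")
    return f"{prefix}-{clean}"
```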

Qualifiers

Qualifiers are optional additional segments that provide more specific information about the data.

brain:///{subjects}/:modality/:space/:dtype/:qual1/:qual2/.../@coords

Typical qualifier families:

acquisition/condition: :rest, :task, :eyes-open, :eyes-closed
processing: :denoised, :filtered, :source-localized
feature forms: :parcellated, :roi-mean, :embedding

Coordinates

Coordinates represent spatial, temporal, and stream indexing. They are expressed with the @... syntax at the end of the path. The key=value selector style is modeled on the W3C Media Fragments URI convention (which defines the temporal key t); xyz and ch are brain-path keys, and the ; separator keeps a multi-axis selector inside a single path segment. The interpretation of the coordinates depends on the modality, space, and dtype.

The units of @xyz are fixed by :space: a standard space (e.g. :mni152) implies millimetres in that space’s reference frame, while :native implies the native voxel index of the subject’s own grid. t indexes time in samples/volumes (e.g. fMRI volume index, EEG sample).

Form	Meaning
`@*`	entire data
`@xyz=-42,38,12`	spatial point. Units defined by `:space` (`:mni152` ⇒ mm; `:native` ⇒ voxel index).
`@xyz=-42,38,12;t=0:1200`	spatial point + time range. `t` indexes time (fMRI volume index, EEG sample).
`@xyz=-42:40,30:50,10:20`	spatial bounding box (each axis given as `lo:hi`).
`@t=0:1200`	time range only (e.g., a whole-brain time series).
`@ch=Cz`	named stream selector (channel, parcel, or variable). Modality-dependent; maps to a named axis.
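A minimal selector parser, assuming the forms in the table are the only spatial and temporal ones: xyz takes three comma-separated axes (each a value or lo:hi), t takes a sample index or range, and any other key is kept as a name on a modality-specific axis. parse_coords and xyz_units are hypothetical helpers; whether lo:hi bounds are inclusive is not specified, so the parser only records them.

```python
from typing import Callable, Union

Num = Union[int, float]


def _axis(token: str, cast: Callable[[str], Num]):
    """A single value, or a (lo, hi) pair for lo:hi (inclusivity left open)."""
    if ":" in token:
        lo, hi = token.split(":", 1)
        return (cast(lo), cast(hi))
    return cast(token)


def parse_coords(selector: str) -> dict:
    """Parse an @ selector into {key: value}; an empty dict means @* (everything)."""
    if not selector.startswith("@"):
        raise ValueError("selector must start with '@'")
    body = selector[1:]
    if body == "*":
        return {}
    result = {}
    for part in body.split(";"):  # ';' separates axes inside one path segment
        key, sep, value = part.partition("=")
        if not sep or not value:
            raise ValueError(f"expected key=value, got {part!r}")
        if key == "xyz":
            # Parsed as float whether :space implies mm or voxel index.
            axes = [_axis(v, float) for v in value.split(",")]
            if len(axes) != 3:
                raise ValueError("xyz needs exactly three axes")
            result["xyz"] = axes
        elif key == "t":
            result["t"] = _axis(value, int)  # samples / volume indices
        else:
            result[key] = value  # named selector, e.g. ch=Cz
    return result


def xyz_units(space: str) -> str:
    """Units implied by :space, covering only the two cases named above."""
    units = {":mni152": "mm", ":native": "voxel index"}
    try:
        return units[space.lower()]
    except KeyError:
        raise ValueError(f"no units known for {space!r}") from None
```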

Canonical vs Raw

Aspect	Canonical (`brain:`)	Raw (native URI)
Scheme	`brain:` / `brain+{transport}:`	provider-native (`https:`, `s3:`, `file:`)
Structure	fixed schema	dataset-defined
Subject ID	deterministic universal ID	dataset convention
Vocabulary	enforced	none
Coordinates	explicit `@...` selector	none (native implied)
Role	logical identity, location-independent	physical bytes, where the provider put them

Examples

Canonical paths (local catalog):

brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@*
brain:///hcp-100307/:fmri/:mni152/:bold/:rest/:denoised/@xyz=-42,38,12;t=0:1200
brain:///hcp-100307/:t1w/:mni152/:intensity/@*
brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz
brain:///hcp-100307/:multimodal/:mni152/:embedding/:rest/@*
brain:///*/:fmri/:mni152/:bold/:rest/@*

The same content resolved from a remote catalog, over different transports:

brain+https://omnirest.xcit.org/hcp-100307/:fmri/:mni152/:bold/:rest/@*
brain+s3://omni-federation/hcp-100307/:fmri/:mni152/:bold/:rest/@*

The raw sources the hcp-100307 canonical paths resolve to (native URIs from the HCP dataset):

https://db.humanconnectome.org/data/projects/HCP_1200/subjects/100307/...
s3://hcp-openaccess/HCP_1200/100307/...
file:///mnt/xcit-h2/HCP-RawData/100307/...

Query patterns (API examples)

Assume an API with:

dataset.query(pattern: str) -> list[str]
dataset.get(path: str) -> object
object.raw -> str  # native URI of the raw source

All resting fMRI in standard space (local)

dataset.query("brain:///*/:fmri/:mni152/:bold/:rest/@*")

All resting fMRI in a remote catalog

dataset.query("brain+https://omnirest.xcit.org/*/:fmri/:mni152/:bold/:rest/@*")

Specific voxel time series

dataset.get("brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@xyz=-42,38,12;t=0:1200")

Trace back provenance to the raw source

dataset.get("brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@*").raw
# -> "file:///mnt/xcit-h2/HCP-RawData/100307/..." (a native URI)
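To make the assumed API concrete, here is a toy in-memory catalog. It is a sketch, not a reference implementation: Entry and the matching rule (shell-style * applied per path segment, with equal segment counts) are assumptions, get requires the exact stored address rather than slicing by coordinates, and .raw is a plain attribute holding the native URI.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Entry:
    path: str  # canonical brain:// address
    raw: str   # native URI of the raw source (provenance)


class Dataset:
    """Toy in-memory catalog illustrating query/get/.raw."""

    def __init__(self, entries: list[Entry]):
        self._entries = {e.path: e for e in entries}

    @staticmethod
    def _matches(pattern: str, path: str) -> bool:
        # Compare segment by segment so '*' never spans a '/' boundary.
        p_segs, s_segs = pattern.split("/"), path.split("/")
        return len(p_segs) == len(s_segs) and all(
            fnmatchcase(s, p) for p, s in zip(p_segs, s_segs)
        )

    def query(self, pattern: str) -> list[str]:
        return sorted(p for p in self._entries if self._matches(pattern, p))

    def get(self, path: str) -> Entry:
        return self._entries[path]  # exact address only; no coordinate slicing
```

With an entry stored for brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@*, the local all-subjects query above returns that address, and get(...).raw returns the native URI stored with it.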

Query planning

The query planner parses a brain:// path into segments and operations and runs them as an explicit staged pipeline: parse → normalize → resolve → expand → match → plan → bind → slice → return. On a cache hit, the planner skips the upstream steps.
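The staged pipeline and the cache short-cut can be sketched as a small runner. The stage names come from the pipeline described earlier (parse through return); the handler functions, a cache keyed by (address, stage), and the rule of resuming after the furthest cached stage are assumptions about how skipping the upstream steps might work, and run_pipeline is a hypothetical name.

```python
from typing import Any, Callable

STAGES = ["parse", "normalize", "resolve", "expand", "match",
          "plan", "bind", "slice", "return"]


def run_pipeline(
    address: str,
    handlers: dict[str, Callable[[Any], Any]],
    cache: dict[tuple[str, str], Any],
) -> tuple[Any, list[str]]:
    """Run the stages in order, resuming after the furthest stage cached for this address."""
    state: Any = address
    start = 0
    # Look for the furthest stage whose output is already cached.
    for i in reversed(range(len(STAGES))):
        if (address, STAGES[i]) in cache:
            state, start = cache[(address, STAGES[i])], i + 1
            break
    executed = []
    for name in STAGES[start:]:
        state = handlers[name](state)
        cache[(address, name)] = state  # later calls can skip up to here
        executed.append(name)
    return state, executed
```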

Unresolved terms (`!`)

Unknown or not-yet-mapped terms are prefixed with !. The ! sigil replaces the earlier ~ (which collided with the shell/OS home-directory sigil, making paths awkward to type and copy) and ? (reserved as the URI query delimiter).

Query: all paths containing unknown terms

dataset.query("brain:///*/!*")

Query: a specific unknown term across datasets

dataset.query("brain:///*/!weirdmodality")

Query: fully resolved paths only

dataset.query("brain:///*/:*/:*/:*/@*")
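One way to read these three queries, assumed here rather than taken from a reference implementation, is as filters over path segments: an address contains unknown terms when any segment starts with !, and is fully resolved when none does. The helper names are hypothetical.

```python
from typing import Optional


def unresolved_terms(address: str) -> list[str]:
    """Return the path segments marked with '!' (terms not yet mapped to the vocabulary)."""
    return [seg for seg in address.split("/") if seg.startswith("!")]


def with_unresolved(addresses: list[str], term: Optional[str] = None) -> list[str]:
    """Addresses containing any '!' term, or one specific term such as '!weirdmodality'."""
    def keep(a: str) -> bool:
        found = unresolved_terms(a)
        return bool(found) if term is None else term in found
    return [a for a in addresses if keep(a)]


def fully_resolved(addresses: list[str]) -> list[str]:
    """Addresses without any '!' segment."""
    return [a for a in addresses if not unresolved_terms(a)]
```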

Validation

Scheme and transport: a canonical address must parse as a URI with scheme brain or brain+{transport}. Bare brain:///... denotes the default local catalog; brain+{transport}://{catalog}/... denotes a named catalog over that transport.
Raw locators: a raw source must be a valid URI in its native scheme (https:, s3:, file:). A bare path is normalized to file:///....
Canonical vocabulary (:): segments prefixed with : must be members of (or resolvable to) the canonical vocabulary.
Coordinates (@): coordinate selectors must be valid with respect to:

:modality (voxels, surface, streams)
:space (units, bounds)
:dtype (temporal indexing)

Reserved characters: the path must not contain a literal ? or # (reserved as the URI query and fragment delimiters); unresolved terms use !.
Use @* when requesting the entire object/stream.
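A partial validator for these rules, sketched with a hypothetical VOCAB built only from the example terms in this document. It checks the scheme, reserved characters, term prefixes and vocabulary membership, and collects every violation instead of stopping at the first; the coordinate checks against :modality, :space and :dtype are omitted because the spec does not define them in detail.

```python
import re

# Hypothetical vocabulary built only from the example terms in this document.
VOCAB = {
    ":fmri", ":eeg", ":t1w", ":multimodal",
    ":mni152", ":native",
    ":bold", ":voltage", ":intensity", ":embedding",
    ":rest", ":task", ":eyes-open", ":eyes-closed",
    ":denoised", ":filtered", ":source-localized", ":parcellated", ":roi-mean",
}

_HEAD = re.compile(r"^brain(?:\+[a-z0-9]+)?://[^/]*/")


def validate(address: str) -> list[str]:
    """Return every rule violation found (an empty list means the checks passed)."""
    m = _HEAD.match(address)
    if m is None:
        return ["scheme must be brain: or brain+{transport}:"]
    errors = []
    if "?" in address or "#" in address:
        errors.append("literal '?' or '#' is reserved for URI query/fragment")
    segments = address[m.end():].split("/")
    if not segments[0]:
        errors.append("missing {subjects}")
    # A trailing @ segment is the coordinate selector; the rest are terms.
    terms = segments[1:-1] if segments[-1].startswith("@") else segments[1:]
    for seg in terms:
        if seg.startswith(":"):
            if seg.lower() not in VOCAB:
                errors.append(f"term not in vocabulary: {seg}")
        elif not seg.startswith("!"):
            errors.append(f"vocabulary segment must start with ':' or '!': {seg!r}")
    if len(terms) < 3:
        errors.append("expected :modality/:space/:dtype")
    return errors
```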

Summary

brain:///... is the canonical, location-independent address; an empty authority is the default local catalog, and brain+{transport}://{catalog}/... resolves it from a named catalog. The catalog (where) and the subject’s dataset prefix (provenance) are independent.
Raw bytes keep their native URI (https:, s3:, file:); the catalog maps canonical paths to raw sources, and .raw returns the native URI.
: marks canonical terms, ! marks unresolved terms, and @ provides coordinate-aware indexing with Media Fragments-style keys.
The query planner turns a brain:// address into raw sources plus a derivation plan through an explicit nine-stage pipeline; see Query planning.