flowchart TD
    A["1. Parse<br/>address → AST"] --> B["2. Normalize<br/>AST → canonical AST"]
    B --> C["3. Resolve vocabulary<br/>bind :terms; mark ! and *"]
    C --> D["4. Expand<br/>wildcards/lists → candidate paths"]
    D --> E["5. Match catalog<br/>artifact vs. raw + recipe"]
    E --> F["6. Plan derivation<br/>raw → steps → canonical DAG"]
    F --> G["7. Bind raw sources<br/>attach native URIs (.raw)"]
    G --> H["8. Apply coordinates<br/>push @coords as lazy slice"]
    H --> I["9. Return / materialize<br/>query: handles · get: object"]
Brain Path Specification
Universal address for multimodal cognitive data
This document specifies the structure, semantics, and usage of the brain path.
What is a brain path?
A brain path is a canonical address for accessing human brain data. The goal is a query schema that enables consistent and dynamic referencing across diverse modalities and datasets, and ultimately supports a large data lakehouse for human brain data that can be queried and accessed in a standard way.
A brain path has two layers:
- Raw layer (native URIs): the original data, addressed by its own path (https:, s3:, file:). Raw data has no canonical structure, so it keeps the provider layout and can be in any format and organization. The raw layer is the source of truth for provenance, and it is what .raw returns on API objects.
- Canonical layer (brain:// scheme): the logical identity of the data, organized by a controlled vocabulary and universal subject identifiers. The canonical layer is independent of where the bytes physically live. It provides a consistent way to reference data across datasets and modalities.
The query planner maps each brain:// path to its raw source(s) and the processing steps needed to derive the requested representation. It does this as an explicit staged pipeline (parse → normalize → resolve → expand → match → plan → bind → slice → return); see Query planning. The canonical layer is the primary interface for users and applications, while the raw layer is the source of truth for data access and provenance.
A canonical address has the form
brain://{catalog?}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
The {catalog} is the namespace that resolves the path, and it always occupies the authority position of the URI. In the bare-local form the authority is empty (brain:///...), which selects the default local catalog. To resolve the same address from a named or remote catalog, pair the scheme with a transport and put the catalog in the authority:
brain+https://{catalog}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
brain+s3://{catalog}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
brain+file://{catalog}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
This pairs a logical scheme (brain) with a concrete transport (e.g. https). The path after the authority is identical in every form, and {subjects} is always its first segment. See Catalog and subject.
Examples:
- Raw (native URI): https://openneuro.org/datasets/ds002158/snapshots/1.0.2/files/sub-102/...
- Canonical (local): brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz
- Canonical (all subjects, local): brain:///*/:fmri/:mni152/:bold/:rest/@*
- Canonical (remote catalog): brain+https://omnibrain.org/hcp-100307/:fmri/:mni152/:bold/:rest/@*
Scheme and transport
| Form | Resolves against |
|---|---|
| brain:///... | the default local catalog (empty authority) |
| brain+https://{catalog}/... | the named catalog over HTTPS |
| brain+s3://{catalog}/... | the named catalog over S3 |
| brain+file://{catalog}/... | the named catalog over a local file store |
The catalog is always the authority; the transport only says how to reach it. The same canonical content is reachable over several transports without changing its identity.
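The split between logical scheme, transport, and catalog can be sketched in a few lines of Python. This is a minimal illustration, not the reference parser: the function name split_address and the returned dictionary keys are hypothetical, and rejecting a transport without a catalog is an assumption drawn from the forms in the table above.

```python
def split_address(address: str) -> dict:
    """Split a canonical address into transport, catalog, and path.

    Hypothetical helper: names and return shape are illustrative only.
    """
    head, sep, rest = address.partition("://")
    if not sep:
        raise ValueError("expected '<scheme>://...'")

    # 'brain+https' -> ('brain', 'https'); bare 'brain' -> ('brain', '')
    scheme, _, transport = head.partition("+")
    if scheme != "brain":
        raise ValueError(f"not a canonical scheme: {head!r}")

    # The authority runs up to the first '/'; it is empty in 'brain:///...'.
    catalog, slash, path = rest.partition("/")
    if not slash:
        raise ValueError("missing path after the authority")
    if transport and not catalog:
        # Assumption: a transport is only meaningful with a named catalog.
        raise ValueError("a transport needs a named catalog")

    return {
        "transport": transport or None,
        "catalog": catalog or None,  # None means the default local catalog
        "path": path,
    }
```

The path component is the same string whichever transport is used, which is the property the table relies on: changing how a catalog is reached does not change the identity of the content.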
Raw data
Raw bytes keep the provider’s own locator, stored as a native URI:
https://openneuro.org/crn/datasets/ds002158/snapshots/1.0.2/files/...
s3://openneuro.org/ds002158/sub-102/...
file:///mnt/xcit-h2/ABIDE2-RawData/sub-102/...
- A bare local path (/mnt/...) is normalized to file:///mnt/... so it is a valid URI.
- An optional raw+https://... tag self-types a raw locator when it travels outside the catalog. Inside the catalog the namespace is already known, so raw is stored as its plain native URI.
- Raw paths intentionally preserve the data provider layout. They do not enforce a vocabulary or specific structure.
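The normalization rules above might look like this in Python. It is a sketch under stated assumptions: normalize_raw is a hypothetical name, stripping the raw+ tag on ingestion is one reading of "stored as its plain native URI", and only POSIX-style absolute paths are handled.

```python
import re
from urllib.parse import quote

# RFC 3986 scheme syntax: a letter followed by letters, digits, '+', '.', '-'.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def normalize_raw(locator: str) -> str:
    """Return the plain native URI for a raw locator (hypothetical helper)."""
    # Assumption: the optional 'raw+' travel tag is dropped inside the catalog.
    if locator.startswith("raw+"):
        locator = locator[len("raw+"):]

    # Already a URI in its native scheme (https:, s3:, file:, ...): keep as-is.
    if _SCHEME.match(locator):
        return locator

    # A bare absolute path becomes a file: URI; quote() percent-encodes
    # characters that are not URI-safe and leaves '/' untouched.
    if locator.startswith("/"):
        return "file://" + quote(locator)

    raise ValueError(f"cannot interpret raw locator: {locator!r}")
```

Relative paths are rejected rather than resolved, since resolving them would make the stored locator depend on the working directory at ingestion time.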
Canonical structure
The canonical address is organized by universal subject identifiers:
brain://{catalog?}/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords
where:
- {catalog?}: the resolving namespace, in the authority position; empty (brain:///...) selects the default local catalog. See Catalog and subject.
- {subjects}: universal subject identifier (includes the dataset prefix). See Subject identifiers. This can be a single subject (e.g., hcp-100307) or several subjects separated by commas (e.g., hcp-100307,hcp-100408).
- The segments following {subjects} are controlled vocabulary terms that describe the data. They are separated by / and must be prefixed with : to indicate they are from the canonical vocabulary. Vocabulary terms are case-insensitive and normalized to lowercase (:MNI152 and :mni152 denote the same term).

The required segments are:
- :modality: modality term, e.g., :fmri, :eeg
- :space: reference/registration space term, e.g., :mni152, :native
- :dtype: data type, e.g., :bold, :voltage

And the optional segments are:
- {:qualifiers}: optional list of qualifiers, e.g., :denoised, :rest, :task
- @coords: coordinate selector (use @* to request all), e.g., @xyz=32,45,12;t=0:1200 for a voxel time series or @ch=Cz for an EEG channel
Catalog and subject
The catalog and the subject answer two different questions, and the draft used to blur them:
- {catalog} is where to resolve the path: the namespace/host that holds the index and the bytes. It is always the URI authority: empty for the default local catalog, or named for a curated/remote collection (brain+https://omnibrain.org/...). A catalog is an orthogonal grouping; it does not identify provenance.
- {subjects} is what data: the dataset-prefixed universal id (hcp-100307). The dataset prefix inside the id (hcp) is what uniquely identifies the source dataset and provenance, regardless of which catalog serves it.
So the same subject can live in more than one catalog, and a single catalog can serve many datasets; the subject id stays stable across both.
The brain path uses these special elements:
| Element | Meaning | Notes |
|---|---|---|
| brain: | Canonical scheme | Derived/processed data. The address is a valid URI. |
| +{transport} | Transport for a named/remote catalog | brain+https://, brain+s3://, brain+file://. Bare brain:/// is the default local catalog. |
| //{catalog}/ | Catalog (authority) | Namespace/host that resolves the path; empty (///) means the default local catalog. |
| : | Controlled vocabulary term | Segment must be valid in the canonical vocabulary (or resolvable to it). |
| ! | Unresolved term | Placeholder for unknowns; queryable and later resolvable during ingestion. Replaces ~ (which collided with the shell/OS home-directory sigil) and ? (reserved as the URI query delimiter). |
| @ | Coordinate selector | Explicit selector for spatial/time/stream coordinates. See Coordinates. |
Path segments
| Segment | Required | Type | Description |
|---|---|---|---|
| {catalog} | optional | authority | Resolving namespace/host; empty selects the default local catalog |
| {subjects} | ✅ | string | Canonical subject ID (e.g., hcp-100307) |
| :modality | ✅ | vocab | Modality (e.g., :fmri, :t1w, :eeg) |
| :space | ✅ | vocab | Space (e.g., :mni152, :native) |
| :dtype | ✅ | vocab | Representation (e.g., :bold, :intensity, :voltage) |
| {:qualifiers} | optional | vocab list | Additional canonical qualifiers (task, processing, etc.) |
| @coords | optional | selector | Coordinate/stream selector (defaults to @*) |
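To make the segment table concrete, here is a minimal Python sketch that turns the path after the authority into typed fields. The BrainPath class and parse_segments function are hypothetical names, not part of any published API. The sketch handles only fully resolved (:) terms and does not check terms against a vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class BrainPath:
    """Typed view of the path after the authority (illustrative only)."""
    subjects: list[str]
    modality: str
    space: str
    dtype: str
    qualifiers: list[str] = field(default_factory=list)
    coords: str = "*"  # '@coords' defaults to '@*'


def parse_segments(path: str) -> BrainPath:
    if "?" in path or "#" in path:
        raise ValueError("'?' and '#' are reserved URI delimiters")

    segments = path.strip("/").split("/")

    # The coordinate selector, if present, is always the last segment.
    coords = "*"
    if segments[-1].startswith("@"):
        coords = segments.pop()[1:]

    # {subjects} plus the three required vocabulary segments.
    if len(segments) < 4:
        raise ValueError("need {subjects}/:modality/:space/:dtype")

    subjects, *terms = segments
    for term in terms:
        if not term.startswith(":"):
            raise ValueError(f"expected a ':' vocabulary term, got {term!r}")

    # Vocabulary terms are case-insensitive and normalized to lowercase.
    modality, space, dtype, *qualifiers = (t[1:].lower() for t in terms)
    return BrainPath(subjects.split(","), modality, space, dtype,
                     qualifiers, coords)
```

For example, parse_segments("hcp-100307/:fmri/:mni152/:bold/:rest/@ch=Cz") would yield one subject, the three required terms, the qualifier list ["rest"], and the coordinate string "ch=Cz".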
Subject identifiers
A deterministic canonical subject id is preferred to enable consistent referencing across datasets:
"{dataset_prefix}-{clean_id}"
- {dataset_prefix}: canonical dataset code (e.g., hcp)
- {clean_id}: dataset subject identifier normalized into a stable form
Example: hcp-100307
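The specification does not define how {clean_id} is normalized, so the following Python sketch picks one plausible rule purely for illustration: trim whitespace, lowercase, drop a leading BIDS-style sub- prefix, and keep only letters and digits. The function name canonical_subject_id and every normalization step are assumptions.

```python
import re


def canonical_subject_id(dataset_prefix: str, raw_id: str) -> str:
    """Build '{dataset_prefix}-{clean_id}' (normalization rule is assumed)."""
    clean = raw_id.strip().lower()
    clean = re.sub(r"^sub-", "", clean)       # assumption: drop a 'sub-' prefix
    clean = re.sub(r"[^a-z0-9]+", "", clean)  # assumption: keep alphanumerics only
    if not clean:
        raise ValueError(f"subject id {raw_id!r} is empty after cleaning")
    return f"{dataset_prefix.strip().lower()}-{clean}"
```

Whatever rule is chosen, the important property is determinism: the same dataset subject must always map to the same canonical id, so that paths stay stable across catalogs.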
Qualifiers
Qualifiers are optional additional segments that provide more specific information about the data.
brain:///{subjects}/:modality/:space/:dtype/:qual1/:qual2/.../@coords
Typical qualifier families:
- acquisition/condition: :rest, :task, :eyes-open, :eyes-closed
- processing: :denoised, :filtered, :source-localized
- feature forms: :parcellated, :roi-mean, :embedding
Coordinates
Coordinates represent spatial, temporal, and stream indexing. They are expressed using @... syntax at the end of the path. The selector keys (xyz, t, ch) follow the W3C Media Fragments URI convention, and the ; separator keeps a multi-axis selector inside a single path segment. The interpretation of the coordinates depends on the modality, space, and dtype.
The units of @xyz are fixed by :space: a standard space (e.g. :mni152) implies millimetres in that space’s reference frame, while :native implies the native voxel index of the subject’s own grid. t indexes time in samples/volumes (e.g. fMRI volume index, EEG sample).
| Form | Meaning |
|---|---|
| @* | entire data |
| @xyz=-42,38,12 | spatial point. Units defined by :space (:mni152 ⇒ mm; :native ⇒ voxel index). |
| @xyz=-42,38,12;t=0:1200 | spatial point + time range. t indexes time (fMRI volume index, EEG sample). |
| @xyz=-42:40,30:50,10:20 | spatial bounding box (each axis given as lo:hi). |
| @t=0:1200 | time range only (e.g., a whole-brain time series). |
| @ch=Cz | named stream selector (channel, parcel, or variable). Modality-dependent; maps to a named axis. |
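The selector forms in the table can be parsed with a small Python function. This is a sketch, not the reference grammar: parse_coords is a hypothetical name, and the output shape is an illustrative choice (numeric axes become lists of numbers or (lo, hi) tuples, and ch stays a string). Whether the hi bound of a range is inclusive is not stated in the specification, so the sketch only records the two bounds.

```python
def _number(text: str):
    """Parse an int when possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def _component(text: str):
    """'12' -> 12, '-42:40' -> (-42, 40)."""
    lo, sep, hi = text.partition(":")
    if sep:
        return (_number(lo), _number(hi))
    return _number(text)


def parse_coords(selector: str) -> dict:
    """Parse the text after '@' into {axis: value} (illustrative shape)."""
    if selector in ("", "*"):
        return {}  # '@*' selects the entire data: no constraints

    axes = {}
    for part in selector.split(";"):  # ';' separates axes within one segment
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed selector part: {part!r}")
        if key in axes:
            raise ValueError(f"axis given twice: {key!r}")
        if key == "ch":
            axes[key] = value  # named stream: kept as a label
        else:
            axes[key] = [_component(c) for c in value.split(",")]
    return axes
```

Interpreting the parsed numbers (millimetres versus voxel indices, volumes versus samples) is deliberately left out, because that depends on :space, :modality, and :dtype rather than on the selector syntax.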
Canonical vs Raw
| Aspect | Canonical (brain:) | Raw (native URI) |
|---|---|---|
| Scheme | brain: / brain+{transport}: | provider-native (https:, s3:, file:) |
| Structure | fixed schema | dataset-defined |
| Subject ID | deterministic universal ID | dataset convention |
| Vocabulary | enforced | none |
| Coordinates | explicit @... selector | none (native implied) |
| Role | logical identity, location-independent | physical bytes, where the provider put them |
Examples
Canonical paths (local catalog):
brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@*
brain:///hcp-100307/:fmri/:mni152/:bold/:rest/:denoised/@xyz=-42,38,12;t=0:1200
brain:///hcp-100307/:t1w/:mni152/:intensity/@*
brain:///hcp-100307/:eeg/:native/:voltage/:rest/@ch=Cz
brain:///hcp-100307/:multimodal/:mni152/:embedding/:rest/@*
brain:///*/:fmri/:mni152/:bold/:rest/@*
The same content resolved from a remote catalog, over different transports:
brain+https://omnirest.xcit.org/hcp-100307/:fmri/:mni152/:bold/:rest/@*
brain+s3://omni-federation/hcp-100307/:fmri/:mni152/:bold/:rest/@*
The raw sources the hcp-100307 canonical paths resolve to (native URIs from the HCP dataset):
https://db.humanconnectome.org/data/projects/HCP_1200/subjects/100307/...
s3://hcp-openaccess/HCP_1200/100307/...
file:///mnt/xcit-h2/HCP-RawData/100307/...
Query patterns (API examples)
Assume an API with:
- dataset.query(pattern: str) -> list[path]
- dataset.get(path: str) -> object
- object.raw

All resting fMRI in standard space (local)
dataset.query("brain:///*/:fmri/:mni152/:bold/:rest/@*")

All resting fMRI in a remote catalog
dataset.query("brain+https://omnirest.xcit.org/*/:fmri/:mni152/:bold/:rest/@*")

Specific voxel time series
dataset.get("brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@xyz=-42,38,12;t=0:1200")

Trace back provenance to the raw source
dataset.get("brain:///hcp-100307/:fmri/:mni152/:bold/:rest/@*").raw
# -> "file:///mnt/xcit-h2/HCP-RawData/100307/..." (a native URI)

Query planning
A brain:// address is not resolved in one step. The query planner breaks it into an explicit pipeline of nine stages, each with a typed input and output. dataset.query() runs stages 1–8 and returns lazy path handles; dataset.get() runs the same stages for a single path and then executes the plan to return the object.
1. Parse: lex the address into typed components {scheme, transport, catalog, subjects[], modality, space, dtype, qualifiers[], coords}. Reject a literal ? or #. Output: an AST.
2. Normalize: lowercase vocabulary terms, canonicalize subject ids, normalize bare/file: raw paths, and default a missing @coords to @*. Output: a canonical AST.
3. Resolve vocabulary: bind each :term to the controlled vocabulary; leave !term (unresolved) and * (wildcard) as open holes. Output: AST annotated resolved | unresolved | wildcard per segment.
4. Expand: turn wildcards, multi-subject lists, and catalog scope into a concrete candidate set by matching against the catalog index (the datasets.yml-style inventory). Output: a list of fully ground candidate canonical paths.
5. Match catalog: for each candidate, look up whether a materialized derivative already exists (e.g. an fMRIPrep output), whether a partial derivative exists (preprocessed but not yet denoised) that can seed the plan, or whether only raw bytes plus a derivation recipe are available. Reusing an existing derivative instead of recomputing is a cache hit, the same reuse a SQL optimizer does with a materialized view. Output: (path, derivative | partial | recipe) triples.
6. Plan derivation: for non-materialized paths, assemble the processing DAG (raw -> steps -> canonical) from a registry of typed transforms. Each transform declares its precondition (the representation it consumes), effect (what it produces), cost, and implementation. The planner selects the transforms whose effects reach the requested :space/:dtype/:qualifiers and orders them by their preconditions (e.g. registration to :mni152 precedes denoising, because denoising consumes data already in its analysis space), rather than following a hand-written per-modality script. Step order is therefore derived, not fixed. Output: a per-path plan DAG.
7. Bind raw sources: attach the native URIs each plan reads from; this is exactly what .raw returns. Output: plan + provenance.
8. Apply coordinates: push the @coords selector down as a lazy slice on the resolved artifact (voxel/surface/stream + time), validated against :modality/:space/:dtype. Output: a sliced lazy plan.
9. Return / materialize: query() returns the list of resolved path handles (lazy plans); get() executes one plan and returns the object.
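Stage 6 is the least obvious of the nine, so here is a minimal Python sketch of deriving step order from preconditions and effects. It is an illustration under simplifying assumptions, not the planner itself: representations are modeled as flat sets of property labels, the two registry entries and their names are hypothetical, and the search is a greedy backward chain that takes the cheapest producer without backtracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    name: str
    requires: frozenset  # precondition: properties the input must have
    adds: frozenset      # effect: properties the output gains
    cost: float = 1.0


# Hypothetical registry, deliberately listed out of execution order.
REGISTRY = [
    Transform("denoise", frozenset({"bold", "mni152"}), frozenset({"denoised"})),
    Transform("register_mni152", frozenset({"bold"}), frozenset({"mni152"})),
]


def plan(have: set, want: set, registry=REGISTRY) -> list[str]:
    """Return transform names in an order that satisfies every precondition."""
    state = set(have)
    steps: list[str] = []

    def achieve(prop: str, pending: frozenset) -> None:
        if prop in state:
            return
        if prop in pending:
            raise ValueError(f"cyclic dependency on {prop!r}")
        producers = sorted((t for t in registry if prop in t.adds),
                           key=lambda t: t.cost)
        if not producers:
            raise LookupError(f"no transform produces {prop!r}")
        chosen = producers[0]
        # Satisfy the chosen transform's preconditions first, recursively.
        for req in sorted(chosen.requires):
            achieve(req, pending | {prop})
        state.update(chosen.adds)
        steps.append(chosen.name)

    for prop in sorted(want):
        achieve(prop, frozenset())
    return steps
```

Calling plan({"bold"}, {"mni152", "denoised"}) places registration before denoising even though the registry lists them the other way round, because denoising declares mni152 as a precondition. A property that is already present in the starting state contributes no steps, which is how a stage-5 cache hit would shorten the plan.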
Worked trace
Take brain:///hcp-100307/:fmri/:mni152/:bold/:rest/:denoised/@xyz=-42,38,12;t=0:1200:
1. Parse → {catalog: <local>, subjects: [hcp-100307], modality: fmri, space: mni152, dtype: bold, qualifiers: [rest, denoised], coords: xyz=-42,38,12;t=0:1200}.
2. Normalize → terms already lowercase; coords kept; subject id already canonical.
3. Resolve vocabulary → every :term binds; no holes.
4. Expand → a single concrete subject, so one candidate path.
5. Match catalog → :denoised is not materialized; the catalog returns the raw :bold source plus a denoising recipe.
6. Plan derivation → DAG: raw bold → register to :mni152 → denoise.
7. Bind raw sources → .raw = file:///mnt/xcit-h2/HCP-RawData/100307/....
8. Apply coordinates → slice voxel (-42, 38, 12) mm (mm because :space = :mni152) over volumes 0:1200.
9. Return → get() executes the DAG and returns the sliced time series.
Where queries diverge at stages 3–4:
- A wildcard query (brain:///*/:fmri/:mni152/:bold/:rest/@*) leaves subjects as * at stage 3, so stage 4 expands it into every matching subject in the catalog: many candidate paths instead of one.
- An unresolved query (brain:///*/!weirdmodality) leaves the !weirdmodality segment as an open hole at stage 3; stage 4 matches only catalog entries that carry that not-yet-mapped term, which is how unresolved terms stay queryable until ingestion maps them.
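The wildcard case can be sketched as a filter over a catalog index. The index layout below (tuples of subject, modality, space, dtype, qualifiers) and the function name expand are hypothetical, and treating the requested qualifiers as a subset test against each entry is an assumption rather than something the specification states.

```python
# Hypothetical in-memory catalog index: one tuple per available artifact.
INDEX = [
    ("hcp-100307", "fmri", "mni152", "bold", ("rest",)),
    ("hcp-100408", "fmri", "mni152", "bold", ("rest",)),
    ("hcp-100307", "eeg", "native", "voltage", ("rest",)),
]


def expand(subjects, modality, space, dtype, qualifiers, index=INDEX):
    """Turn a subject list or '*' into fully ground canonical paths."""
    candidates = []
    for subj, mod, sp, dt, quals in index:
        # '*' matches every subject; otherwise the id must match exactly.
        if "*" not in subjects and subj not in subjects:
            continue
        if (mod, sp, dt) != (modality, space, dtype):
            continue
        # Assumption: every requested qualifier must be present on the entry.
        if not set(qualifiers) <= set(quals):
            continue
        qual_part = "".join(f":{q}/" for q in qualifiers)
        candidates.append(f"brain:///{subj}/:{mod}/:{sp}/:{dt}/{qual_part}@*")
    return candidates
```

With this index, expanding subjects ["*"] for resting fMRI BOLD in :mni152 produces two candidate paths, one per HCP subject, while a single named subject produces at most one.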
Interactive plan visualizer
The flowchart above is fixed. The query plan visualizer is live: enter any brain:// address (or pick an example) and watch it parse into segments, then derive the stage-6 graph by searching the transform registry (shown on the page) instead of dispatching a fixed per-modality recipe. Toggling the derivatives cache shows the stage-5 cache hit, where an existing derivative lets the planner skip the upstream steps.
Unresolved terms (!)
Unknown or not-yet-mapped terms are prefixed with !. The ! sigil replaces the earlier ~ (which collided with the shell/OS home-directory sigil, making paths awkward to type and copy) and ? (reserved as the URI query delimiter).
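How a segment ends up annotated as resolved, unresolved, or wildcard (stage 3 of the planner) can be sketched in Python as follows. The vocabulary set is a small hypothetical sample built from terms used in this document, and classify is an illustrative name. Treating !* as unresolved rather than wildcard is a choice made for this sketch.

```python
# Hypothetical sample of the controlled vocabulary (terms used in this document).
VOCABULARY = {
    "fmri", "eeg", "t1w", "mni152", "native",
    "bold", "voltage", "intensity", "rest", "denoised",
}


def classify(segment: str, vocabulary=VOCABULARY) -> str:
    """Annotate one path segment as resolved, unresolved, or wildcard."""
    if segment == "*":
        return "wildcard"
    if segment.startswith("!"):
        # Unknown or not-yet-mapped term: stays an open hole, still queryable.
        return "unresolved"
    if segment.startswith(":"):
        term = segment[1:].lower()  # vocabulary terms are case-insensitive
        if term == "*":
            return "wildcard"
        if term in vocabulary:
            return "resolved"
        raise ValueError(f"{segment!r} is not in the canonical vocabulary")
    raise ValueError(f"{segment!r} must start with ':' or '!', or be '*'")
```

A term prefixed with : that is missing from the vocabulary is an error, whereas the same term prefixed with ! is accepted, which is the distinction that lets ingestion defer mapping without losing the data.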
Query: all paths containing unknown terms
dataset.query("brain:///*/!*")

Query: a specific unknown term across datasets
dataset.query("brain:///*/!weirdmodality")

Query: fully resolved paths only
dataset.query("brain:///*/:*/:*/:*/@*")

Validation
- Scheme and transport: a canonical address must parse as a URI with scheme brain or brain+{transport}. Bare brain:///... denotes the default local catalog; brain+{transport}://{catalog}/... denotes a named catalog over that transport.
- Raw locators: a raw source must be a valid URI in its native scheme (https:, s3:, file:). A bare path is normalized to file:///....
- Canonical vocabulary (:): segments prefixed with : must be members of (or resolvable to) the canonical vocabulary.
- Coordinates (@): coordinate selectors must be valid with respect to:
  - :modality (voxels, surface, streams)
  - :space (units, bounds)
  - :dtype (temporal indexing)
- Reserved characters: the path must not contain a literal ? or # (reserved as the URI query and fragment delimiters); unresolved terms use !.
- Use @* when requesting the entire object/stream.
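A few of these rules are purely syntactic and can be checked before any catalog lookup. The Python sketch below covers only the scheme, reserved-character, and sigil rules. It does not check vocabulary membership or coordinate validity, and restricting transports to https, s3, and file is an assumption based on the forms listed in this document. The function name validate is hypothetical.

```python
import re

# Assumption: only the three transports shown in this document are accepted.
_SCHEME = re.compile(r"^brain(\+(?:https|s3|file))?://")


def validate(address: str) -> list[str]:
    """Return a list of syntactic problems; an empty list means none found."""
    errors = []
    if "?" in address or "#" in address:
        errors.append("'?' and '#' are reserved; unresolved terms use '!'")

    match = _SCHEME.match(address)
    if match is None:
        errors.append("scheme must be brain or brain+{transport}")
        return errors  # nothing else can be checked reliably

    catalog, _, path = address[match.end():].partition("/")
    if match.group(1) and not catalog:
        errors.append("brain+{transport} requires a named catalog")

    segments = [s for s in path.split("/") if s]
    if segments and segments[-1].startswith("@"):
        segments.pop()  # coordinate selector is checked elsewhere
    if not segments:
        errors.append("missing {subjects} segment")
        return errors

    # Every segment after {subjects} must carry a ':' or '!' sigil.
    for segment in segments[1:]:
        if not segment.startswith((":", "!")):
            errors.append(f"segment {segment!r} must start with ':' or '!'")
    return errors
```

Returning a list of messages instead of raising on the first problem lets an ingestion tool report every issue in an address at once.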
Summary
- brain:///... is the canonical, location-independent address; an empty authority is the default local catalog, and brain+{transport}://{catalog}/... resolves it from a named catalog. The catalog (where) and the subject’s dataset prefix (provenance) are independent.
- Raw bytes keep their native URI (https:, s3:, file:); the catalog maps canonical paths to raw sources, and .raw returns the native URI.
- : marks canonical terms, ! marks unresolved terms, and @ provides coordinate-aware indexing with Media Fragments-style keys.
- The query planner turns a brain:// address into raw sources plus a derivation plan through an explicit nine-stage pipeline; see Query planning.