Omni Path Specification

Universal address for multimodal brain data

This document specifies the structure, semantics, and usage of the omni path.

What is an omni path?

An omni path is a canonical address for accessing multimodal brain data. The goal is an addressing schema that enables consistent, dynamic referencing across diverse datasets and modalities and ultimately supports a data lakehouse of brain data.

An omni path starts with a namespace prefix indicating whether the path is raw (native data) or omni (derived data). The raw namespace preserves the original dataset structure, while the omni namespace provides a standardized schema for cross-dataset access and dynamic querying.

The two path templates are:

  • Raw (native): /raw/{dataset}/{**original_path}
  • Omni (derived): /omni/{subject}/:modality/:space/:dtype/{:qualifiers}/@coords

Raw namespace

The raw namespace is organized by dataset and follows the structure:

/raw/{dataset}/{**original_path}

where:

  • {dataset}: dataset identifier (e.g., hcp, abide, openneuro-ds00001)
  • {**original_path}: dataset-specific path defined by the dataset provider (e.g., files and folders as in BIDS)

Raw paths intentionally preserve the data provider's layout. They do not enforce a vocabulary or a specific structure.
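As an illustration of the raw template, the sketch below splits a raw path into its dataset identifier and the provider-defined remainder. The function name `split_raw_path` and the file name in the usage line are hypothetical; this is not part of any official omni path library.

```python
def split_raw_path(path: str) -> tuple[str, str]:
    """Split '/raw/{dataset}/{**original_path}' into (dataset, original_path)."""
    prefix = "/raw/"
    if not path.startswith(prefix):
        raise ValueError(f"not a raw path: {path!r}")
    # Everything after the dataset segment is kept verbatim, because raw
    # paths do not enforce any structure beyond the dataset identifier.
    dataset, _, original_path = path[len(prefix):].partition("/")
    if not dataset:
        raise ValueError(f"missing dataset identifier: {path!r}")
    return dataset, original_path


# Hypothetical file name, used only for illustration.
dataset, original_path = split_raw_path("/raw/hcp/100307/T1w.nii.gz")
```

The remainder is returned as an opaque string on purpose: interpreting it is left to whoever knows the provider's layout.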

Omni namespace

Omni paths are organized by universal subject identifier and follow the structure:

/omni/{subject}/:modality/:space/:dtype/{:qualifiers}/@coords

where:

  • {subject}: universal subject identifier. See Subject identifiers.
  • :modality: modality term
  • :space: reference/registration space term
  • :dtype: data type
  • {:qualifiers}: optional list of qualifiers
  • @coords: coordinate selector (use @* to request all)

| Symbol | Meaning | Notes |
| --- | --- | --- |
| : | Canonical vocabulary term | Segment must be valid in the canonical vocabulary (or resolvable to it). |
| ? | Unresolved term | Placeholder for unknowns; queryable and later resolvable during ingestion. |
| @ | Indexing | Explicit coordinate selector for spatial/time/stream access. See Coordinates. |

Path segments

| Segment | Required | Type | Description |
| --- | --- | --- | --- |
| {subject} | yes | string | Canonical subject ID (e.g., hcp-100307) |
| :modality | yes | vocab | Modality (e.g., :fmri, :t1w, :eeg) |
| :space | yes | vocab | Space (e.g., :MNI152, :native) |
| :dtype | yes | vocab | Representation (e.g., :bold, :intensity, :voltage) |
| {:qualifiers} | optional | vocab list | Additional canonical qualifiers (task, processing, etc.) |
| @coords | yes | selector | Coordinate/stream selector (@* allowed) |
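To make the segment order concrete, here is a minimal parsing sketch. The `OmniPath` dataclass and `parse_omni_path` function are hypothetical names chosen for illustration. The sketch assumes that the coordinate selector is everything from the first `/@` onward (a selector may itself contain `/`), and it accepts the `/derived/` alias listed in the Raw vs Omni comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmniPath:
    subject: str
    modality: str
    space: str
    dtype: str
    qualifiers: tuple[str, ...]
    coords: str


def parse_omni_path(path: str) -> OmniPath:
    # Split off the selector first, since it may contain '/' (e.g. @x,y,z/t).
    head, sep, coords = path.partition("/@")
    if not sep:
        raise ValueError(f"missing @coords selector: {path!r}")
    parts = head.split("/")  # parts[0] is '' because of the leading '/'
    if len(parts) < 6 or parts[0] != "" or parts[1] not in ("omni", "derived"):
        raise ValueError(f"not a well-formed omni path: {path!r}")
    subject, modality, space, dtype, *qualifiers = parts[2:]
    # Vocabulary segments carry ':' (canonical) or '?' (unresolved).
    for term in (modality, space, dtype, *qualifiers):
        if term[:1] not in (":", "?"):
            raise ValueError(f"segment lacks ':' or '?' prefix: {term!r}")
    return OmniPath(subject, modality, space, dtype, tuple(qualifiers), "@" + coords)


parsed = parse_omni_path(
    "/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@32,45,12/0:1200"
)
```

The sketch checks only the shape of the path; whether each term belongs to the canonical vocabulary is a separate concern.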

Subject identifiers

Omni uses a deterministic canonical subject ID to avoid collisions across datasets:

"{dataset_prefix}-{clean_id}"
  • dataset_prefix: canonical dataset code (e.g., hcp)
  • clean_id: dataset subject identifier normalized into a stable form

Example:

hcp-100307
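The following sketch shows one way such an ID could be built. The normalization rule used here (trim, lowercase, drop non-alphanumeric characters) is purely an assumption for illustration; this specification does not define how clean_id is derived, and `canonical_subject_id` is a hypothetical name.

```python
import re


def canonical_subject_id(dataset_prefix: str, raw_id: str) -> str:
    """Build '{dataset_prefix}-{clean_id}' from a dataset-local subject ID."""
    # ASSUMPTION: the real normalization rule is not given in this document.
    clean_id = re.sub(r"[^a-z0-9]", "", raw_id.strip().lower())
    if not clean_id:
        raise ValueError(f"subject ID is empty after normalization: {raw_id!r}")
    return f"{dataset_prefix}-{clean_id}"


subject = canonical_subject_id("hcp", "100307")
```

Whatever rule is chosen, it needs to be a pure function of its inputs so that the same dataset subject always maps to the same canonical ID.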

Qualifiers

Qualifiers are optional additional segments appended after :dtype:

/omni/{subject}/:modality/:space/:dtype/:qual1/:qual2/.../@coords

Typical qualifier families:

  • acquisition/condition: :rest, :task, :eyes-open, :eyes-closed
  • processing: :denoised, :filtered, :source-localized
  • feature forms: :parcellated, :roi-mean, :embedding
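A small sketch of how a path with qualifiers might be assembled. `build_omni_path` is a hypothetical helper, and it simply preserves the order in which qualifiers are passed; whether qualifier order is significant is not stated in this document.

```python
from typing import Sequence


def build_omni_path(
    subject: str,
    modality: str,
    space: str,
    dtype: str,
    qualifiers: Sequence[str] = (),
    coords: str = "@*",
) -> str:
    """Assemble an omni path; terms are passed without the ':' prefix."""
    terms = [modality, space, dtype, *qualifiers]
    segments = ["omni", subject] + [f":{term}" for term in terms] + [coords]
    return "/" + "/".join(segments)


path = build_omni_path(
    "hcp-100307", "fmri", "MNI152", "bold",
    qualifiers=["rest", "denoised"],
    coords="@32,45,12/0:1200",
)
```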

Coordinates

Coordinates represent spatial, temporal, and stream indexing. They are expressed using @...:

| Form | Meaning |
| --- | --- |
| @* | Entire data |
| @x,y,z | Spatial point. Interpretation of x, y, z is defined by :space (e.g., :MNI152 implies standard-space coordinates). |
| @x,y,z/t | Spatial point + timepoint. t indicates temporal indexing (e.g., fMRI volume index, EEG timepoint). |
| @x,y,z0:z1/t0:t1 | Spatial bounding box + time range |
| @Cz | Named stream selector (e.g., a channel or variable). Named selectors are modality-dependent and should map to named axes (channels, variables, parcels). |
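The selector forms above can be told apart with a small parser, sketched here under several assumptions: indices are integers, a range a:b may appear on any axis, and any selector whose first character is a letter is treated as a named stream. Whether range ends are inclusive is not specified, so the sketch only records the two endpoints. `parse_coords` and `parse_index` are hypothetical names.

```python
from typing import Tuple, Union

Index = Union[int, Tuple[int, int]]


def parse_index(token: str) -> Index:
    """Parse 'n' into an int, or 'a:b' into an (a, b) tuple."""
    if ":" in token:
        start, stop = token.split(":")
        return (int(start), int(stop))
    return int(token)


def parse_coords(selector: str) -> dict:
    if not selector.startswith("@"):
        raise ValueError(f"selector must start with '@': {selector!r}")
    body = selector[1:]
    if body == "*":
        return {"kind": "all"}
    if body[:1].isalpha():
        # ASSUMPTION: a leading letter marks a named stream such as @Cz.
        return {"kind": "named", "name": body}
    spatial, has_time, temporal = body.partition("/")
    axes = spatial.split(",")
    if len(axes) != 3:
        raise ValueError(f"expected x,y,z in {selector!r}")
    return {
        "kind": "spatial",
        "xyz": tuple(parse_index(axis) for axis in axes),
        "t": parse_index(temporal) if has_time else None,
    }


voxel_series = parse_coords("@-42,38,12/0:1200")
```

A real implementation would additionally check the parsed values against :modality, :space, and :dtype, which this sketch does not attempt.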

Raw vs Omni

| Aspect | Raw | Omni (derived) |
| --- | --- | --- |
| Prefix | /raw/ | /omni/ (or alias /derived/) |
| Structure | dataset-defined | fixed schema |
| Symbols | none required | :, ?, @ required |
| Subject ID | dataset convention | deterministic universal ID |
| Coordinates | none (native implied) | explicit @... selector |
| Vocabulary | none | enforced vocabulary |

Examples

Canonical derived paths (written with the /derived/ alias of /omni/):

/derived/hcp-100307/:fmri/:MNI152/:bold/:rest/@*
/derived/hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@32,45,12/0:1200
/derived/hcp-100307/:t1w/:MNI152/:intensity/@*
/derived/hcp-100307/:eeg/:MNI152/:voltage/:rest/:source-localized/@*
/derived/hcp-100307/:multimodal/:MNI152/:embedding/:rest/@*

Corresponding raw paths:

/raw/hcp/100307/**

Query patterns (API examples)

Assume an API with:

  • dataset.query(pattern: str) -> list[path]
  • dataset.get(path: str) -> object
  • object.raw (the raw inputs an object was derived from)

Raw: what did the dataset provide for a subject?

dataset.query("/raw/hcp/100307/**")

Omni: all resting fMRI in standard space

dataset.query("/derived/*/:fmri/:MNI152/:bold/:rest/@*")

Omni: specific voxel time series

dataset.get("/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@-42,38,12/0:1200")

Trace back provenance to raw inputs

dataset.get("/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@*").raw
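For experimentation, the query patterns above can be mimicked over an in-memory list of paths. This is only a sketch: the wildcard semantics are not defined in this document, so it assumes that `**` matches any remainder (including `/`), that `*` matches within a single segment, that every other character is literal, and that a pattern must match the whole path. The names `normalize`, `compile_pattern`, and `query` are hypothetical, as is the raw file name in the catalog.

```python
import re


def normalize(path: str) -> str:
    # /derived/ is listed as an alias of /omni/.
    if path.startswith("/derived/"):
        return "/omni/" + path[len("/derived/"):]
    return path


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    pattern = normalize(pattern)
    pieces = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            pieces.append(".*")        # ASSUMPTION: '**' spans segments
            i += 2
        elif pattern[i] == "*":
            pieces.append("[^/]*")     # ASSUMPTION: '*' stays in one segment
            i += 1
        else:
            pieces.append(re.escape(pattern[i]))  # ':', '?', '@' are literal
            i += 1
    return re.compile("".join(pieces))


def query(paths: list[str], pattern: str) -> list[str]:
    regex = compile_pattern(pattern)
    return [p for p in paths if regex.fullmatch(normalize(p))]


catalog = [
    "/raw/hcp/100307/T1w.nii.gz",  # hypothetical file name
    "/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@*",
    "/omni/hcp-100307/:t1w/:MNI152/:intensity/@*",
]
resting_fmri = query(catalog, "/derived/*/:fmri/:MNI152/:bold/:rest/@*")
```

A regular expression is built by hand rather than with `fnmatch`, because `fnmatch` treats `?` as a single-character wildcard, while in omni paths `?` is a literal prefix for unresolved terms.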

Unresolved terms (?)

Unknown or not-yet-mapped terms are prefixed with ?.

Query: all paths containing unknown terms

dataset.query("/*/*/?*")

Query: a specific unknown term across datasets

dataset.query("/*/*/?weirdmodality")

Query: fully resolved paths only

dataset.query("/omni/*/:*/:*/:*/@*")
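Independently of the query API, unresolved terms can be detected directly from a path string. The helpers below are a hypothetical sketch: they ignore the coordinate selector and report every segment that starts with `?`.

```python
def unresolved_terms(path: str) -> list[str]:
    """Return the '?'-prefixed segments of a path, in order of appearance."""
    # Drop the coordinate selector so its contents are never inspected.
    head = path.split("/@", 1)[0]
    return [segment for segment in head.split("/") if segment.startswith("?")]


def is_fully_resolved(path: str) -> bool:
    return not unresolved_terms(path)


pending = unresolved_terms("/omni/hcp-100307/?weirdmodality/:MNI152/:bold/@*")
```

An ingestion job could use such a check to list the terms that still need mapping to the canonical vocabulary.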

Validation

  1. Canonical vocabulary (:): segments prefixed with : must be members of (or resolvable to) the canonical vocabulary.

  2. Coordinates (@): coordinate selectors must be valid with respect to:

     • :modality (voxels, surface, streams)
     • :space (units, bounds)
     • :dtype (temporal indexing)

  3. Use @* when requesting the entire object/stream.
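The vocabulary rule can be sketched as a membership check. The `VOCAB` table below contains only the example terms mentioned in this document and is not the real canonical vocabulary. Treating terms case-insensitively is one possible reading of "resolvable to" and is an assumption, as is letting `?`-prefixed terms pass without error. `validate_terms` is a hypothetical name.

```python
# ASSUMPTION: toy vocabulary built from the examples in this document.
VOCAB = {
    "modality": {"fmri", "t1w", "eeg"},
    "space": {"mni152", "native"},
    "dtype": {"bold", "intensity", "voltage"},
}


def validate_terms(modality: str, space: str, dtype: str) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    slots = (("modality", modality), ("space", space), ("dtype", dtype))
    for slot, term in slots:
        if term.startswith("?"):
            continue  # unresolved terms are allowed and resolved later
        if not term.startswith(":"):
            errors.append(f"{slot}: {term!r} lacks a ':' or '?' prefix")
        elif term[1:].lower() not in VOCAB[slot]:
            errors.append(f"{slot}: {term!r} is not in the vocabulary")
    return errors


problems = validate_terms(":fmri", ":MNI152", ":bold")
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid segment of a path at once.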

Summary

  • /raw/... preserves dataset-native layout and provenance.
  • /omni/... (or /derived/...) provides a fixed, enforced schema for cross-dataset querying.
  • : marks canonical terms, ? marks unresolved terms, and @ provides coordinate-aware indexing.