Omni Path Specification

Universal address for multimodal brain data

This document specifies the structure, semantics, and usage of the omni path.

What is an omni path?

An omni path is a canonical address for accessing brain data. The goal is an addressing schema that enables consistent, dynamic referencing across diverse modalities and datasets and, ultimately, supports implementing a data lakehouse for human brain data.

An omni path is a string that starts with a namespace prefix indicating whether the path is raw (original data) or omni (derived, processed data). The raw namespace preserves the original dataset structure, while the omni namespace provides a standardized schema for cross-dataset access and dynamic querying.

For example:

  • Raw (native): /raw/ds12/sub-102/{**original_path}
  • Omni (derived): /omni/ds12-102/:eeg/:native/:voltage/:rest/@Cz

Raw namespace

The raw namespace is organized by dataset and follows the structure:

/raw/{dataset}/{**original_path}

where:

  • {dataset}: dataset identifier (e.g., hcp, abide, openneuro-ds00001)
  • {**original_path}: dataset-specific path defined by the dataset provider (e.g., BIDS structure, HCP structure, etc.)

Raw paths intentionally preserve the data provider's layout; they do not enforce a vocabulary or a specific structure.
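Because the raw namespace fixes only its first two segments, a parser for it can stop after the dataset identifier. The following Python sketch splits a raw path and keeps the provider-defined remainder verbatim. RawPath and parse_raw_path are hypothetical names, not an existing library, and the example file name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawPath:
    dataset: str        # dataset identifier, e.g. "hcp" or "openneuro-ds00001"
    original_path: str  # provider-defined remainder, kept verbatim


def parse_raw_path(path: str) -> RawPath:
    """Split /raw/{dataset}/{**original_path} without interpreting the remainder."""
    prefix = "/raw/"
    if not path.startswith(prefix):
        raise ValueError(f"not a raw path: {path!r}")
    dataset, _, original_path = path[len(prefix):].partition("/")
    if not dataset or not original_path:
        raise ValueError(f"raw path needs a dataset and an original path: {path!r}")
    return RawPath(dataset, original_path)


# Hypothetical BIDS-style file under the ds12 dataset.
p = parse_raw_path("/raw/ds12/sub-102/eeg/sub-102_task-rest_eeg.edf")
print(p.dataset)        # ds12
print(p.original_path)  # sub-102/eeg/sub-102_task-rest_eeg.edf
```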

Omni namespace

The omni namespace is organized by universal subject identifiers and follows the structure:

/omni/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords

where:

  • {subjects}: one or more universal subject identifiers separated by commas (e.g., hcp-100307 or hcp-100307,hcp-100408). An identifier may include a dataset prefix; see Subject identifiers.

  • The segments following {subjects} are controlled vocabulary terms that describe the data. They are separated by / and must be prefixed with : to indicate that they come from the canonical vocabulary (or with ? for an unresolved term). The required segments are:

    • :modality: modality term, e.g., :fmri, :eeg
    • :space: reference/registration space term, e.g., :mni152, :native
    • :dtype: data type, e.g., :bold, :voltage

The optional segments are:

  • {:qualifiers}: optional list of qualifiers, e.g., :denoised, :rest, :task
  • @coords: coordinate selector (use @* to request all), e.g., @32,45,12/0:1200 for a voxel time series or @Cz for an EEG channel
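A parser for this structure has to handle one subtlety: the coordinate selector can itself contain / (as in @32,45,12/0:1200), so it is cut off at the first /@ before the remaining segments are split. The Python sketch below assumes that rule; OmniPath and parse_omni_path are hypothetical names. It accepts ? as well as : on vocabulary segments so that paths with unresolved terms still parse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmniPath:
    subjects: tuple[str, ...]
    modality: str
    space: str
    dtype: str
    qualifiers: tuple[str, ...] = ()
    coords: str = "@*"  # defaults to the whole object


def parse_omni_path(path: str) -> OmniPath:
    """Parse /omni/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords."""
    prefix = "/omni/"
    if not path.startswith(prefix):
        raise ValueError(f"not an omni path: {path!r}")
    # The selector may contain "/" (e.g. @x,y,z/t0:t1), so split it off first.
    head, sep, selector = path[len(prefix):].partition("/@")
    coords = "@" + selector if sep else "@*"
    segments = head.split("/")
    if len(segments) < 4:
        raise ValueError("expected {subjects}/:modality/:space/:dtype")
    subjects, modality, space, dtype, *qualifiers = segments
    for term in (modality, space, dtype, *qualifiers):
        # ':' marks a canonical term, '?' an unresolved one.
        if not term.startswith((":", "?")):
            raise ValueError(f"vocabulary segment needs ':' or '?': {term!r}")
    return OmniPath(tuple(subjects.split(",")), modality, space, dtype,
                    tuple(qualifiers), coords)


p = parse_omni_path(
    "/omni/hcp-100307,hcp-100408/:fmri/:mni152/:bold/:rest/:denoised/@32,45,12/0:1200"
)
print(p.subjects)    # ('hcp-100307', 'hcp-100408')
print(p.qualifiers)  # (':rest', ':denoised')
print(p.coords)      # @32,45,12/0:1200
```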

The omni namespace uses three special symbols, each with specific semantics:

Symbol | Meaning                    | Notes
------ | -------------------------- | -----
:      | Controlled vocabulary term | Segment must be valid in the canonical vocabulary (or resolvable to it).
?      | Unresolved term            | Placeholder for unknowns; queryable and later resolvable during ingestion.
@      | Indexing                   | Explicit selector for spatial/time/stream coordinates. See Coordinates.
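Because each symbol is a one-character prefix, a segment's role can be read off its first character. A minimal Python sketch; SegmentKind and segment_kind are hypothetical helpers. It should be applied to segments before the coordinate selector is split on /.

```python
from enum import Enum
from typing import Optional


class SegmentKind(Enum):
    VOCAB = ":"       # controlled vocabulary term
    UNRESOLVED = "?"  # placeholder, resolvable later during ingestion
    INDEX = "@"       # coordinate/stream selector


def segment_kind(segment: str) -> Optional[SegmentKind]:
    """Classify a path segment by its leading symbol; None means no symbol."""
    for kind in SegmentKind:
        if segment.startswith(kind.value):
            return kind
    return None


kinds = [segment_kind(s) for s in "hcp-100307/:fmri/?weirdspace/:bold/@*".split("/")]
print([k.name if k is not None else None for k in kinds])
# [None, 'VOCAB', 'UNRESOLVED', 'VOCAB', 'INDEX']
```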

Path segments

Segment       | Required | Type       | Description
------------- | -------- | ---------- | -----------
{subjects}    | yes      | string     | Canonical subject ID(s), comma-separated (e.g., hcp-100307)
:modality     | yes      | vocab      | Modality (e.g., :fmri, :t1w, :eeg)
:space        | yes      | vocab      | Space (e.g., :mni152, :native)
:dtype        | yes      | vocab      | Representation (e.g., :bold, :intensity, :voltage)
{:qualifiers} | no       | vocab list | Additional canonical qualifiers (task, processing, etc.)
@coords       | no       | selector   | Coordinate/stream selector (defaults to @*)
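Going the other way, a path can be assembled from the segments in the table, with @* filled in when no selector is given. In this Python sketch, build_omni_path is a hypothetical helper; it rejects vocabulary terms that lack a : or ? prefix rather than adding one silently, so a typo cannot turn into a canonical-looking term.

```python
from collections.abc import Sequence


def build_omni_path(
    subjects: Sequence[str],
    modality: str,
    space: str,
    dtype: str,
    qualifiers: Sequence[str] = (),
    coords: str = "@*",
) -> str:
    """Assemble /omni/{subjects}/:modality/:space/:dtype/{:qualifiers}/@coords."""
    if not subjects:
        raise ValueError("at least one subject identifier is required")
    terms = [modality, space, dtype, *qualifiers]
    for term in terms:
        # Require an explicit ':' (canonical) or '?' (unresolved) prefix.
        if not term.startswith((":", "?")):
            raise ValueError(f"term needs a ':' or '?' prefix: {term!r}")
    if not coords.startswith("@"):
        raise ValueError(f"selector needs an '@' prefix: {coords!r}")
    return "/".join(["/omni", ",".join(subjects), *terms, coords])


print(build_omni_path(["hcp-100307", "hcp-100408"], ":fmri", ":mni152", ":bold", [":rest"]))
# /omni/hcp-100307,hcp-100408/:fmri/:mni152/:bold/:rest/@*
```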

Subject identifiers

A deterministic canonical subject ID is preferred, because it enables consistent referencing across datasets. It has the form:

{dataset_prefix}-{clean_id}

where:

  • {dataset_prefix}: canonical dataset code (e.g., hcp)
  • {clean_id}: the dataset's subject identifier, normalized into a stable form

Example: hcp-100307
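The normalization behind {clean_id} is left open here. The Python sketch below assumes one possible rule set (trim whitespace, lowercase, drop a BIDS-style sub- prefix, keep only [a-z0-9]); canonical_subject_id is a hypothetical name. Under those assumptions it reproduces hcp-100307 and also maps the raw subject sub-102 of dataset ds12 to ds12-102.

```python
import re


def canonical_subject_id(dataset_prefix: str, subject_id: str) -> str:
    """Build "{dataset_prefix}-{clean_id}" under assumed normalization rules."""
    clean_id = subject_id.strip().lower()
    clean_id = clean_id.removeprefix("sub-")       # assumed: drop BIDS-style prefix
    clean_id = re.sub(r"[^a-z0-9]", "", clean_id)  # assumed: keep [a-z0-9] only
    if not clean_id:
        raise ValueError(f"subject id {subject_id!r} normalizes to an empty string")
    return f"{dataset_prefix.lower()}-{clean_id}"


print(canonical_subject_id("hcp", "100307"))    # hcp-100307
print(canonical_subject_id("ds12", "sub-102"))  # ds12-102
```

Because the function depends only on its inputs, re-ingesting the same dataset always yields the same IDs, which is the property the deterministic form is meant to guarantee.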

Qualifiers

Qualifiers are optional additional segments that provide more specific information about the data.

/omni/{subjects}/:modality/:space/:dtype/:qual1/:qual2/.../@coords

Typical qualifier families:

  • acquisition/condition: :rest, :task, :eyes-open, :eyes-closed
  • processing: :denoised, :filtered, :source-localized
  • feature forms: :parcellated, :roi-mean, :embedding
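Because qualifiers always follow :dtype, they can be picked out by position and grouped by family. In this Python sketch, group_qualifiers is a hypothetical helper and the family table is an assumption that contains only the example qualifiers listed above; any other qualifier lands in an "unknown" group.

```python
from collections import defaultdict

# Assumed family table, containing only the example qualifiers listed above.
QUALIFIER_FAMILIES = {
    ":rest": "acquisition/condition",
    ":task": "acquisition/condition",
    ":eyes-open": "acquisition/condition",
    ":eyes-closed": "acquisition/condition",
    ":denoised": "processing",
    ":filtered": "processing",
    ":source-localized": "processing",
    ":parcellated": "feature forms",
    ":roi-mean": "feature forms",
    ":embedding": "feature forms",
}


def group_qualifiers(path: str) -> dict[str, list[str]]:
    """Group an omni path's qualifier segments by family, in path order."""
    head = path.partition("/@")[0]    # drop the coordinate selector
    qualifiers = head.split("/")[6:]  # skip "", namespace, subjects, modality, space, dtype
    groups: dict[str, list[str]] = defaultdict(list)
    for q in qualifiers:
        groups[QUALIFIER_FAMILIES.get(q, "unknown")].append(q)
    return dict(groups)


print(group_qualifiers("/omni/hcp-100307/:fmri/:mni152/:bold/:rest/:denoised/@32,45,12/0:1200"))
# {'acquisition/condition': [':rest'], 'processing': [':denoised']}
```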

Coordinates

Coordinates represent spatial, temporal, and stream indexing. They are expressed using @... syntax at the end of the path. The interpretation of the coordinates depends on the modality, space, and dtype.

Form                     | Meaning
------------------------ | -------
@*                       | Entire data.
@x,y,z                   | Spatial point. The interpretation of x, y, z is defined by :space (e.g., :MNI152 implies standard-space coordinates).
@x,y,z/t                 | Spatial point + timepoint. t is a temporal index (e.g., fMRI volume index, EEG timepoint).
@x,y,z/t0:t1             | Spatial point + time range, i.e., a time series (e.g., @32,45,12/0:1200).
@x0:x1,y0:y1,z0:z1/t0:t1 | Spatial bounding box + time range.
@Cz                      | Named stream selector (e.g., a channel or variable). Names are modality-dependent and should map to named axes (channels, variables, parcels).
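The selector forms can be told apart with a few checks: @* first, then a bare name, then comma-separated spatial values with an optional /time part, where any axis may be a start:stop range. The Python sketch below assumes that grammar; Selector and parse_selector are hypothetical names, and because the specification does not say whether a range's stop is inclusive, bounds are stored exactly as written.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# A coordinate is a single value or a (start, stop) range, kept as written.
Bound = Union[float, tuple[float, float]]
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")  # assumed shape of stream names


@dataclass(frozen=True)
class Selector:
    kind: str                        # "all", "named", "point" or "box"
    spatial: tuple[Bound, ...] = ()
    time: Optional[Bound] = None
    name: Optional[str] = None


def _bound(text: str) -> Bound:
    if ":" in text:
        start, stop = text.split(":", 1)
        return (float(start), float(stop))
    return float(text)


def parse_selector(selector: str) -> Selector:
    if not selector.startswith("@"):
        raise ValueError(f"selector must start with '@': {selector!r}")
    body = selector[1:]
    if body == "*":
        return Selector("all")
    if _NAME.fullmatch(body):
        return Selector("named", name=body)
    spatial_part, _, time_part = body.partition("/")
    spatial = tuple(_bound(v) for v in spatial_part.split(","))
    if len(spatial) != 3:
        raise ValueError(f"expected three spatial coordinates: {selector!r}")
    kind = "box" if any(isinstance(b, tuple) for b in spatial) else "point"
    return Selector(kind, spatial, _bound(time_part) if time_part else None)


print(parse_selector("@32,45,12/0:1200"))
# Selector(kind='point', spatial=(32.0, 45.0, 12.0), time=(0.0, 1200.0), name=None)
print(parse_selector("@Cz").name)  # Cz
```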

Raw vs Omni

Aspect      | Raw                   | Omni (derived)
----------- | --------------------- | --------------
Prefix      | /raw/                 | /omni/ (or alias /derived/)
Structure   | dataset-defined       | fixed schema
Symbols     | none required         | : on vocabulary terms (? if unresolved), @ for selectors
Subject ID  | dataset convention    | deterministic universal ID
Coordinates | none (native implied) | explicit @... selector
Vocabulary  | none                  | enforced vocabulary
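Since /derived/ is only an alias for /omni/, a resolver can rewrite it once at the boundary so that downstream code sees just two namespaces. A minimal Python sketch; normalize_namespace is a hypothetical name.

```python
def normalize_namespace(path: str) -> tuple[str, str]:
    """Return (namespace, path), rewriting the /derived/ alias to /omni/."""
    if path.startswith("/raw/"):
        return "raw", path
    if path.startswith("/omni/"):
        return "omni", path
    if path.startswith("/derived/"):
        return "omni", "/omni/" + path[len("/derived/"):]
    raise ValueError(f"unknown namespace: {path!r}")


print(normalize_namespace("/derived/hcp-100307/:t1w/:MNI152/:intensity/@*"))
# ('omni', '/omni/hcp-100307/:t1w/:MNI152/:intensity/@*')
```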

Examples

Canonical derived paths:

/derived/hcp-100307/:fmri/:MNI152/:bold/:rest/@*
/derived/hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@32,45,12/0:1200
/derived/hcp-100307/:t1w/:MNI152/:intensity/@*
/derived/hcp-100307/:eeg/:MNI152/:voltage/:rest/:source-localized/@*
/derived/hcp-100307/:multimodal/:MNI152/:embedding/:rest/@*

Corresponding raw paths:

/raw/hcp/100307/**

Query patterns (API examples)

Assume an API with:

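  • dataset.query(pattern: str) -> list[path]: paths matching a pattern
  • dataset.get(path: str) -> object: the object stored at a path
  • object.raw: the raw inputs the object was derived from (provenance)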
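The API above is given only as signatures, so the pattern semantics are open. The Python sketch below is a hypothetical in-memory stand-in (InMemoryDataset, Record, pattern_to_regex and the add method are all assumptions) that makes one choice explicit: ** matches across /, * stays inside a single segment, and every other character, including ?, is literal because ? marks unresolved terms. It does not rewrite the /derived/ alias, and the raw file name is illustrative.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Record:
    path: str
    data: object
    raw: list[str] = field(default_factory=list)  # provenance: raw input paths


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Assumed semantics: '**' spans segments, '*' stays in one, all else literal."""
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*\*", "\0")   # protect '**' before handling '*'
    escaped = escaped.replace(r"\*", "[^/]*")  # '*' never crosses '/'
    return re.compile(escaped.replace("\0", ".*"))


class InMemoryDataset:
    """Hypothetical stand-in for the dataset API described above."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def add(self, path: str, data: object, raw: Iterable[str] = ()) -> None:
        self._records[path] = Record(path, data, list(raw))

    def query(self, pattern: str) -> list[str]:
        regex = pattern_to_regex(pattern)
        return sorted(p for p in self._records if regex.fullmatch(p))

    def get(self, path: str) -> Record:
        return self._records[path]  # KeyError if the path is unknown


ds = InMemoryDataset()
raw_file = "/raw/hcp/100307/rfMRI_REST1_LR.nii.gz"  # illustrative file name
omni = "/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@*"
ds.add(raw_file, data=None)
ds.add(omni, data=None, raw=[raw_file])

print(ds.query("/raw/hcp/100307/**"))
print(ds.query("/omni/*/:fmri/:MNI152/:bold/:rest/@*"))
print(ds.get(omni).raw)
```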

Raw: what did the dataset provide for a subject?

dataset.query("/raw/hcp/100307/**")

Omni: all resting fMRI in standard space

dataset.query("/derived/*/:fmri/:MNI152/:bold/:rest/@*")

Omni: specific voxel time series

dataset.get("/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@-42,38,12/0:1200")

Omni: trace provenance back to raw inputs

dataset.get("/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@*").raw

Unresolved terms (?)

Unknown or not-yet-mapped terms are prefixed with ?.

Query: all paths containing unknown terms

dataset.query("/*/*/?*")

Query: a specific unknown term across datasets

dataset.query("/*/?weirdmodality")

Query: fully resolved paths only

dataset.query("/omni/*/:*/:*/:*/@*")
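Glob patterns make "contains an unresolved term" awkward to state precisely, because ? is itself a single-character wildcard in common glob dialects. A segment-level check avoids that ambiguity. In this Python sketch both function names are hypothetical, and treating qualifiers as terms that must also be resolved is an interpretation, not something the patterns above fix.

```python
def unresolved_terms(path: str) -> list[str]:
    """Return the '?'-prefixed segments of a path, ignoring the @ selector."""
    head = path.partition("/@")[0]
    return [seg for seg in head.split("/") if seg.startswith("?")]


def is_fully_resolved(path: str) -> bool:
    """True if every segment after {subjects} (qualifiers included) starts with ':'."""
    head = path.partition("/@")[0]
    terms = head.split("/")[3:]  # skip "", the namespace, and {subjects}
    return len(terms) >= 3 and all(t.startswith(":") for t in terms)


print(unresolved_terms("/omni/ds12-102/?weirdmodality/:native/:voltage/@*"))  # ['?weirdmodality']
print(is_fully_resolved("/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/@*"))     # True
```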

Validation

  1. Canonical vocabulary (:): segments prefixed with : must be members of (or resolvable to) the canonical vocabulary.

  2. Coordinates (@): coordinate selectors must be valid with respect to:

       • :modality (voxels, surface, streams)
       • :space (units, bounds)
       • :dtype (temporal indexing)

  3. Use @* when requesting the entire object/stream.
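The two checks can be sketched together. In the Python sketch below, validate is a hypothetical function, the vocabulary contains only terms that appear in this document, and the selector rules (which modalities accept spatial or named selectors, which dtypes carry a time axis) are assumptions chosen for illustration; space-dependent units and bounds are not checked. Vocabulary matching is case-insensitive because the document writes both :mni152 and :MNI152.

```python
import re

# Hypothetical vocabulary, drawn only from terms used in this document.
VOCAB = {
    "modality": {"fmri", "t1w", "eeg", "multimodal"},
    "space": {"mni152", "native"},
    "dtype": {"bold", "intensity", "voltage", "embedding"},
    "qualifier": {"rest", "task", "eyes-open", "eyes-closed", "denoised", "filtered",
                  "source-localized", "parcellated", "roi-mean", "embedding"},
}
# Assumed selector rules; a real validator would derive these from the vocabulary.
SPATIAL_MODALITIES = {"fmri", "t1w"}   # accept @x,y,z selectors
STREAM_MODALITIES = {"eeg"}            # accept named selectors such as @Cz
TEMPORAL_DTYPES = {"bold", "voltage"}  # accept a /t time index
NAMED = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def validate(path: str) -> list[str]:
    """Return rule violations for an omni path; an empty list means valid."""
    head, sep, selector = path.partition("/@")
    terms = head.split("/")[3:]  # skip "", the namespace, and {subjects}
    if len(terms) < 3:
        return ["expected :modality/:space/:dtype"]
    slots = ["modality", "space", "dtype"] + ["qualifier"] * (len(terms) - 3)
    errors = []
    for slot, term in zip(slots, terms):
        if term.startswith("?"):
            continue  # unresolved terms are allowed and resolved during ingestion
        if not term.startswith(":") or term[1:].lower() not in VOCAB[slot]:
            errors.append(f"{slot} {term!r} is not in the canonical vocabulary")
    modality, dtype = terms[0], terms[2]
    # @* (or no selector) is always valid; skip selector rules if modality is unresolved.
    if not sep or selector == "*" or not modality.startswith(":"):
        return errors
    modality = modality[1:].lower()
    if NAMED.fullmatch(selector):
        if modality not in STREAM_MODALITIES:
            errors.append(f"named selector @{selector} needs a stream modality")
        return errors
    if modality not in SPATIAL_MODALITIES:
        errors.append(f"spatial selector is not valid for modality {modality!r}")
    time = selector.partition("/")[2]
    if time and dtype.startswith(":") and dtype[1:].lower() not in TEMPORAL_DTYPES:
        errors.append(f"time index given, but dtype {dtype!r} has no time axis")
    return errors


print(validate("/omni/hcp-100307/:fmri/:MNI152/:bold/:rest/:denoised/@32,45,12/0:1200"))  # []
print(validate("/omni/hcp-100307/:t1w/:MNI152/:intensity/@Cz"))
```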

Summary

  • /raw/... preserves dataset-native layout and provenance.
  • /omni/... (or /derived/...) provides a fixed, enforced schema for cross-dataset querying.
  • : marks canonical terms, ? marks unresolved terms, and @ provides coordinate-aware indexing.