Environment & Scene Graph
This section is normative.
The MIND Environment Model defines a lightweight, semantic, extensible scene graph used to describe environments, objects, coordinate frames, and spatial anchors relevant to multimodal human data, robotics, XR, CV, and embodied AI.
Environment Overview
A MIND Environment is a structured, hierarchical description of the physical or virtual space in which data is captured or simulated.
An environment MAY be:
- static (a laboratory, XR calibration space, robot workspace),
- dynamic (objects move or appear),
- externally referenced (e.g., stored in glTF/USD),
- embedded in the Container.
Environment Representation as a Scene Graph
An environment MUST be represented as a scene graph containing nodes.
Each node MUST be one of:
- FrameNode — transform-only; defines a coordinate frame.
- ObjectNode — an entity with geometry, semantics, and transforms.
- RegionNode — a volume, plane, or spatial boundary.
- AnchorNode — a persistent spatial reference (XR anchor, SLAM map point).
Nodes MUST form a Directed Acyclic Graph (DAG) with a single root.
Required Root Node
All environments MUST define one root coordinate frame:
world_root
All nodes MUST be descendants of world_root.
Transforms MUST be expressed in canonical space (right-handed, Y-up, meters).
Node Structure
All nodes MUST contain:
- node_id
- node_type (FrameNode | ObjectNode | RegionNode | AnchorNode)
- optional parent_id (null for root)
- a transform (position, rotation, optional scale)
- optional metadata references
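For concreteness, here is a minimal, non-normative sketch of a FrameNode parented to world_root. The transform encoding shown (position as [x, y, z] in meters, rotation as a unit quaternion [x, y, z, w]) is an assumption made for illustration; the canonical encoding is defined by the transform model, and all values are expressed in canonical space (right-handed, Y-up, meters).

{
  "node_id": "table_frame",
  "node_type": "FrameNode",
  "parent_id": "world_root",
  "transform": {
    "position": [0.0, 0.75, -1.2],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  }
}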
Additional required fields depend on type:
FrameNode
- transform only
ObjectNode
- geometry reference (inline or external)
- optional semantic attributes (category, tags)
- optional physical properties (mass, dimensions)
RegionNode
- region type (box, sphere, plane, volume)
- region parameters
AnchorNode
- stable spatial reference
- optional tracking_source
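To make the type-specific fields concrete, the following non-normative sketches show a RegionNode and an AnchorNode. The field spellings region_type, region_params, and tracking_source, the box parameter layout, and all identifier values are assumptions chosen to mirror the lists above, not definitive encodings.

{
  "node_id": "workspace_region",
  "node_type": "RegionNode",
  "parent_id": "world_root",
  "transform": { "position": [0.0, 0.9, -1.0], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "region_type": "box",
  "region_params": { "size": [1.2, 0.4, 0.8] }
}

{
  "node_id": "anchor_slam_01",
  "node_type": "AnchorNode",
  "parent_id": "world_root",
  "transform": { "position": [2.1, 0.0, 0.5], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "tracking_source": "slam"
}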
Geometry Representation
Geometry MAY be:
Inline
A minimal JSON form:
"geometry": {
"primitive": "box",
"size": [x,y,z]
}
External
Referenced via URI:
"geometry": {
"uri": "env/models/cup.glb",
"media_type": "model/gltf-binary"
}
Implementations MUST support both.
Semantics and Affordances
ObjectNodes MAY define:
- category (e.g., "cup", "chair", "table")
- tags (e.g., "graspable", "surface")
- affordances:
  - can_grasp: boolean
  - can_place_on: boolean
  - interaction_points: list of transforms
Semantic attributes MUST NOT change the interpretation of geometry, but they MAY guide learning systems.
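A non-normative sketch of an ObjectNode carrying semantic attributes and affordances follows. The nesting of affordances as an object holding boolean flags and a list of interaction-point transforms is one plausible encoding, assumed here for illustration, as are all identifier and file names.

{
  "node_id": "cup1",
  "node_type": "ObjectNode",
  "parent_id": "table_frame",
  "transform": { "position": [0.1, 0.0, 0.05], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "geometry": { "uri": "env/models/cup.glb", "media_type": "model/gltf-binary" },
  "category": "cup",
  "tags": ["graspable"],
  "affordances": {
    "can_grasp": true,
    "can_place_on": false,
    "interaction_points": [
      { "position": [0.0, 0.05, 0.04], "rotation": [0.0, 0.0, 0.0, 1.0] }
    ]
  }
}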
Dynamic Transforms
Nodes MAY be dynamic.
A node is dynamic if:
- its transform is provided via a stream, or
- its transform includes dynamic: true.
Dynamic transforms MUST reference a modality (e.g., ObjectPose).
Static nodes MUST NOT change during the recording.
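A non-normative sketch of a dynamic node: the dynamic: true flag marks the transform as time-varying, and the node references an ObjectPose stream. Using linked_streams (the mechanism defined in the next section) as the modality reference, and treating the inline position/rotation as an initial pose, are both assumptions for illustration; the authoritative poses come from the referenced stream.

{
  "node_id": "cup1",
  "node_type": "ObjectNode",
  "parent_id": "table_frame",
  "transform": {
    "dynamic": true,
    "position": [0.1, 0.0, 0.05],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  },
  "geometry": { "uri": "env/models/cup.glb", "media_type": "model/gltf-binary" },
  "linked_streams": ["object_pose_cup1"]
}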
Linking Nodes to Streams and Events
Environment nodes MAY reference:
- streams (e.g., segmentation, object tracking),
- samples (specific timestamps),
- events (manipulation, interaction).
Example:
"linked_streams": ["object_pose_cup1"]
"linked_events": ["evt_grasp_12"]
References MUST resolve: a linked stream, sample, or event that is not declared in the Container is an error.
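In context, the linking fields sit directly on a node. For example, a RegionNode might reference the interaction events that occur within it; this non-normative sketch assumes evt_grasp_12 is declared elsewhere in the Container so the reference resolves.

{
  "node_id": "workspace_region",
  "node_type": "RegionNode",
  "parent_id": "world_root",
  "transform": { "position": [0.0, 0.9, -1.0], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "region_type": "box",
  "region_params": { "size": [1.2, 0.4, 0.8] },
  "linked_events": ["evt_grasp_12"]
}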
Hierarchical Metadata Inheritance
Nodes MAY reference metadata.
A child node MUST inherit metadata properties from ancestors unless overridden.
Example:
A RobotModelProfile applied at the robot root applies to all joint/link nodes beneath it.
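A non-normative sketch of inheritance, assuming nodes reference metadata profiles through a metadata field (an illustrative name) and that the identifier MIND.robot/RobotModelProfile@1.0.0 is a hypothetical profile patterned on the MIND.environment/LabScene@1.0.0 reference shown below. The child link inherits the profile from robot_root without restating it; restating a profile on the child would override the inherited one.

{
  "node_id": "robot_root",
  "node_type": "FrameNode",
  "parent_id": "world_root",
  "transform": { "position": [0.0, 0.0, 0.0], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "metadata": ["MIND.robot/RobotModelProfile@1.0.0"]
}

{
  "node_id": "wrist_link",
  "node_type": "ObjectNode",
  "parent_id": "robot_root",
  "transform": { "position": [0.0, 0.4, 0.2], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "geometry": { "uri": "env/models/wrist_link.glb", "media_type": "model/gltf-binary" }
}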
Environment in Container vs External
An environment MAY be:
Embedded
"environment": {
"nodes": [...]
}
External
Referenced in metadata:
"MIND.environment/LabScene@1.0.0"
Implementations MUST support both.
Summary
The environment model:
- defines nodes (Frame, Object, Region, Anchor),
- supports transforms (static/dynamic),
- supports inline/external geometry,
- supports semantics and affordances,
- supports metadata inheritance,
- links environment to streams/events,
- supports embedded or external environments.
This enables consistent spatial reasoning across XR, robotics, CV, biosensing, and embodied AI workflows.