Environment & Scene Graph

This section is normative.

The MIND Environment Model defines a lightweight, semantic, extensible scene graph used to describe environments, objects, coordinate frames, and spatial anchors relevant to multimodal human data, robotics, XR, CV, and embodied AI.


Environment Overview

A MIND Environment is a structured, hierarchical description of the physical or virtual space in which data is captured or simulated.

An environment MAY be:

  • static (a laboratory, XR calibration space, robot workspace),
  • dynamic (objects move or appear),
  • externally referenced (e.g., stored in glTF/USD),
  • embedded in the Container.

Environment Representation as a Scene Graph

An environment MUST be represented as a scene graph containing nodes.

Each node MUST be one of:

  • FrameNode — transform-only; defines a coordinate frame.
  • ObjectNode — an entity with geometry, semantics, and transforms.
  • RegionNode — a volume, plane, or spatial boundary.
  • AnchorNode — a persistent spatial reference (XR anchor, SLAM map point).

Nodes MUST form a Directed Acyclic Graph (DAG) with a single root.


Required Root Node

All environments MUST define one root coordinate frame:

world_root

All nodes MUST be descendants of world_root.

Transforms MUST be expressed in canonical space (right-handed, Y-up, meters).
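
For illustration, a minimal, non-normative sketch of an environment containing only the required root frame and one child frame. The child's node_id (table_frame) and the quaternion (x, y, z, w) rotation encoding are assumptions of this sketch; the node fields themselves are specified in the Node Structure section below.

"environment": {
  "nodes": [
    {
      "node_id": "world_root",
      "node_type": "FrameNode",
      "parent_id": null,
      "transform": {
        "position": [0.0, 0.0, 0.0],
        "rotation": [0.0, 0.0, 0.0, 1.0]
      }
    },
    {
      "node_id": "table_frame",
      "node_type": "FrameNode",
      "parent_id": "world_root",
      "transform": {
        "position": [1.2, 0.0, -0.5],
        "rotation": [0.0, 0.0, 0.0, 1.0]
      }
    }
  ]
}

Positions are in meters, expressed in the canonical right-handed, Y-up space.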


Node Structure

All nodes MUST contain:

  • node_id — an identifier unique within the environment
  • node_type (FrameNode|ObjectNode|RegionNode|AnchorNode)
  • parent_id — the parent's node_id (null for the root)
  • a transform (position, rotation, optional scale)

Nodes MAY additionally carry metadata references.

Additional required fields depend on the node type (an illustrative sketch follows this list):

FrameNode

  • transform only

ObjectNode

  • geometry reference (inline or external)
  • optional semantic attributes (category, tags)
  • optional physical properties (mass, dimensions)

RegionNode

  • region type (box, sphere, plane, volume)
  • region parameters

AnchorNode

  • stable spatial reference
  • optional tracking_source
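
As a non-normative sketch, an ObjectNode and a RegionNode carrying the type-specific fields above. The identifiers (cup_1, workspace_region), the physical-property key names, and the region_type/region_parameters layout are hypothetical; only the field categories come from the lists above.

{
  "node_id": "cup_1",
  "node_type": "ObjectNode",
  "parent_id": "table_frame",
  "transform": {
    "position": [0.1, 0.02, 0.0],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  },
  "geometry": { "uri": "env/models/cup.glb", "media_type": "model/gltf-binary" },
  "category": "cup",
  "tags": ["graspable"],
  "physical": { "mass_kg": 0.25 }
},
{
  "node_id": "workspace_region",
  "node_type": "RegionNode",
  "parent_id": "world_root",
  "transform": {
    "position": [0.0, 0.8, 0.0],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  },
  "region_type": "box",
  "region_parameters": { "size": [1.0, 0.5, 1.0] }
}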

Geometry Representation

Geometry MAY be:

Inline

A minimal JSON form:

"geometry": {
"primitive": "box",
"size": [x,y,z]
}

External

Referenced via URI:

"geometry": {
"uri": "env/models/cup.glb",
"media_type": "model/gltf-binary"
}

Implementations MUST support both.


Semantics and Affordances

ObjectNodes MAY define:

  • category (e.g., "cup", "chair", "table")
  • tags (e.g., "graspable", "surface")
  • affordances:
    • can_grasp: boolean
    • can_place_on: boolean
    • interaction_points: list of transforms

Semantic attributes MUST NOT change the interpretation of geometry; they MAY, however, guide learning systems.
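
A non-normative sketch of the semantic portion of an ObjectNode. The encoding of each interaction point as a transform (position plus quaternion rotation) is an assumption consistent with the list above:

"category": "cup",
"tags": ["graspable"],
"affordances": {
  "can_grasp": true,
  "can_place_on": false,
  "interaction_points": [
    {
      "position": [0.0, 0.05, 0.03],
      "rotation": [0.0, 0.0, 0.0, 1.0]
    }
  ]
}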


Dynamic Transforms

Nodes MAY be dynamic.

A node is dynamic if:

  • its transform is provided via a stream, or
  • its transform includes dynamic: true.

Dynamic transforms MUST reference a modality (e.g., ObjectPose).

Static nodes MUST NOT change during the recording.
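
A non-normative sketch of a dynamic node: dynamic: true marks the transform as time-varying, and the key naming the backing stream (stream here) is an assumption of this sketch. The stream identifier object_pose_cup1 reappears in the linking example below.

{
  "node_id": "cup_1",
  "node_type": "ObjectNode",
  "parent_id": "world_root",
  "transform": {
    "dynamic": true,
    "stream": "object_pose_cup1"
  },
  "geometry": { "uri": "env/models/cup.glb", "media_type": "model/gltf-binary" }
}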


Linking Nodes to Streams and Events

Environment nodes MAY reference:

  • streams (e.g., segmentation, object tracking),
  • samples (specific timestamps),
  • events (manipulation, interaction).

Example:

"linked_streams": ["object_pose_cup1"]
"linked_events": ["evt_grasp_12"]

References MUST resolve strictly: every referenced stream, sample, or event MUST exist in the Container.


Hierarchical Metadata Inheritance

Nodes MAY reference metadata.

A child node MUST inherit metadata properties from ancestors unless overridden.

Example:

  • A RobotModelProfile applied at a robot's root node applies to all of its joints and links, as sketched below.
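
A non-normative sketch of inheritance, assuming metadata references are carried in a metadata array and use the same reference syntax as the external-environment example below; the profile identifier is hypothetical:

{
  "node_id": "robot_root",
  "node_type": "FrameNode",
  "parent_id": "world_root",
  "transform": {
    "position": [0.0, 0.0, 0.0],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  },
  "metadata": ["MIND.robot/RobotModelProfile@1.0.0"]
},
{
  "node_id": "gripper_link",
  "node_type": "FrameNode",
  "parent_id": "robot_root",
  "transform": {
    "position": [0.0, 0.35, 0.0],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  }
}

Here gripper_link carries no metadata of its own, so it inherits RobotModelProfile from robot_root.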

Environment in Container vs External

An environment MAY be:

Embedded

"environment": {
  "nodes": [...]
}

External

Referenced in metadata:

"MIND.environment/LabScene@1.0.0"

Implementations MUST support both.


Summary

The environment model:

  • defines nodes (Frame, Object, Region, Anchor),
  • supports transforms (static/dynamic),
  • supports inline/external geometry,
  • supports semantics and affordances,
  • supports metadata inheritance,
  • links environment to streams/events,
  • supports embedded or external environments.

This enables consistent spatial reasoning across XR, robotics, CV, biosensing, and embodied AI workflows.