Environment & Scene Graph
This section is normative.
The MIND Environment Model defines a lightweight, semantic, extensible scene graph used to describe environments, objects, coordinate frames, and spatial anchors relevant to multimodal human data, robotics, XR, CV, and embodied AI.
Environment Overview
A MIND Environment is a structured, hierarchical description of the physical or virtual space in which data is captured or simulated.
An environment MAY be:
- static (a laboratory, XR calibration space, robot workspace),
- dynamic (objects move or appear),
- externally referenced (e.g., stored in glTF/USD),
- embedded in the Container.
Environment Representation as a Scene Graph
An environment MUST be represented as a scene graph containing nodes.
Each node MUST be one of:
- FrameNode — transform-only; defines a coordinate frame.
- ObjectNode — an entity with geometry, semantics, and transforms.
- RegionNode — a volume, plane, or spatial boundary.
- AnchorNode — a persistent spatial reference (XR anchor, SLAM map point).
Nodes MUST form a Directed Acyclic Graph (DAG) with a single root.
Required Root Node
All environments MUST define one root coordinate frame:
world_root
All nodes MUST be descendants of world_root.
Transforms MUST be expressed in canonical space (right-handed, Y-up, meters).
Node Structure
All nodes MUST contain:
- node_id
- node_type (FrameNode | ObjectNode | RegionNode | AnchorNode)
- optional parent_id (null for root)
- a transform (position, rotation, optional scale)
- optional metadata references
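For concreteness, here is a minimal, non-normative sketch of a FrameNode parented to world_root. The transform encoding shown (position as [x, y, z] in meters, rotation as a unit quaternion [x, y, z, w]) is an assumption made for illustration; the canonical encoding is defined by the transform model, and all values are expressed in canonical space (right-handed, Y-up, meters).

{
  "node_id": "table_frame",
  "node_type": "FrameNode",
  "parent_id": "world_root",
  "transform": {
    "position": [0.0, 0.75, -1.2],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  }
}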
Additional required fields depend on type:
FrameNode
- transform only
ObjectNode
- geometry reference (inline or external)
- optional semantic attributes (category, tags)
- optional physical properties (mass, dimensions)
RegionNode
- region type (box, sphere, plane, volume)
- region parameters
AnchorNode
- stable spatial reference
- optional tracking_source
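To make the type-specific fields concrete, the following non-normative sketches show a RegionNode and an AnchorNode. The field spellings region_type, region_params, and tracking_source, the box parameter layout, and all identifier values are assumptions chosen to mirror the lists above, not definitive encodings.

{
  "node_id": "workspace_region",
  "node_type": "RegionNode",
  "parent_id": "world_root",
  "transform": { "position": [0.0, 0.9, -1.0], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "region_type": "box",
  "region_params": { "size": [1.2, 0.4, 0.8] }
}

{
  "node_id": "anchor_slam_01",
  "node_type": "AnchorNode",
  "parent_id": "world_root",
  "transform": { "position": [2.1, 0.0, 0.5], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "tracking_source": "slam"
}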
Geometry Representation
Geometry MAY be:
Inline
A minimal JSON form:
"geometry": {
"primitive": "box",
"size": [x,y,z]
}
External
Referenced via URI:
"geometry": {
"uri": "env/models/cup.glb",
"media_type": "model/gltf-binary"
}
Implementations MUST support both.
Semantics and Affordances
ObjectNodes MAY define:
- category (e.g., "cup", "chair", "table")
- tags (e.g., "graspable", "surface")
- affordances:
  - can_grasp: boolean
  - can_place_on: boolean
  - interaction_points: list of transforms
Semantic attributes MUST NOT change the interpretation of geometry, but they MAY guide learning systems.
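A non-normative sketch of an ObjectNode carrying semantic attributes and affordances follows. The nesting of affordances as an object holding boolean flags and a list of interaction-point transforms is one plausible encoding, assumed here for illustration, as are all identifier and file names.

{
  "node_id": "cup1",
  "node_type": "ObjectNode",
  "parent_id": "table_frame",
  "transform": { "position": [0.1, 0.0, 0.05], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "geometry": { "uri": "env/models/cup.glb", "media_type": "model/gltf-binary" },
  "category": "cup",
  "tags": ["graspable"],
  "affordances": {
    "can_grasp": true,
    "can_place_on": false,
    "interaction_points": [
      { "position": [0.0, 0.05, 0.04], "rotation": [0.0, 0.0, 0.0, 1.0] }
    ]
  }
}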
Dynamic Transforms
Nodes MAY be dynamic.
A node is dynamic if:
- its transform is provided via a stream, or
- its transform includes dynamic: true.
Dynamic transforms MUST reference a modality (e.g., ObjectPose).
Static nodes MUST NOT change during the recording.
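A non-normative sketch of a dynamic node: the dynamic: true flag marks the transform as time-varying, and the node references an ObjectPose stream. Using linked_streams (the mechanism defined in the next section) as the modality reference, and treating the inline position/rotation as an initial pose, are both assumptions for illustration; the authoritative poses come from the referenced stream.

{
  "node_id": "cup1",
  "node_type": "ObjectNode",
  "parent_id": "table_frame",
  "transform": {
    "dynamic": true,
    "position": [0.1, 0.0, 0.05],
    "rotation": [0.0, 0.0, 0.0, 1.0]
  },
  "geometry": { "uri": "env/models/cup.glb", "media_type": "model/gltf-binary" },
  "linked_streams": ["object_pose_cup1"]
}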
Linking Nodes to Streams and Events
Environment nodes MAY reference:
- streams (e.g., segmentation, object tracking),
- samples (specific timestamps),
- events (manipulation, interaction).
Example:
"linked_streams": ["object_pose_cup1"]
"linked_events": ["evt_grasp_12"]
References MUST resolve: a linked stream, sample, or event that is not declared in the Container is an error.
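In context, the linking fields sit directly on a node. For example, a RegionNode might reference the interaction events that occur within it; this non-normative sketch assumes evt_grasp_12 is declared elsewhere in the Container so the reference resolves.

{
  "node_id": "workspace_region",
  "node_type": "RegionNode",
  "parent_id": "world_root",
  "transform": { "position": [0.0, 0.9, -1.0], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "region_type": "box",
  "region_params": { "size": [1.2, 0.4, 0.8] },
  "linked_events": ["evt_grasp_12"]
}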
Hierarchical Metadata Inheritance
Nodes MAY reference metadata.
A child node MUST inherit metadata properties from ancestors unless overridden.
Example:
A RobotModelProfile applied at the robot root applies to all joint/link nodes beneath it.
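A non-normative sketch of inheritance, assuming nodes reference metadata profiles through a metadata field (an illustrative name) and that the identifier MIND.robot/RobotModelProfile@1.0.0 is a hypothetical profile patterned on the MIND.environment/LabScene@1.0.0 reference shown below. The child link inherits the profile from robot_root without restating it; restating a profile on the child would override the inherited one.

{
  "node_id": "robot_root",
  "node_type": "FrameNode",
  "parent_id": "world_root",
  "transform": { "position": [0.0, 0.0, 0.0], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "metadata": ["MIND.robot/RobotModelProfile@1.0.0"]
}

{
  "node_id": "wrist_link",
  "node_type": "ObjectNode",
  "parent_id": "robot_root",
  "transform": { "position": [0.0, 0.4, 0.2], "rotation": [0.0, 0.0, 0.0, 1.0] },
  "geometry": { "uri": "env/models/wrist_link.glb", "media_type": "model/gltf-binary" }
}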
Environment in Container vs External
An environment MAY be:
Embedded
"environment": {
"nodes": [...]
}
External
Referenced in metadata:
"MIND.environment/LabScene@1.0.0"
Implementations MUST support both.
Summary
The environment model:
- defines nodes (Frame, Object, Region, Anchor),
- supports transforms (static/dynamic),
- supports inline/external geometry,
- supports semantics and affordances,
- supports metadata inheritance,
- links environment to streams/events,
- supports embedded or external environments.
This enables consistent spatial reasoning across XR, robotics, CV, biosensing, and embodied AI workflows.