Environment & Scene Graph
This section explains how environments work in MIND.
Why an Environment Model?
Human data, XR data, biosensing, robotics, and AI training all depend on spatial context:
- Where is the person?
- Where is the robot?
- What objects are in the scene?
- What does the environment look like?
Environments give meaning to spatial data.
Scene Graph Basics
MIND uses a lightweight scene graph, simpler than USD but more expressive than glTF.
ASCII example:
world_root
├── room_frame
│   ├── table
│   │   └── cup
│   └── chair
└── robot_base
    └── robot_arm
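As a rough sketch, each node in this hierarchy could be encoded as a record with an id, a type, and a parent reference. The field names below (nodes, id, type, parent) are illustrative assumptions rather than a fixed schema, and the node types are described in the next section:
"nodes": [
  { "id": "world_root", "type": "FrameNode" },
  { "id": "room_frame", "type": "FrameNode", "parent": "world_root" },
  { "id": "table", "type": "ObjectNode", "parent": "room_frame" },
  { "id": "cup", "type": "ObjectNode", "parent": "table" },
  { "id": "chair", "type": "ObjectNode", "parent": "room_frame" },
  { "id": "robot_base", "type": "FrameNode", "parent": "world_root" },
  { "id": "robot_arm", "type": "ObjectNode", "parent": "robot_base" }
]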
Node Types
FrameNode
A pure transform: it defines a coordinate frame and carries no geometry of its own.
ObjectNode
Represents a real-world or virtual object. Key fields include:
- geometry
- category
- affordances
RegionNode
Areas of interest:
- floor plane
- table surface
- no-go region
- pickup zone
AnchorNode
Stable spatial references, such as XR anchors or SLAM landmarks.
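The fragments below sketch what a FrameNode, RegionNode, and AnchorNode might look like; apart from the fields named above, the keys (transform, region, source, parent) are placeholders for illustration, not a normative schema. ObjectNodes appear in the geometry and affordance examples that follow.
{ "id": "room_frame", "type": "FrameNode",
  "transform": { "translation": [0, 0, 0], "rotation": [0, 0, 0, 1] } }

{ "id": "table_top", "type": "RegionNode", "parent": "table",
  "region": "table surface" }

{ "id": "anchor_01", "type": "AnchorNode", "parent": "world_root",
  "source": "xr_anchor" }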
Geometry Handling
You can embed geometry directly:
"geometry": { "primitive": "box", "size": [1,1,1] }
Or reference external files:
"uri": "models/cup.glb"
Semantics & Affordances
ObjectNodes can tell agents how they might be used:
"category": "cup",
"affordances": { "can_grasp": true, "can_contain_liquid": true }
This is essential for embodied AI.
Dynamic Nodes
Objects can move.
Examples:
- tracked cups
- robot arms
- humans interacting with objects
Dynamic transforms come from streams such as:
- ObjectPose
- MarkerSets
- CV tracking
Linking Streams and Events
Nodes can reference:
- Pose streams → how the object moves
- Events → interactions involving the object
Example:
"linked_streams": ["cup1_pose"],
"linked_events": ["evt_grasp_23"]
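Putting the pieces together, a dynamic, tracked cup might look something like this hypothetical node, which combines the fields introduced above (id and parent are assumed names):
{
  "id": "cup1",
  "type": "ObjectNode",
  "parent": "table",
  "category": "cup",
  "geometry": { "uri": "models/cup.glb" },
  "affordances": { "can_grasp": true, "can_contain_liquid": true },
  "linked_streams": ["cup1_pose"],
  "linked_events": ["evt_grasp_23"]
}
Here cup1_pose would be an ObjectPose stream that drives the node's transform over time, and evt_grasp_23 an interaction event involving the cup.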
Embedded vs External Environment
Small scenes can be embedded directly in the Container.
Large or reusable scenes can be referenced as external environment files.
This keeps the environment model flexible across workflows.
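For illustration, a Container might either inline the scene or point at an external file; the environment and environment_uri keys shown here are hypothetical field names:
"environment": { "nodes": [ { "id": "world_root", "type": "FrameNode" } ] }

"environment_uri": "environments/lab_scene.json"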
Summary
MIND’s environment system:
- describes physical & virtual worlds,
- connects objects to events and streams,
- supports geometry + semantics + movement,
- is lightweight but expressive,
- works across XR, robotics, CV, biosensing, and embodied AI.