Time and Timestamps
This section gives a human-friendly explanation of how time works in MIND, with a focus on real-world use cases:
- multi-device capture (mocap + video + XR),
- biosignals (EEG/EMG),
- training and deploying virtual and robotic agents.
Time in MIND — Big Picture
MIND is designed so that:
- You can plug in any capture device (mocap, camera, HMD, EEG, robot sensors),
- Record everything into one Container,
- And later align all streams on a single, clean timeline for:
  - offline analysis,
  - model training,
  - real-time agent control.
To do this, every Sample and Event in MIND carries two key timestamps plus some optional helpers.
Two Clocks: t_monotonic and t_system
Think of it like this:
- t_monotonic → the device’s steady internal stopwatch
  (never goes backwards, not affected by system clock changes)
- t_system → the real-world clock on the wall
  (can jump if the user changes the time, or NTP corrects it)
MIND requires both, because:
- t_monotonic is perfect for ordering and interpolation,
- t_system is perfect for:
  - aligning different devices,
  - matching logs, databases, or external events.
ASCII diagram:
Device A (HMD) Device B (EEG)
t_monotonic_A t_monotonic_B
| |
v v
[steady] [steady]
\ /
\ /
+--- t_system --+ (shared wall-clock timeline)
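For concreteness, here is a minimal Python sketch of how a recorder might fill both fields in microseconds. The Timestamp class and capture_timestamp helper are illustrative names for this example, not part of the MIND API:

# Minimal sketch (illustrative names, not the MIND API): capture both clocks
# for one Sample, in microseconds.
import time
from dataclasses import dataclass

@dataclass
class Timestamp:
    t_monotonic: int  # steady clock, µs since an arbitrary device epoch
    t_system: int     # wall clock, µs since the Unix epoch

def capture_timestamp() -> Timestamp:
    # time.monotonic_ns() never goes backwards; time.time_ns() follows the wall clock.
    return Timestamp(
        t_monotonic=time.monotonic_ns() // 1_000,
        t_system=time.time_ns() // 1_000,
    )

print(capture_timestamp())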
Why Microseconds?
MIND expects microsecond resolution or better for timestamps.
Why?
- Motion capture, XR, and EEG can operate at hundreds or thousands of Hz.
- Microseconds:
  - are precise enough for these use cases,
  - are still convenient integers,
  - map cleanly to most OS clocks and hardware timers.
If your device only gives milliseconds:
- You still store microseconds; just multiply:
  123 ms → 123000 µs.
If your device gives nanoseconds:
- You can either:
  - divide by 1000 to get microseconds, or
  - store the extra precision in additional_clocks.
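As a quick illustration, here is a Python sketch of those conversions; the helper names (ms_to_us, ns_to_us) are made up for this example and are not a MIND API:

# Illustrative conversion helpers; the function names are assumptions.

def ms_to_us(t_ms: int) -> int:
    # 123 ms -> 123000 µs
    return t_ms * 1_000

def ns_to_us(t_ns: int) -> tuple[int, int]:
    # Keep the microsecond part for the main timestamp; the sub-microsecond
    # remainder could go into an additional_clocks entry if you need it.
    return t_ns // 1_000, t_ns % 1_000

print(ms_to_us(123))        # 123000
print(ns_to_us(123456789))  # (123456, 789)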
Within a Stream: non-decreasing time
Within a single Stream:
- timestamps never go backward,
- they can stay the same (e.g., batch outputs or two sensors fused into one sample).
Example timeline:
Sample 0: t_monotonic = 1000000
Sample 1: t_monotonic = 1001666
Sample 2: t_monotonic = 1003333
Sample 3: t_monotonic = 1003333 (same time, different content)
This makes interpolation, resampling, and model training much simpler.
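If you want to sanity-check a stream, a check like the following Python sketch is enough (the function name is illustrative, not a MIND API):

# Check that a stream's t_monotonic values never decrease;
# equal neighbours are allowed.

def is_non_decreasing(timestamps_us: list[int]) -> bool:
    return all(b >= a for a, b in zip(timestamps_us, timestamps_us[1:]))

print(is_non_decreasing([1_000_000, 1_001_666, 1_003_333, 1_003_333]))  # True
print(is_non_decreasing([1_000_000, 999_999]))                          # False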
Across Streams: aligning everything
You might have:
- hand_pose from an XR device,
- full_body_pose from a mocap system,
- video_frames from a camera,
- eeg_signals from a biosensor.
MIND’s rule is:
All of these streams must be alignable onto a shared timeline.
hand_pose and full_body_pose might have slightly different t_monotonic behaviors, but their t_system values (and any sync metadata) let you bring them onto the same time axis.
ASCII sketch:
Global time (t_system, µs)
│
├─ mocap stream timestamps
├─ xr stream timestamps
├─ video stream timestamps
└─ eeg stream timestamps
MIND requires the Container to carry enough metadata (such as clock offsets or a sync description) so you can do this alignment robustly.
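One simple alignment strategy (among many) is to resample one stream's values onto another stream's t_system timestamps by nearest-neighbour lookup, as in this illustrative Python sketch; the names are not part of MIND, and both timestamp lists are assumed sorted:

# Sketch of one possible alignment strategy: resample one stream's values
# onto another stream's timeline by nearest-neighbour lookup (all times in µs).
import bisect

def align_nearest(src_t_us: list[int], src_values: list[float],
                  target_t_us: list[int]) -> list[float]:
    aligned = []
    for t in target_t_us:
        i = bisect.bisect_left(src_t_us, t)
        # Pick whichever source timestamp is closest to the target time.
        if i == 0:
            aligned.append(src_values[0])
        elif i == len(src_t_us):
            aligned.append(src_values[-1])
        else:
            before, after = src_t_us[i - 1], src_t_us[i]
            aligned.append(src_values[i] if after - t < t - before else src_values[i - 1])
    return aligned

# Align a 100 Hz stream onto a 60 Hz stream's timestamps (µs).
eeg_t = [0, 10_000, 20_000, 30_000, 40_000]
eeg_v = [0.1, 0.2, 0.3, 0.4, 0.5]
video_t = [0, 16_667, 33_333]
print(align_nearest(eeg_t, eeg_v, video_t))  # [0.1, 0.3, 0.4]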
Derived timestamps
Sometimes, devices don’t give you perfect time:
- cameras without reliable clocks,
- legacy sensors,
- data imported from old logs.
MIND allows recorders or tools to compute or fix timestamps, but:
- they must track provenance:
  - which timestamps are original,
  - which are derived,
  - who derived them.
This is important when training models or debugging a system: you want to know whether you’re looking at raw sensor timing or something that’s been “cleaned up.”
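Purely as an illustration of what such provenance could look like, a tool might attach a note like this to a corrected timestamp; the field names here are assumptions, not mandated by MIND:

# Illustrative only: one way a tool could record timestamp provenance.
# The exact field names are assumptions, not part of the MIND spec.
derived_timestamp_note = {
    "t_system": 1_710_000_000_000_000,   # µs, reconstructed value
    "derived": True,                     # not read directly from the sensor
    "method": "linear_fit_to_host_clock",
    "derived_by": "import_tool v1.2",    # who produced the derived value
}
print(derived_timestamp_note)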
Video, frames, EEG, and other high-rate signals
For sources like video or EEG:
- It’s natural to think in frames or sample indices, not just time.
MIND supports this via:
- frame → for video frames,
- sample_index → for high-rate streams (like EEG channels).
These work in addition to the timestamps, not instead of them.
Example:
{
"timestamp": {
"t_monotonic": 10003333,
"t_system": 1710000000000000,
"frame": 42
},
"image_ref": "frame_0042.png"
}
You can then:
- Seek by frame number in a video editor,
- Align EEG sample_index with time windows in analysis tools,
- Or cross-align pose and video frames for training vision-language-action models.
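For high-rate streams, a nominal sampling rate lets you map a sample_index to a time offset. This Python sketch uses an illustrative helper name and assumes a steady 1000 Hz rate for the example:

# Sketch: map an EEG sample_index back to a time offset (µs) from the
# stream's first sample, given a nominal sampling rate.

def sample_index_to_offset_us(sample_index: int, sample_rate_hz: float) -> int:
    return round(sample_index * 1_000_000 / sample_rate_hz)

# At 1000 Hz, sample 42 lands 42 ms after the stream's first sample.
print(sample_index_to_offset_us(42, 1000.0))  # 42000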
Events follow the same rules
Events (like grasp start/end, manipulation, button presses) use the same timestamp object as Samples.
This makes it easy to:
- determine which pose and contact Samples were “active” at the moment of an Event,
- correlate user behavior with sensor readings,
- train models that predict events from raw streams.
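As a concrete (and purely illustrative) example, selecting the Samples that fall inside an Event's time window can be as simple as this Python sketch; the names are not a MIND API:

# Find the Samples whose timestamps fall inside an Event's time window
# (all values in µs on the shared t_system axis).

def samples_in_event(sample_times_us: list[int], event_start_us: int,
                     event_end_us: int) -> list[int]:
    return [t for t in sample_times_us if event_start_us <= t <= event_end_us]

pose_times = [1_000_000, 1_016_667, 1_033_333, 1_050_000]
print(samples_in_event(pose_times, 1_010_000, 1_040_000))  # [1016667, 1033333]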
What this gives you
By following these rules, you get:
- precise temporal alignment across:
  - XR devices,
  - mocap systems,
  - cameras,
  - biosensors,
  - agent outputs,
- data that works both:
  - offline (training, analysis),
  - online (real-time agents and control loops),
- a clean foundation for retargeting and cross-device fusion.
Time is now something you can trust across your entire MIND ecosystem.