Robust Dexterity with Diversity

OOJU is building the data foundation to solve robot generalization and, ultimately, interaction generation. We start by digitizing real-world dexterity in mixed reality today to train the generative models of tomorrow.

We are building a pipeline for scalable data collection

Our Mission

Generalization requires diversity at scale.

We are building the data infrastructure to bridge the gap between rigid lab demos and real-world complexity. We provide the critical mass of high-fidelity, multimodal data required to train generalist models capable of robust interaction in both simulation and the real world.

Our Approach

OOJU understands worlds & captures actions.

We combine human hand motions and intent with semantic understanding using portable XR devices.

By moving beyond the lab, we collect diverse, real-world interactions, providing multimodal, dense, and context-rich data with automatic labeling. We achieve higher sample efficiency from minimal demonstrations and abundant diversity. Ultimately, our pipeline enables the scalability required to train true 'generalist' agents.
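For concreteness, here is a minimal sketch of what a single captured frame in such a multimodal dataset could look like. This is purely illustrative: every field name and shape below is our assumption, not OOJU's actual schema.

```python
from dataclasses import dataclass

import numpy as np


# Illustrative sketch only: field names and shapes are assumptions,
# not OOJU's actual data format.
@dataclass
class CaptureFrame:
    timestamp_s: float                    # capture time in seconds
    rgb: np.ndarray                       # (H, W, 3) egocentric color image
    depth: np.ndarray                     # (H, W) depth map in meters
    head_pose: np.ndarray                 # (4, 4) headset pose, world frame
    hand_joints: np.ndarray               # (2, 21, 3) per-hand joint positions
    object_poses: dict[str, np.ndarray]   # object id -> (4, 4) 6-DoF pose
    contact_points: np.ndarray            # (N, 3) estimated hand-object contacts
    semantic_labels: dict[str, str]       # auto-labels, e.g. {"mug_01": "graspable"}
    intent: str                           # high-level task annotation, e.g. "pour water"
```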

Use Cases

  • Foundation models are starved for diversity. XR-based human data offers a potentially scalable path to building massive, robot-agnostic datasets, enabling the training of general agents without relying solely on slow physical data collection.

  • An egocentric view goes beyond simple object and environment detection. It enables modeling of precise contact points, trajectories, and object affordances, revealing exactly how tools and objects are used.

  • High-quality human motions can serve as 'seeds' for generative models. This approach has the potential to synthesize an infinite number of interaction variations, expanding limited real-world demos into massive training sets.

FAQs

Why XR?

Teleoperation is precise, but it is tied to one specific robot and lab. OOJU is hardware-agnostic and portable: our data captures fundamental dexterous manipulation. We prioritize scalability by leveraging portable XR to capture diverse interactions in semi-controlled and open-world environments without complex setups.

Why not just train on video datasets?

Video offers massive scale (e.g., YouTube), but it is passive and 2D. OOJU captures 3D reality: we provide depth information, human intent, and hand movements. This gives robots the spatial and semantic context, and the physical skills, to properly manipulate the world rather than just repeat learned pixel patterns.

Download Sample Data

We will email the Hugging Face link to your inbox.
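Once the link arrives, the sample can typically be loaded with the standard Hugging Face `datasets` library. A minimal sketch; the repository id below is a placeholder, not the actual dataset name.

```python
from datasets import load_dataset

# Placeholder repo id -- substitute the link from the email.
ds = load_dataset("OOJU/sample-data", split="train")

print(ds)            # dataset summary: features and number of rows
print(ds[0].keys())  # inspect the fields of the first demonstration
```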