One Policy, Any Body: How COMPASS Is Making Robot Navigation Truly Portable
NVIDIA's COMPASS uses residual reinforcement learning and policy distillation to deploy a single navigation policy across wheeled and legged platforms — with zero-shot sim-to-real transfer on the Unitree G1 humanoid.
The central bottleneck in robot navigation has always been the same: every new platform needs its own stack. You tune a controller for a wheeled base, and the moment you switch to a legged humanoid, you start over. Different kinematics, different sensors, different dynamics — and months of re-engineering.
COMPASS, a new framework from NVIDIA Research accepted at the International Conference on Robotics and Automation (ICRA) 2026, attacks this problem directly. It trains a mobility policy on a single embodiment and then transfers it to diverse robots through residual reinforcement learning and policy distillation. The result is a generalist policy that works across platforms — and it transfers zero-shot from simulation to the real world, including on the Unitree G1 humanoid.
The Problem: Navigation Is Locked to Hardware
Classical navigation stacks are effective but brittle. They rely on hand-tuned parameters for specific sensor configurations, wheel bases, and motion models. Learning-based alternatives reduce some of this burden, but imitation learning demands high-quality demonstrations for every new robot. If your lab owns a wheeled rover and a bipedal humanoid, you collect two separate datasets and train two separate policies.
This is the embodiment lock-in problem, and it is especially painful for humanoid robotics. A navigation policy trained on a wheeled platform has no notion of balance, gait, or terrain adaptation. The gap between platforms is so wide that cross-embodiment transfer has historically been considered impractical for real-world deployment.
The Insight: Residual Corrections, Not Full Retraining
COMPASS (Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis) approaches transfer differently. Instead of training a new policy from scratch for each robot, it starts with a base mobility policy trained via imitation learning on a single embodiment, then learns small residual corrections for each new platform in randomized simulation.
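The residual idea can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions and shows only how actions are composed at inference time. The RL training of the residual is omitted. The `ResidualPolicy` class, the two-dimensional action (forward velocity, yaw rate) and the `scale` knob are hypothetical illustrations, not the COMPASS code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Action = List[float]  # hypothetical action layout: [forward velocity, yaw rate]


@dataclass(frozen=True)
class ResidualPolicy:
    """Hypothetical illustration: a frozen base policy plus a per-robot residual correction."""

    # Shared base policy (pretrained by imitation); it is only called, never modified here.
    base: Callable[[Sequence[float]], Action]
    # Small per-embodiment network (trained with RL in the real system); sees obs and base action.
    residual: Callable[[Sequence[float], Action], Action]
    # Assumed knob for bounding how far the correction can move the base action.
    scale: float = 1.0

    def act(self, obs: Sequence[float]) -> Action:
        base_action = self.base(obs)
        # The residual outputs only a correction, not a full action.
        correction = self.residual(obs, base_action)
        return [b + self.scale * c for b, c in zip(base_action, correction)]


if __name__ == "__main__":
    # Toy stand-ins: the base drives forward; the residual slows down and turns slightly.
    robot = ResidualPolicy(base=lambda obs: [0.5, 0.0], residual=lambda obs, a: [-0.25, 0.125])
    print(robot.act([0.0]))  # [0.25, 0.125]
```

The base callable is never changed, so one base can be shared by several `ResidualPolicy` instances, one per robot, each with its own small residual.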
The workflow has three stages:
1. Imitation Learning Foundation. A world model and a mobility policy are pretrained by imitating existing teacher policies on a single robot. This gives the policy a general grasp of navigation (obstacle avoidance, goal-seeking, path planning) that is not tied to specific actuator dynamics.
2. Residual RL for Embodiment Adaptation. For each new robot, COMPASS freezes the base policy and trains a lightweight residual network using reinforcement learning. This network learns only the corrective actions needed to adapt the base behavior to the new platform’s physics and sensors. Because the residual is small, sample efficiency is high — the robot does not need to relearn navigation from scratch.
3. Policy Distillation into a Generalist. The embodiment-specific specialists are then distilled into a single policy conditioned on an embedding vector that encodes the robot’s physical properties. The generalist policy can be deployed on any robot in the training set by simply switching the embedding.
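Stage 3 can be sketched as supervised regression. The loop samples an embodiment, asks its specialist for a target action, and fits a single student that sees both the observation and that embodiment's embedding. The sketch rests on several assumptions. The student is linear. One-hot vectors stand in for the physical-property embedding. The specialists are toys. COMPASS itself uses neural networks and its own embedding, so treat every name here (`distill`, `generalist_act`) as hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical specialist: maps an observation to a scalar action.
Specialist = Callable[[Sequence[float]], float]


def features(obs: Sequence[float], embedding: Sequence[float]) -> List[float]:
    # Student input: observation concatenated with the robot embedding, plus a bias term.
    return list(obs) + list(embedding) + [1.0]


def distill(specialists: Dict[str, Specialist], embeddings: Dict[str, List[float]],
            obs_dim: int, steps: int = 2000, lr: float = 0.05, seed: int = 0) -> List[float]:
    """Fit one linear student to imitate all specialists (toy stand-in for distillation)."""
    rng = random.Random(seed)
    emb_dim = len(next(iter(embeddings.values())))
    w = [0.0] * (obs_dim + emb_dim + 1)
    names = list(specialists)
    for _ in range(steps):
        name = rng.choice(names)                      # sample an embodiment
        obs = [rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
        x = features(obs, embeddings[name])
        target = specialists[name](obs)               # teacher label from that specialist
        err = sum(wi * xi for wi, xi in zip(w, x)) - target
        # SGD step on the squared error 0.5 * err**2.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def generalist_act(w: List[float], obs: Sequence[float], embedding: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, features(obs, embedding)))


if __name__ == "__main__":
    specialists = {"wheeled": lambda o: o[0] + 0.3, "legged": lambda o: o[0] - 0.2}
    embeddings = {"wheeled": [1.0, 0.0], "legged": [0.0, 1.0]}
    w = distill(specialists, embeddings, obs_dim=1)
    # Same weights, different embedding; outputs should be close to 0.3 and -0.2.
    print(generalist_act(w, [0.0], embeddings["wheeled"]), generalist_act(w, [0.0], embeddings["legged"]))
```

Switching the embedding is the only change needed to move the student between robots. The weights stay the same.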
Across a diverse benchmark of robot platforms and environment configurations, the distilled generalist achieves a success rate roughly 5X higher than the pretrained imitation-learning baseline. It also cuts average travel time by roughly 3X.
Zero-Shot Sim-to-Real on a Humanoid
The most compelling result for humanoid robotics is the zero-shot sim-to-real transfer. COMPASS policies trained entirely in NVIDIA Isaac Lab simulation were deployed directly on physical robots with no additional real-world fine-tuning.
This includes the Unitree G1 humanoid. The video below shows the COMPASS policy controlling the G1 in the real world, navigating around obstacles and reaching target destinations using the same policy distilled from multiple simulated embodiments.
The COMPASS policy controlling the Unitree G1 humanoid in the real world — transferred zero-shot from simulation with no additional fine-tuning.
The significance here is not just that it works on a humanoid. The same distilled policy handles both the G1 and wheeled platforms such as Carter. This suggests the generalist has learned a representation of navigation abstract enough to bridge legged and wheeled locomotion.
Integration with GR00T and Open-Vocabulary Navigation
COMPASS is designed to plug into larger systems. The project page demonstrates two notable integrations:
- GR00T post-training. The COMPASS distillation datasets were used to add navigation capabilities to NVIDIA’s GR00T humanoid foundation model through post-training. This points toward a future where generalist humanoid policies absorb mobility skills from specialist frameworks rather than learning them from scratch.
- Open-vocabulary object navigation. By combining COMPASS with Locate3D, the policy can navigate to objects specified in natural language, such as “go to the red chair”, without predefined semantic maps.
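The handoff between a language-grounding model and the navigation policy can be sketched as a frame conversion. This sketch makes two hypothetical assumptions. First, the grounding step returns the target's world-frame position. Second, the policy consumes a goal expressed in the robot's body frame. The `ground` callable is a placeholder and does not represent Locate3D's actual interface.

```python
import math
from typing import Callable, Tuple

Point3 = Tuple[float, float, float]  # world-frame target position (x, y, z)
Pose2D = Tuple[float, float, float]  # robot pose in the world frame: (x, y, yaw in radians)


def world_goal_to_robot_frame(goal: Point3, robot_pose: Pose2D) -> Tuple[float, float]:
    """Express a world-frame target as a planar (forward, left) offset from the robot."""
    gx, gy, _gz = goal  # height is dropped for ground navigation
    rx, ry, yaw = robot_pose
    dx, dy = gx - rx, gy - ry
    # Rotate the world-frame offset by -yaw to express it in the robot's body frame.
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def goal_from_text(query: str, ground: Callable[[str], Point3],
                   robot_pose: Pose2D) -> Tuple[float, float]:
    """Hypothetical glue: ground the query, then hand a body-frame goal to the policy."""
    return world_goal_to_robot_frame(ground(query), robot_pose)


if __name__ == "__main__":
    fake_ground = lambda q: (1.0, 3.0, 0.5)  # stub standing in for a grounding model
    # Robot at (1, 1) facing +y; the chair is roughly 2 m straight ahead, 0 m to the left.
    print(goal_from_text("go to the red chair", fake_ground, (1.0, 1.0, math.pi / 2)))
```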
Open Source and Reproducible
COMPASS is fully open source under the MIT license:
- Training code on GitHub (PyTorch, Isaac Lab 2.1.0, Isaac Sim 4.5.0)
- Pretrained checkpoints and datasets for reproduction
- Simulation environments for wheeled and legged platforms
- Distillation datasets compatible with GR00T
This is not a paper-only release. The repository includes everything needed to train your own cross-embodiment policies or adapt the pretrained generalist to a new robot via residual RL.
Why This Matters for Humanoid Robotics
Humanoid robots are expensive and slow to deploy, and per-robot navigation stacks multiply that cost with every new platform. COMPASS points toward shared mobility infrastructure across the humanoid ecosystem. The generalist is trained once and then adapted to a specific G1, H1, or future platform with a lightweight residual RL run in simulation. It is then deployed on hardware without real-world fine-tuning.
For labs and teams that cannot afford months of platform-specific engineering — or the rollout cost on expensive hardware — a generalist policy that arrives nearly ready to deploy is a meaningful shift in what’s practical. Combined with the open-source release, this lowers the floor for anyone building navigation across the humanoid ecosystem.
Key Takeaways
- COMPASS is a three-stage framework (IL → Residual RL → Distillation) that trains cross-embodiment navigation policies from demonstrations on a single robot.
- The distilled generalist achieves a ~5X higher success rate and ~3X shorter travel time than the imitation-learning baseline, across diverse platforms.
- Zero-shot sim-to-real transfer was demonstrated on both wheeled robots and the Unitree G1 humanoid.
- Open-source code, checkpoints, and GR00T-compatible datasets are available now.