Lab Projects
Our lab focuses on world modeling, embodied AI, efficient deep learning, and neuromorphic computing. In collaboration with EmbodyX, we open-source our models, benchmarks, and tools.
World Model & Embodied AI
Building the foundation for machines that understand and reason about the physical world — from cognitive architectures and physics-faithful video generation to resource-aware robotic reasoning. A core pillar at EmbodyX.
Models
Model PhyWorld
Physics-faithful world model for video generation. Uses a two-stage post-training approach: flow-matching fine-tuning for visual consistency, then Direct Preference Optimization (DPO) to align generated dynamics with physical laws. Achieves 0.769 on VBench and 3.09 on the PhyGround benchmark.
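As a rough illustration of the second post-training stage, the sketch below computes the standard DPO objective for a single preference pair. It is not PhyWorld's training code. The function name, argument names, and the `beta` value are hypothetical. The source does not say how the likelihood terms are obtained for a flow-matching video model, so the log-probabilities are treated here as plain inputs.

```python
import math


def dpo_loss(policy_logp_preferred, policy_logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) sample pair.

    All inputs are log-probabilities (assumed to be given); beta is an
    illustrative temperature, not a value taken from the source.
    """
    # How much more the policy favors each sample than the frozen reference.
    preferred_margin = policy_logp_preferred - ref_logp_preferred
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (preferred_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss falls below log(2) once the policy prefers the physically
# plausible sample more strongly than the reference model does.
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a physics-alignment setting, the "preferred" sample would be the video judged more consistent with physical laws and the "rejected" sample the less consistent one.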
Benchmarks
Benchmark PhyGround
Criteria-grounded benchmark evaluating how well video generation models adhere to physical laws. 250 curated prompts across 13 physical laws spanning solid-body mechanics, fluid dynamics, and optics — with 37,000+ fine-grained labels from 459 annotators. Includes PhyJudge-9B, an open-source physics-specialized vision-language model for automated evaluation.
🍎 Gravity
🌊 Flow Dynamics
Papers
Paper The Reasoning Scaling Law
While visual quality plateaus early, reasoning capabilities exhibit a distinct "emergence" phase — increasing reasoning-specific training data by 1000x enables genuine generalization. Introduces VBVR-Bench.
Framework Human Cognition in Machines
A unified framework for World Models grounded in Cognitive Architecture Theory. Audits state-of-the-art models across video, embodied, and epistemic domains to bridge machine and human-like cognition.
Paper RARRL: Robots Think Before They Act
CMU × EmbodyX. Resource-Aware Reasoning via RL enables robots to dynamically decide when to reason vs. act, achieving a 60%+ reduction in reasoning time on ALFRED benchmarks.
Moxin LM
A family of open-source foundation models spanning language, vision-language, and vision-language-action for efficient and accessible AI.
Quantization
High-performance quantized models for efficient local inference on consumer hardware.
GGUF Moxin-GGUF
High-performance GGUF-quantized models for efficient local inference, covering DeepSeek, Qwen3, GLM, and more.
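To give a feel for why quantized weights are cheaper to store and run locally, here is a minimal sketch of block-wise symmetric quantization: each block of weights shares one floating-point scale and stores small integer codes. This is a simplified illustration of the general idea only. It does not reproduce any actual GGUF quantization type or the Moxin-GGUF recipe, and the function names and the 4-bit setting are assumptions.

```python
def quantize_block(weights, bits=4):
    """Quantize one block of floats to signed integer codes plus a scale.

    Simplified symmetric scheme (an assumption, not a GGUF format).
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    max_abs = max(abs(w) for w in weights)
    # An all-zero block needs no scaling; avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate float weights from the codes and the scale."""
    return [scale * c for c in codes]


block = [1.0, -0.5, 0.25, 0.0]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for the smaller memory footprint.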