Lab Projects

Our lab focuses on world modeling, embodied AI, efficient deep learning, and neuromorphic computing. In collaboration with EmbodyX, we open-source our models, benchmarks, and tools.

Research Initiative

World Model & Embodied AI

Building the foundation for machines that understand and reason about the physical world, from cognitive architectures and physics-faithful video generation to resource-aware robotic reasoning. This initiative is a core pillar at EmbodyX.

Models

Compared models: PhyWorld (Ours), Cosmos, LTX, OmniWeave

Model PhyWorld

Physics-faithful world model for video generation. It uses a two-stage post-training approach: flow-matching fine-tuning for visual consistency, followed by Direct Preference Optimization (DPO) to align generated dynamics with physical laws. PhyWorld achieves 0.769 on VBench and 3.09 on the PhyGround benchmark.
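The two post-training stages can be illustrated with a toy sketch on plain Python lists. This page does not give PhyWorld's exact objectives, so two things here are assumptions: stage 1 is written as a standard rectified-flow (straight-path) flow-matching loss, and stage 2 as a Diffusion-DPO-style preference loss adapted to flow matching, where a lower velocity-prediction error than a frozen reference model stands in for a higher likelihood. All function names are hypothetical, and any timestep weighting is folded into `beta`.

```python
import math

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def flow_matching_loss(v_pred, x0, x1):
    """Stage 1 (assumed rectified-flow form): mean squared error between the
    velocity predicted at interpolate(x0, x1, t) and the target x1 - x0."""
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(v_pred, target)) / len(target)

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def flow_dpo_loss(err_win, err_lose, ref_err_win, ref_err_lose, beta=1.0):
    """Stage 2 (assumed Diffusion-DPO-style form). Each err_* is a flow-matching
    loss on the same noise sample and timestep, computed by the model being
    trained or by a frozen reference copy (ref_*). 'win' is the video judged
    more physically plausible, 'lose' the rejected one."""
    # How much the trained model beats the reference on each video (positive = better).
    gain_win = ref_err_win - err_win
    gain_lose = ref_err_lose - err_lose
    margin = beta * (gain_win - gain_lose)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

Minimizing `flow_dpo_loss` pushes the model to improve on physically plausible videos faster than on rejected ones, relative to the reference. When both gains are equal the loss is exactly log 2, and it drops below that as the preferred video's relative gain grows.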

Benchmarks

Benchmark PhyGround

Criteria-grounded benchmark evaluating how well video generation models adhere to physical laws. It contains 250 curated prompts covering 13 physical laws across solid-body mechanics, fluid dynamics, and optics, annotated with 37,000+ fine-grained labels from 459 annotators. It also includes PhyJudge-9B, an open-source physics-specialized vision-language model for automated evaluation.
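To make "criteria-grounded" concrete, here is a minimal sketch of how fine-grained labels could roll up into per-law and overall scores. The actual PhyGround scoring formula is not described on this page (the reported 3.09 suggests a different scale), so the `Label` record, its fields, and the macro-averaged pass rate below are hypothetical and purely illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    """Hypothetical fine-grained judgment: did one generated video satisfy
    one criterion derived from a physical law?"""
    prompt_id: str
    law: str        # e.g. "gravity", "flow dynamics"
    criterion: str  # e.g. "object accelerates downward"
    satisfied: bool

def per_law_scores(labels):
    """Fraction of satisfied criteria for each law."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lab in labels:
        totals[lab.law] += 1
        hits[lab.law] += int(lab.satisfied)
    return {law: hits[law] / totals[law] for law in totals}

def macro_score(labels):
    """Unweighted mean over laws, so heavily labelled laws don't dominate."""
    scores = per_law_scores(labels)
    if not scores:
        raise ValueError("no labels to score")
    return sum(scores.values()) / len(scores)
```

Macro-averaging over laws is one reasonable design choice when some laws (for example gravity) attract many more prompts than others; a per-label micro-average would instead weight every judgment equally.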

Example evaluations: 🍎 Gravity (LTX-2.3-22B, Cosmos-14B); 🌊 Flow Dynamics (Cosmos-14B, Wan2.2-27B-A14B)

Papers

Paper The Reasoning Scaling Law

As training data scales, visual quality plateaus early, while reasoning capabilities exhibit a distinct "emergence" phase: increasing reasoning-specific training data by 1000× enables genuine generalization. Introduces VBVR-Bench.

Framework Human Cognition in Machines

A unified framework for world models grounded in Cognitive Architecture Theory. Audits state-of-the-art models across video, embodied, and epistemic domains to bridge the gap between machine and human-like cognition.

Paper RARRL: Robots Think Before They Act

CMU × EmbodyX. Resource-Aware Reasoning via RL (RARRL) enables robots to dynamically decide when to reason and when to act, reducing reasoning time by more than 60% on the ALFRED benchmark.
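The core idea of deciding when to reason and when to act can be sketched as a tiny reinforcement-learning problem. RARRL's real state representation, reward, and RL algorithm are not described here, so everything below is a hypothetical toy: a reward that subtracts a penalty per second of reasoning, and a tabular epsilon-greedy learner over the binary choice {reason, act} in a made-up environment where acting directly only succeeds in "familiar" states.

```python
import random

REASON, ACT = "reason", "act"

def resource_aware_reward(success, reasoning_seconds, cost_per_second=0.05):
    """Hypothetical shaped reward: task success minus a penalty on reasoning time."""
    return (1.0 if success else 0.0) - cost_per_second * reasoning_seconds

def toy_episode(state, choice):
    """Hypothetical environment: reasoning always succeeds but takes 4 seconds;
    acting directly succeeds only in familiar states."""
    if choice == REASON:
        return resource_aware_reward(True, reasoning_seconds=4.0)
    return resource_aware_reward(state == "familiar", reasoning_seconds=0.0)

class ReasonOrActPolicy:
    """Tabular epsilon-greedy learner over {reason, act} per discrete state,
    with value estimates kept as incremental means of observed rewards."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = {}       # (state, choice) -> mean reward so far
        self.counts = {}  # (state, choice) -> number of updates

    def choose(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice([REASON, ACT])  # explore
        # Exploit; unseen pairs default to 0.0 and ties go to REASON.
        return max([REASON, ACT], key=lambda c: self.q.get((state, c), 0.0))

    def update(self, state, choice, reward):
        key = (state, choice)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.q.get(key, 0.0)
        self.q[key] = old + (reward - old) / n

def train(policy, episodes=1000):
    """Alternate familiar and novel states, learning from each outcome."""
    for i in range(episodes):
        state = "familiar" if i % 2 == 0 else "novel"
        choice = policy.choose(state)
        policy.update(state, choice, toy_episode(state, choice))
    return policy
```

After training, the greedy policy learns to skip reasoning in familiar states (reward 1.0 versus 0.8) and to keep reasoning in novel ones (0.8 versus 0.0), which is the kind of time saving the resource-aware penalty is meant to encourage.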

Open Source Models

Moxin LM

A family of open-source foundation models spanning language, vision-language, and vision-language-action for efficient and accessible AI.

LLM Moxin-LLM

Open-source Large Language Model framework optimized for efficiency and performance.

VLM Moxin-VLM

Vision-Language Model bridging visual understanding with linguistic reasoning.

VLA Moxin-VLA

Vision-Language-Action models for embodied AI and robotics control tasks.

Efficiency Tools

Quantization

High-performance quantized models for efficient local inference on consumer hardware.

GGUF Moxin-GGUF

High-performance GGUF-quantized models for efficient local inference, covering DeepSeek, Qwen3, GLM, and more.