Zehua Wang

Sophomore @ MIT

I think about AI the way a physicist would.

About

I'm Zehua Wang (王泽华), an undergraduate at MIT pursuing a B.S. in Physics and Artificial Intelligence & Decision Making (GPA: 5.0/5.0). My work sits at the intersection of physics and AI, ranging from reinforcement learning for humanoid control to diffusion models and molecular dynamics.

I currently work with Prof. Tommi Jaakkola on diffusion models and molecular dynamics. Before that, I was at the FortyFive AI Lab (Dr. Ge Yang), working on RL and imitation learning for humanoid whole-body control and scaling laws for embodied AI.

Previously, I worked with Prof. Huazhe Xu at Tsinghua University on learning-based robot manipulation and sim-to-real transfer. Before MIT, I completed my freshman year at Tsinghua's Institute for Interdisciplinary Information Sciences (IIIS, Yao Class). I was the 1st place winner in the theoretical round at the 54th International Physics Olympiad (IPhO 2024) and earned 3rd place at the Romanian Master of Physics 2023.

Research Interests

Reinforcement Learning · Humanoid Whole-Body Control · Sim-to-Real Transfer · Imitation Learning · Scaling Laws for Embodied AI · Generative Models · Diffusion Models

Beyond Research

Outside of research, I'm into basketball (led the IIIS team as a freshman), skiing (10+ years), and lifting. I also enjoy soccer, volleyball, table tennis, swimming, and hiking. In my spare time, I listen to music and occasionally sing.

I'm always happy to chat, whether it's about research, physics, AI, or anything else. If something here caught your interest, feel free to reach out.

Featured Projects

Selected research projects and open-source work

2026 · MIT 6.8300 · Lead

Video Real2Sim (VR2S)

Video Real2Sim explores a single-image route to scene reconstruction for populated real-world scenes. Given one photo, the pipeline uses a general video model to synthesize missing orbital observations, then feeds those views into a pose-free reconstruction stack. The goal is not to claim that video replaces geometry. The project asks a narrower question: when the observed image does not expose the back side, contact regions, or other useful scene evidence, can generated video provide enough additional observations to improve reconstruction? In the final report, the strongest gains appear in perceptual and back-side reconstruction quality, with metric-level caveats and small exceptions surfaced explicitly. This was my final project for MIT 6.8300 Advances in Computer Vision.