I joined NVIDIA Research in 2025 as a postdoctoral researcher, working at the intersection of
Computer Vision, Graphics, and AI with a focus on modeling human geometry and behaviors for
lifelike robots.
My past research spans 3D reconstruction and rendering, human pose estimation, generative models,
SLAM, and telepresence systems.
During my Ph.D. studies on telepresence systems at the University of North Carolina at Chapel Hill, I was fortunate to be advised by
Professor Henry Fuchs.
Before that, I worked with Professor Derek Hoiem at his startup Reconstruct.
I obtained my Master of Science in Computer Vision (MSCV) from Carnegie Mellon University, where I worked with Professor Michael Kaess on SLAM systems.
I received my Bachelor of Science in Computer Engineering from the University of Illinois at Urbana-Champaign.
A single-image human mesh recovery method that accurately estimates perspective camera parameters (including the camera-space depth Tz and focal length), delivering state-of-the-art pose accuracy and 2D alignment, especially for close-range views.
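As background on why these parameters matter, here is a minimal pinhole-projection sketch; the `project` helper and the numbers are illustrative assumptions, not the paper's parameterization. At close range, small changes in Tz and focal length change the foreshortening of the body dramatically, which is why the two must be estimated jointly.

```python
import numpy as np

def project(points_3d, f, tz, img_size=256):
    """Toy pinhole projection of mesh vertices given a focal length f (pixels)
    and a camera-space depth offset tz along the optical axis."""
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2] + tz
    u = f * X / Z + img_size / 2
    v = f * Y / Z + img_size / 2
    return np.stack([u, v], axis=-1)

# A body at close range (small tz, short focal length) and at far range
# (large tz, long focal length) can occupy a similar image area while showing
# very different perspective distortion.
verts = np.random.randn(100, 3) * 0.3            # toy "body" vertices (meters)
close = project(verts, f=500.0,  tz=2.0)         # strong foreshortening
far   = project(verts, f=5000.0, tz=20.0)        # nearly orthographic
```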
Fusion-based 3D portrait method that combines a canonical 3D prior with per-frame appearance to achieve temporally stable, identity-consistent 3D videos for single-camera telepresence.
A personalized 3D prior of an individual built from as few as 50 training images. My3DGen enables novel view synthesis, semantic editing of a given face (e.g., adding a smile), and synthesis of novel appearances, all while preserving the original person's identity.
Journal version of our affordable desktop telepresence system (Bringing Telepresence to Every Desk, 2023): four consumer RGB-D cameras and a renderer that synthesizes high-quality free-viewpoint videos of both the user and their environment.
We present an AI-mediated 3D video conferencing system that can reconstruct and autostereoscopically display a life-sized talking head using consumer-grade compute resources and minimal capture equipment. Our 3D capture uses a novel 3D lifting method that encodes a given 2D input into an efficient triplanar neural representation of the user, which can be rendered from novel viewpoints in real time. Our AI-based techniques drastically reduce the cost of 3D capture, while providing a high-fidelity 3D representation on the receiver's end at the cost of traditional 2D video streaming. Additional advantages of our AI-based approach include the ability to accommodate both photorealistic and stylized avatars, and the ability to enable mutual eye contact in multi-directional video conferencing. We demonstrate our system using a tracked stereo display for a personal viewing experience as well as a light field display for a multi-viewer experience.
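For readers unfamiliar with triplanar neural representations, the sketch below shows, under assumptions, how features for a 3D query point are typically gathered from three axis-aligned feature planes and concatenated before being decoded by a small MLP; the `sample_triplane` helper, feature sizes, and plane layout are hypothetical and not the exact components of our system.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    """planes: (3, C, H, W) feature planes for the XY, XZ, and YZ planes.
    xyz: (N, 3) query points in [-1, 1]. Returns (N, 3*C) features."""
    coords = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]  # project point onto each plane
    feats = []
    for plane, uv in zip(planes, coords):
        grid = uv.view(1, -1, 1, 2)                                # (1, N, 1, 2) sampling grid
        f = F.grid_sample(plane[None], grid, align_corners=True)   # (1, C, N, 1) bilinear lookup
        feats.append(f[0, :, :, 0].t())                            # (N, C)
    return torch.cat(feats, dim=-1)                                # concatenate per-plane features

planes = torch.randn(3, 32, 128, 128)      # toy feature planes
pts = torch.rand(1024, 3) * 2 - 1          # random query points in [-1, 1]^3
features = sample_triplane(planes, pts)    # would be decoded by a small MLP downstream
```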
We showcase a prototype personal telepresence system. Using only four RGB-D cameras, it can be easily
installed on every desk. Our renderer synthesizes high-quality free-viewpoint videos of the
entire scene and outperforms prior neural rendering methods.
We discovered a natural information partition in 2D/3D MLPs: structural information is stored in the
early layers and color information in the later layers.
We leverage this property to incrementally stream dynamic free-viewpoint videos without the buffering
required by prior dynamic NeRFs.
With the significant reduction in training time and bandwidth, we lay the foundation for live-streaming
NeRF and a better understanding of MLPs.
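A minimal sketch of how such a partition can be exploited for streaming, assuming a toy coordinate MLP and an illustrative layer split (the actual network, split point, and update rule differ): the early, structure-carrying layers are transmitted once, while only the later, color-carrying layers are re-streamed for each new frame.

```python
import torch
import torch.nn as nn

class TinyFieldMLP(nn.Module):
    """Toy coordinate MLP: 'early' layers carry structure, 'late' layers carry color."""
    def __init__(self, hidden=64, split=2, depth=4):
        super().__init__()
        layers = [nn.Linear(3, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers += [nn.Linear(hidden, 4)]       # e.g. RGB + density
        self.net = nn.Sequential(*layers)
        self.split = 2 * split                 # module index separating early/late layers

    def early_state(self):
        """Structure-carrying weights: sent once and reused for every frame."""
        return {k: v for k, v in self.net.state_dict().items()
                if int(k.split('.')[0]) < self.split}

    def late_state(self):
        """Color-carrying weights: the small per-frame payload."""
        return {k: v for k, v in self.net.state_dict().items()
                if int(k.split('.')[0]) >= self.split}

sender, receiver = TinyFieldMLP(), TinyFieldMLP()
# Frame 0: transmit the full model once.
receiver.net.load_state_dict({**sender.early_state(), **sender.late_state()})
# ... sender fine-tunes on the next frame, then streams only the late layers ...
receiver.net.load_state_dict({**receiver.net.state_dict(), **sender.late_state()})
```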
We introduce a system that synthesizes dynamic free-viewpoint videos from two RGB-D cameras. This
is preliminary work toward our personal telepresence system.