I'm a Research Scientist at NVIDIA Research working on 3D vision, world models, telepresence, and socially intelligent robots.
During my PhD at UNC Chapel Hill, I was advised by Henry Fuchs and worked on affordable telepresence.
I worked with Michael Kaess on SLAM while in the MSCV program at CMU.
I worked with Derek Hoiem on classical 3D reconstruction during my BS at UIUC and while at his company, Reconstruct Inc.
Honors:
IEEE VR Best Paper Honorable Mention
ICRA Outstanding Navigation Paper
Industry: Reconstruct Inc. | Uber ATG | Intel Labs | NVIDIA Research
Research:
2017–2020: SLAM & classical 3D reconstruction
2020–2025: Telepresence, neural rendering, human pose estimation, generative models
2025–present: Video world models, socially intelligent robots (new papers soon)
Human mesh recovery that accurately estimates perspective camera parameters, including the depth translation Tz and the focal length, from a single image, delivering state-of-the-art pose accuracy and 2D alignment, especially for close-range views.
Fusion-based 3D portrait method that combines a canonical 3D prior with per-frame appearance to achieve temporally stable, identity-consistent 3D videos for single-camera telepresence.
A personalized 3D prior of an individual, learned from as few as 50 training images. My3DGen enables novel view synthesis, semantic editing of a given face (e.g., adding a smile), and synthesis of novel appearances, all while preserving the original person's identity.
Journal version of our affordable desktop telepresence system (Bringing Telepresence to Every Desk, 2023): 4 consumer RGB-D cameras and a renderer that synthesizes high-quality free-viewpoint videos of both the user and the environment.
We present an AI-mediated 3D video conferencing system that can reconstruct and autostereoscopically display a life-sized talking head using consumer-grade compute resources and minimal capture equipment. Our 3D capture uses a novel 3D lifting method that encodes a given 2D input into an efficient triplanar neural representation of the user, which can be rendered from novel viewpoints in real time. Our AI-based techniques drastically reduce the cost of 3D capture while providing a high-fidelity 3D representation on the receiver's end at a cost comparable to traditional 2D video streaming. Additional advantages of our AI-based approach include the ability to accommodate both photorealistic and stylized avatars and to enable mutual eye contact in multi-directional video conferencing. We demonstrate our system using a tracked stereo display for a personal viewing experience as well as a light field display for a multi-viewer experience.
We showcase a prototype personal telepresence system. With 4 RGB-D cameras, it can easily be installed on any desk. Our renderer synthesizes high-quality free-viewpoint videos of the entire scene and outperforms prior neural rendering methods.
We discovered a natural information partition in 2D/3D MLPs: they store structural information in early layers and color information in later layers.
We leverage this property to incrementally stream dynamic free-viewpoint videos without the buffering required by prior dynamic NeRFs.
With the significant reduction in training time and bandwidth, we lay the foundation for live-streaming NeRFs and for a better understanding of MLPs.
We introduce a system that synthesizes dynamic free-viewpoint videos from 2 RGB-D cameras. This is preliminary work toward our personal telepresence system.