Shengze Wang

I'm a Research Scientist at NVIDIA Research working on 3D vision, world models, telepresence, and socially intelligent robots. I was advised by Henry Fuchs during my PhD on affordable telepresence at UNC Chapel Hill. I worked with Michael Kaess on SLAM when I was in the MSCV program at CMU. I worked with Derek Hoiem on classical 3D reconstruction during my BS at UIUC and when I was at his company Reconstruct Inc.

Honors:
  • IEEE VR Best Paper Honorable Mention
  • ICRA Outstanding Navigation Paper

Industry: Reconstruct Inc.  |  Uber ATG  |  Intel Labs  |  NVIDIA Research

Research:
2017–2020: SLAM & classical 3D reconstruction
2020–2025: Telepresence, neural rendering, human pose estimation, generative models
2025–present: Video world models, socially intelligent robots (new papers soon)
Service:

News

Jun 3, 2026: Interactive Physical AI (IPA) Workshop at CVPR 2026.
May 9, 2026: Hooded at UNC commencement.
Sep 2, 2025: Joined NVIDIA Research as a Research Scientist.
Jul 18, 2025: Defended PhD thesis at UNC Chapel Hill.
Jun 2025: Outstanding Reviewer, CVPR 2025.
Mar 2025: ★ Best Paper Honorable Mention, IEEE VR 2025 (MNAF).
Feb 2025: BLADE and Coherent3D accepted at CVPR 2025.
Jan 2025: Learning View Synthesis for Desktop Telepresence With Few RGBD Cameras accepted at IEEE TVCG.
Oct 2024: My3DGen accepted as an ★ Oral Presentation at WACV 2025.
Aug 2023: Demonstrated AI-Mediated 3D Video Conferencing at SIGGRAPH Emerging Technologies.
May 2022: ★ Outstanding Navigation Paper Award, ICRA 2022 (EDPLVO).
2022: PLC-LiSLAM accepted at IEEE RA-L.
2021: DPLVO and π-LSAM accepted at ICRA 2021.
2020: A Fast and Accurate Solution for Pose Estimation from 3D Correspondences accepted at ICRA 2020.

mct1224 [at] gmail.com
Google Scholar  /  Twitter
Music (Soundcloud)  /  网易云 (NetEase Cloud Music)

Research
BLADE: Single-view Body Mesh Estimation through Accurate Depth Estimation
Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano*, Shalini De Mello*, Michael Stengel*
CVPR, 2025   [Project Page]   [CVPR]   [Code]   [Weights & File Preparation]

Single-image human mesh recovery that accurately estimates perspective camera parameters (including depth Tz and focal length), delivering state-of-the-art pose accuracy and 2D alignment, especially for close-range views.

Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion
Shengze Wang, Xueting Li, Chao Liu, Matthew Chan, Michael Stengel, Henry Fuchs, Shalini De Mello*, Koki Nagano*
CVPR, 2025   [Project Page]   [CVPR]

Fusion-based 3D portrait method that combines a canonical 3D prior with per-frame appearance to achieve temporally stable, identity-consistent 3D videos for single-camera telepresence.

Multimodal Neural Acoustic Fields for Immersive Mixed Reality (IEEE VR Best Paper Honorable Mention Award)
Guansen Tong, Johnathan Chi-Ho Leung, Xi Peng, Haosheng Shi, Liujie Zheng, Shengze Wang, Arryn Carlos O’Brien, Ashley Paula-Ann Neall, Grace Fei, Martim Gaspar, Praneeth Chakravarthula
IEEE TVCG, 2025   [TVCG]

Combines visual-acoustic fusion (a “conformer”) with acoustic synthesis to learn scene acoustics and render spatial audio from novel viewpoints, improving audio quality for immersive mixed reality.

My3DGen: A Scalable Personalized 3D Generative Model (Oral Presentation)
Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta
WACV, 2025   [Project Page]   [arXiv]

Builds a personalized 3D prior of an individual from as few as 50 training images. My3DGen supports novel view synthesis, semantic editing of a given face (e.g., adding a smile), and synthesis of novel appearances, all while preserving the original person's identity.

Learning View Synthesis for Desktop Telepresence With Few RGBD Cameras
Shengze Wang, Ziheng Wang, Ryan Schmelzle, Liujie Zheng, YoungJoong Kwon, Roni Sengupta, Henry Fuchs
IEEE TVCG, 2025   [Project Page]   [arXiv]

Journal version of our affordable desktop telepresence system (Bringing Telepresence to Every Desk, 2023): four consumer RGB-D cameras and a renderer that synthesizes high-quality free-viewpoint videos of both the user and the environment.

AI-Mediated 3D Video Conferencing
Michael Stengel, Koki Nagano, Chao Liu, Matthew Chan, Alex Trevithick, Shalini De Mello, Jonghyun Kim, David Luebke, Amrita Mazumdar, Shengze Wang, Mayoore Jaiswal
SIGGRAPH Emerging Technologies, 2023   [Project Page]   [NVIDIA YouTube]   [ACM DL]

We present an AI-mediated 3D video conferencing system that can reconstruct and autostereoscopically display a life-sized talking head using consumer-grade compute resources and minimal capture equipment. Our 3D capture uses a novel 3D lifting method that encodes a given 2D input into an efficient triplanar neural representation of the user, which can be rendered from novel viewpoints in real time. Our AI-based techniques drastically reduce the cost of 3D capture while providing a high-fidelity 3D representation on the receiver's end at a cost comparable to traditional 2D video streaming. Additional advantages of our AI-based approach include the ability to accommodate both photorealistic and stylized avatars and to enable mutual eye contact in multi-directional video conferencing. We demonstrate our system using a tracked stereo display for a personal viewing experience as well as a light field display for a multi-viewer experience.

Bringing Telepresence to Every Desk
Shengze Wang, Ziheng Wang, Ryan Schmelzle, Liujie Zheng, YoungJoong Kwon, Roni Sengupta, Henry Fuchs
arXiv, 2023   [Project Page]   [Code]

We showcase a prototype personal telepresence system. Because it needs only four RGBD cameras, it can be easily installed on any desk. Our renderer synthesizes high-quality free-viewpoint videos of the entire scene and outperforms prior neural rendering methods.

INV: Towards Streaming Incremental Neural Videos
Shengze Wang, Alexey Supikov, Joshua Ratcliff, Henry Fuchs, Ronald Azuma
arXiv, 2023

We discover a natural information partition in 2D/3D MLPs: early layers store structural information, while later layers store color information. We leverage this property to stream dynamic free-viewpoint videos incrementally, without the buffering required by prior dynamic NeRFs.

By significantly reducing training time and bandwidth, this work lays a foundation for live-streaming NeRFs and for a better understanding of MLPs.

PLC-LiSLAM: LiDAR SLAM With Planes, Lines, and Cylinders
Lipu Zhou, Guoquan Huang, Yinian Mao, Jincheng Yu, Shengze Wang, Michael Kaess
IEEE RA-L, 2022
EDPLVO: Efficient direct point-line visual odometry (Outstanding Navigation Paper)
Lipu Zhou, Guoquan Huang, Yinian Mao, Shengze Wang, Michael Kaess
ICRA, 2022
Learning Dynamic View Synthesis With Few RGBD Cameras
Shengze Wang, YoungJoong Kwon, Yuan Shen, Qian Zhang, Andrei State, Jia-Bin Huang, Henry Fuchs
arXiv, 2022

We introduce a system that synthesizes dynamic free-viewpoint videos from two RGBD cameras. This is preliminary work toward our personal telepresence system.

DPLVO: direct point-line monocular visual odometry
Lipu Zhou, Shengze Wang, Michael Kaess
ICRA, 2021
π-LSAM: LiDAR Smoothing and Mapping With Planes
Lipu Zhou, Shengze Wang, Michael Kaess
ICRA, 2021
A fast and accurate solution for pose estimation from 3D correspondences
Lipu Zhou, Shengze Wang, Michael Kaess
ICRA, 2020
Do not Omit Local Minimizer: a Complete Solution for Pose Estimation from 3D Correspondences
Lipu Zhou, Shengze Wang, Jiamin Ye, Michael Kaess
arXiv, 2019
Unsupervised Learning of Monocular Depth Estimation with Bundle Adjustment, Super-Resolution and Clip Loss
Lipu Zhou, Jiamin Ye, Montiel Abello, Shengze Wang, Michael Kaess
arXiv, 2018

Thank you to Jon Barron for sharing his website template with the community