HSC4D captures 3D human motions with accurate localization in diverse, challenging scenes. The top row shows a person walking from indoors to outdoors. The bottom row shows challenging cases; the bottom-right figure is a third-person camera view of the rock-climbing sequence.

Abstract

We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world, containing large-scale indoor-outdoor scenes, diverse human motions, and rich interactions between humans and environments. Using only body-mounted IMUs and LiDAR, HSC4D is space-free, without any external devices' constraints, and map-free, without pre-built maps. IMUs can capture human poses but drift over long-term use, while LiDAR is stable for global localization but rough for local positions and orientations; HSC4D makes the two sensors complement each other through a joint optimization and achieves promising results for long-term capture. Relationships between humans and environments are also explored to make their interactions more realistic. To facilitate many downstream tasks, such as AR, VR, robotics, and autonomous driving, we propose a dataset containing three large scenes (1k-5k $m^2$) with accurate dynamic human motions and locations. Diverse scenarios (climbing gym, multi-story building, slope, etc.) and challenging human activities (exercising, walking up/down stairs, climbing, etc.) demonstrate the effectiveness and generalization ability of HSC4D. The dataset and code are available at this https URL.

Pipeline

The inputs are a point cloud sequence and IMU data. IMU pose estimation and LiDAR mapping are performed separately. Synchronization and calibration are then applied to prepare the data for further optimization. Finally, graph-based optimization and joint optimization produce the global motion in the scene map.
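The core idea of combining the two sensors can be illustrated with a minimal sketch: the IMU trajectory drifts over time, while sparse LiDAR keyframes give stable global positions. A simple stand-in for the paper's graph-based and joint optimization is to estimate the drift at each LiDAR keyframe and interpolate it across all frames. The function name `correct_imu_drift` and the linear-interpolation model are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def correct_imu_drift(imu_traj, lidar_idx, lidar_pos):
    """Correct a drifting IMU trajectory using sparse LiDAR positions.

    imu_traj : (T, D) per-frame IMU translations (drifting)
    lidar_idx: indices of frames with a LiDAR global position
    lidar_pos: (K, D) LiDAR global positions at those frames

    Simplified stand-in for HSC4D's optimization: drift is measured
    at the keyframes and linearly interpolated in between.
    """
    imu_traj = np.asarray(imu_traj, dtype=float)
    lidar_pos = np.asarray(lidar_pos, dtype=float)
    # Drift observed at each LiDAR keyframe.
    drift_at_kf = imu_traj[lidar_idx] - lidar_pos
    t = np.arange(len(imu_traj))
    # Interpolate the per-axis drift over all frames, then subtract it.
    drift = np.stack(
        [np.interp(t, lidar_idx, drift_at_kf[:, d])
         for d in range(imu_traj.shape[1])],
        axis=1,
    )
    return imu_traj - drift
```

After correction, the trajectory agrees with the LiDAR positions at the keyframes while keeping the smooth frame-to-frame motion from the IMUs; the paper's joint optimization additionally refines orientations and enforces scene constraints.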

Capturing system

Our lightweight wearable capturing system includes 17 body-attached IMUs and a hip-mounted LiDAR. The LiDAR packages are connected to a DJI Manifold2-C mini-computer. We tilt the LiDAR up by 30° to obtain a good vertical scanning view.
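Because the LiDAR is mounted at a 30° tilt, its scans must be rotated back into a level body frame before mapping. A minimal sketch of that extrinsic correction follows; the rotation axis (pitch/y) and sign convention here are assumptions for illustration, since the actual extrinsics come from the system's calibration step.

```python
import numpy as np

def untilt_points(points, tilt_deg=30.0):
    """Rotate LiDAR points to undo an assumed mounting tilt about the
    y (pitch) axis, leveling the scan with the body frame.

    points: (N, 3) array of LiDAR points in the sensor frame.
    """
    a = np.deg2rad(tilt_deg)
    # Rotation matrix about the y axis by +tilt_deg.
    R = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0,       1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
    # Apply R to each row vector.
    return np.asarray(points, dtype=float) @ R.T
```

In practice this single rotation would be one part of a full 6-DoF sensor-to-body extrinsic transform estimated during calibration.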

Video

Citation

@InProceedings{Dai_2022_CVPR,
    author    = {Dai, Yudi and Lin, Yitai and Wen, Chenglu and Shen, Siqi and Xu, Lan and Yu, Jingyi and Ma, Yuexin and Wang, Cheng},
    title     = {HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {6792-6802}
}