Abstract

We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate research on global human pose estimation (GHPE) with human-scene interaction in the wild. Using a head-mounted device that integrates a LiDAR and a camera, we record the activities of 12 human subjects across 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D keypoints, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method that fits local SMPL meshes to the scene and fine-tunes the camera calibration during dynamic motions frame by frame, yielding plausible, scene-consistent 3D human poses. In total, SLOPER4D consists of 15 sequences of human motion, each with a trajectory longer than 200 meters (up to 1,300 meters) covering an area of more than 200 square meters (up to 30,000 square meters), comprising more than 100K LiDAR frames, 300K video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates that SLOPER4D poses significant challenges to existing methods and opens up great research opportunities. The dataset and code are publicly available.
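
The core of the annotation pipeline is a per-frame optimization that balances a 2D reprojection term, a scene-contact term, and a temporal-smoothness term. The PyTorch sketch below is a minimal illustration of that idea under our own assumptions; the function names, weights, and exact loss terms are placeholders, not SLOPER4D's actual implementation.

import torch

def fitting_loss(joints_3d, keypoints_2d, conf_2d, contact_dist, project,
                 prev_joints_3d, w_proj=1.0, w_contact=0.1, w_smooth=0.5):
    """One frame of scene-aware SMPL fitting (illustrative only).

    joints_3d    : (J, 3) SMPL joints in world coordinates
    keypoints_2d : (J, 2) detected 2D keypoints, conf_2d their confidences
    contact_dist : distances of assumed contact vertices (e.g., feet) to the scene
    project      : differentiable world-to-pixel projection (hypothetical helper)
    """
    # 2D reprojection: projected SMPL joints should match detected keypoints,
    # which also lets gradients flow into the camera calibration.
    uv = project(joints_3d)
    loss_proj = (conf_2d * (uv - keypoints_2d).pow(2).sum(-1)).mean()
    # Scene contact: contact vertices should lie on the reconstructed surface.
    loss_contact = contact_dist.pow(2).mean()
    # Temporal smoothness between consecutive frames.
    loss_smooth = (joints_3d - prev_joints_3d).pow(2).mean()
    return w_proj * loss_proj + w_contact * loss_contact + w_smooth * loss_smooth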

Pipeline and Hardware System

Pipeline

Hardware

Dataset

  • 15 sequences of 12 human subjects across 10 urban scenes (1k – 30k $m^2$)
  • 100k+ frames of multi-source data (20 Hz), including 2D / 3D annotations and 3D scenes; 7 km+ of human motion (see the loader sketch after this list).
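
The per-sequence annotations pair naturally with a simple loading step. The sketch below is hypothetical: the file name and dictionary keys are assumptions for illustration, so consult the released toolkit for the real schema.

import pickle

# Load one sequence's frame-wise annotations (file name is assumed).
with open("seq001_labels.pkl", "rb") as f:
    seq = pickle.load(f)

keypoints_2d = seq["keypoints_2d"]   # per-frame 2D keypoints (assumed key)
smpl_pose    = seq["smpl_pose"]      # per-frame SMPL pose parameters (assumed key)
global_trans = seq["global_trans"]   # per-frame global translations (assumed key)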

Every human subject signed a consent form permitting the release of their motion data for research purposes.

2D / 3D annotations and point clouds.
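
The 2D and 3D annotations are linked through the camera calibration: projecting the 3D joints with the intrinsics and extrinsics should land on the 2D keypoints. The NumPy sketch below shows this standard pinhole projection; K, R, and t are placeholders, not the dataset's calibration values.

import numpy as np

def project_joints(joints_world, K, R, t):
    """Project (J, 3) world-space joints to (J, 2) pixel coordinates."""
    joints_cam = joints_world @ R.T + t   # world -> camera coordinates
    uv = joints_cam @ K.T                 # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]         # perspective divide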

Human motions in reconstructed 3D scenes

Qualitative Comparison

Comparison between IMU + ICP and our results

Comparison between the original extrinsics and ours.
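
In the spirit of the paper's per-frame calibration fine-tuning, the extrinsics can be refined by minimizing the reprojection error of the fitted 3D joints against the detected 2D keypoints. This SciPy sketch is an illustration of that idea under our own assumptions, not the actual pipeline code.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_extrinsics(rvec0, t0, joints_world, keypoints_2d, K):
    """Refine a rotation vector / translation by least-squares reprojection."""
    def residual(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        uv = (joints_world @ R.T + x[3:]) @ K.T
        uv = uv[:, :2] / uv[:, 2:3]        # pixel coordinates
        return (uv - keypoints_2d).ravel() # per-joint pixel residuals
    x = least_squares(residual, np.concatenate([rvec0, t0])).x
    return Rotation.from_rotvec(x[:3]).as_matrix(), x[3:]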

Cross-Dataset Evaluation

LiDAR-based human pose estimation (HPE)

Camera-based human pose estimation (HPE)
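
Local HPE benchmarks like these are typically scored with mean per-joint position error after root alignment (MPJPE) and after Procrustes alignment (PA-MPJPE). The snippet below is a generic MPJPE computation, not the benchmark's exact evaluation script.

import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over (F, J, 3) arrays after root-centering."""
    pred = pred - pred[:, :1]   # center each frame on the root joint
    gt = gt - gt[:, :1]
    return np.linalg.norm(pred - gt, axis=-1).mean()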

Global Human Pose Estimation Comparison
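
For GHPE, errors must be measured in world coordinates, so no root alignment is applied and trajectory drift matters. The sketch below illustrates two such generic metrics; the benchmark's exact protocol may differ.

import numpy as np

def global_mpjpe(pred, gt):
    """Joint error in world coordinates over (F, J, 3) arrays, no alignment."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def trajectory_error(pred_trans, gt_trans):
    """Mean global-translation error over (F, 3) root trajectories."""
    return np.linalg.norm(pred_trans - gt_trans, axis=-1).mean()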

Citation

@InProceedings{Dai_2023_CVPR,
    author    = {Dai, Yudi and Lin, Yitai and Lin, Xiping and Wen, Chenglu and Xu, Lan and Yi, Hongwei and Shen, Siqi and Ma, Yuexin and Wang, Cheng},
    title     = {SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {682-692}
}