The proposed LiDARHuman26M benchmark dataset consists of synchronized LiDAR point clouds, RGB images, and ground-truth 3D human motions obtained from professional IMU devices, covering diverse motions and a large range of capture distances. Based on LiDARHuman26M, we propose LiDARCap, a strong baseline motion capture approach on LiDAR point clouds, which achieves promising results, as shown on the right.

Abstract

Existing motion capture datasets are largely short-range and cannot yet meet the needs of long-range applications. We propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range to overcome this limitation. Our dataset also includes the ground-truth human motions acquired by the IMU system and the synchronous RGB images. We further present a strong baseline method, LiDARCap, for LiDAR point cloud human motion capture. Specifically, we first utilize PointNet++ to encode the point features and then employ an inverse kinematics solver and SMPL optimizer to regress the pose by hierarchically aggregating the temporally encoded features. Quantitative and qualitative experiments show that our method outperforms techniques based only on RGB images. Ablation experiments demonstrate that our dataset is challenging and worthy of further research. Finally, experiments on the KITTI Dataset and the Waymo Open Dataset show that our method can be generalized to different LiDAR sensor settings.
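
As a rough illustration of the pipeline described in the abstract, the toy model below stands in for LiDARCap with heavy simplifications: the PointNet++ encoder is replaced by a plain PointNet-style shared MLP with max-pooling, temporal aggregation is a bidirectional GRU, and the inverse kinematics solver and SMPL optimizer are collapsed into a direct pose regressor. It is not the released implementation; module names and dimensions are illustrative only.

# Minimal LiDARCap-style sketch: per-point features -> temporal aggregation -> SMPL pose.
# NOT the authors' implementation; see the caveats above.
import torch
import torch.nn as nn


class LiDARCapSketch(nn.Module):
    def __init__(self, feat_dim=256, num_joints=24):
        super().__init__()
        # per-point feature extractor (stand-in for PointNet++)
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU())
        # temporal encoder over frame-level features
        self.temporal = nn.GRU(feat_dim, feat_dim, num_layers=2,
                               batch_first=True, bidirectional=True)
        # regress axis-angle SMPL pose (24 joints x 3) per frame
        self.pose_head = nn.Linear(2 * feat_dim, num_joints * 3)

    def forward(self, points):
        # points: (B, T, N, 3) segmented human point clouds over T frames
        feats = self.point_mlp(points)           # (B, T, N, feat_dim)
        frame_feats = feats.max(dim=2).values    # (B, T, feat_dim), order-invariant pooling
        temporal_feats, _ = self.temporal(frame_feats)
        return self.pose_head(temporal_feats)    # (B, T, 72) SMPL pose parameters


if __name__ == '__main__':
    clips = torch.randn(2, 16, 512, 3)           # 2 clips, 16 frames, 512 points each
    print(LiDARCapSketch()(clips).shape)         # torch.Size([2, 16, 72])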

Result

Dataset

Download URL

Google Drive / BaiduNetDisk (Access Code: o3xq)

Structure

lidarhuman26M
|── images
|   |── 5
|   |   |── 000001.png
|   |   |── 000002.png
|   |   |── ...
|   |── ...
|   |── 42
|── labels/3d
    |── segment
    |   |── 5
    |   |   |── 000001.ply
    |   |   |── 000002.ply
    |   |   |── ...
    |   |── ...
    |   |── 42
    |── pose
        |── 5
        |   |── 000001.json
        |   |── 000002.json
        |   |── ...
        |── ...
        |── 42
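
Given this layout, the cropped image, segmented point cloud, and pose annotation of each frame share the same sequence folder and frame index. The helper below is a small sketch (not part of the official toolkit) that enumerates these triplets; '/path/to/lidarhuman26M' is a placeholder for the extracted dataset root.

# Sketch: enumerate (image, point cloud, pose) file triplets from the layout above.
import os


def list_frames(dataset_folder):
    """Yield (sequence_id, frame_id, image_path, ply_path, pose_path) tuples."""
    segment_root = os.path.join(dataset_folder, 'labels/3d/segment')
    for seq in sorted(os.listdir(segment_root)):
        for ply_name in sorted(os.listdir(os.path.join(segment_root, seq))):
            frame = os.path.splitext(ply_name)[0]  # e.g. '000001'
            yield (seq, frame,
                   os.path.join(dataset_folder, 'images', seq, frame + '.png'),
                   os.path.join(segment_root, seq, ply_name),
                   os.path.join(dataset_folder, 'labels/3d/pose', seq, frame + '.json'))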

Specification

  1. Point clouds that contain only the volunteers are stored in lidarhuman26M/labels/3d/segment.
  2. The ground-truth 3D human motions are provided in the form of SMPL parameters (pose, shape, and trans) in lidarhuman26M/labels/3d/pose. If you want to generate the corresponding mesh, you can use smplx, a Python module whose specification can be found here; a minimal usage sketch is also included after the projection code below.
  3. Due to limited space, we only provide the cropped volunteer regions of the images in PNG format in lidarhuman26M/images. If you want to project the point clouds onto the images, you can use the code below.
from plyfile import PlyData

import json
import numpy as np
import os
import torch


def affine(X, matrix):
    """Apply a 4x4 affine transform to an (n, 3) array of points (NumPy or torch)."""
    n = X.shape[0]
    if isinstance(X, np.ndarray):
        # homogenize, transform, and drop the homogeneous coordinate
        res = np.concatenate((X, np.ones((n, 1))), axis=-1).T
        res = np.dot(matrix, res).T
    else:
        res = torch.cat((X, torch.ones((n, 1)).to(X.device)), axis=-1)
        res = matrix.to(X.device).matmul(res.T).T
    return res[..., :-1]


def lidar_to_camera(X, extrinsic_matrix):
    """Transform points from the LiDAR frame to the camera frame."""
    return affine(X, extrinsic_matrix)


def camera_to_pixel(X, intrinsic_matrix, distortion_coefficients):
    """Project camera-frame points to pixel coordinates with radial/tangential distortion."""
    # focal length
    f = np.array([intrinsic_matrix[0, 0], intrinsic_matrix[1, 1]])
    # principal point
    c = np.array([intrinsic_matrix[0, 2], intrinsic_matrix[1, 2]])
    # radial (k1, k2, k3) and tangential (p1, p2) distortion coefficients
    k = np.array([distortion_coefficients[0],
                  distortion_coefficients[1], distortion_coefficients[4]])
    p = np.array([distortion_coefficients[2], distortion_coefficients[3]])
    XX = X[..., :2] / X[..., 2:]  # perspective division
    r2 = np.sum(XX[..., :2] ** 2, axis=-1, keepdims=True)

    # radial distortion factor: 1 + k1*r^2 + k2*r^4 + k3*r^6
    radial = 1 + np.sum(k * np.concatenate((r2, r2 ** 2, r2 ** 3),
                                           axis=-1), axis=-1, keepdims=True)

    # tangential distortion terms
    tan = 2 * np.sum(p * XX[..., ::-1], axis=-1, keepdims=True)
    XXX = XX * (radial + tan) + r2 * p[..., ::-1]
    return f * XXX + c


def read_point_cloud(filename):
    """ read XYZ point cloud from filename PLY file """
    ply_data = PlyData.read(filename)['vertex'].data
    points = np.array([[x, y, z] for x, y, z in ply_data])
    return points



def project_points_on_segment_image(index):
    """Project the segmented point cloud of a frame onto its cropped image."""
    # LiDAR-to-camera extrinsic matrix (4x4)
    extrinsic_matrix = np.array([
        [-0.0043368991524, -0.99998911867, -0.0017186757713, 0.016471385748],
        [-0.0052925495236, 0.0017416212982, -0.99998447772, 0.080050847871],
        [0.99997658984, -0.0043277356572, -0.0053000451695, -0.049279053295],
        [0., 0., 0., 1.]])
    # camera intrinsic matrix (3x3)
    intrinsic_matrix = np.array([
        [9.5632709662202160e+02, 0., 9.6209910493679433e+02],
        [0., 9.5687763573729683e+02, 5.9026610775785059e+02],
        [0., 0., 1.]])
    # distortion coefficients (k1, k2, p1, p2, k3)
    distortion_coefficients = np.array([
        -6.1100617222502205e-03, 3.0647823796371827e-02,
        -3.3304524444662654e-04, -4.4038460096976607e-04, -2.5974982760794661e-02])

    dataset_folder = '/path/to/lidarhuman26M'
    # per-frame top-left pixel coordinates of the cropped images
    with open(os.path.join(dataset_folder, 'lidarhuman26M_top_left.json')) as f:
        data = json.load(f)

    # corresponding cropped image, available for overlaying the projected points
    image_filename = os.path.join(
        dataset_folder, 'images/{}.png'.format(index))
    point_cloud_filename = os.path.join(
        dataset_folder, 'labels/3d/segment/{}.ply'.format(index))
    point_cloud = read_point_cloud(point_cloud_filename)
    points_on_image = camera_to_pixel(lidar_to_camera(
        point_cloud, extrinsic_matrix), intrinsic_matrix, distortion_coefficients)
    # shift from full-image pixel coordinates to the cropped image's coordinates
    top_left_coord = np.array(data[index])
    points_on_image -= top_left_coord
    return points_on_image

if __name__ == '__main__':
    project_points_on_segment_image('35/000423')
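
As noted in item 2 of the specification, the SMPL parameters can be turned into a mesh with the smplx package. The sketch below assumes the per-frame JSON stores keys named 'pose' (72-D axis-angle), 'shape' (10-D betas), and 'trans' (3-D translation), matching the parameter names listed above; adjust the keys if they differ. The SMPL model files must be downloaded separately, and smpl_model_folder is a placeholder.

# Sketch: reconstruct the SMPL mesh of one frame from its pose JSON (key names assumed).
import json
import numpy as np
import smplx
import torch


def pose_json_to_mesh(pose_json_path, smpl_model_folder):
    with open(pose_json_path) as f:
        params = json.load(f)
    # assumed keys: 'pose' (72,), 'shape' (10,), 'trans' (3,)
    pose = torch.tensor(np.asarray(params['pose'], dtype=np.float32)).reshape(1, 72)
    betas = torch.tensor(np.asarray(params['shape'], dtype=np.float32)).reshape(1, 10)
    trans = torch.tensor(np.asarray(params['trans'], dtype=np.float32)).reshape(1, 3)
    model = smplx.create(smpl_model_folder, model_type='smpl')
    output = model(global_orient=pose[:, :3], body_pose=pose[:, 3:],
                   betas=betas, transl=trans, return_verts=True)
    vertices = output.vertices.detach().numpy().squeeze()  # (6890, 3)
    return vertices, model.faces                            # faces: (13776, 3)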

License

Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use the LiDARHuman26M Dataset (hereinafter the “Dataset”). By downloading and/or using the Dataset, you acknowledge that you have read these terms and conditions, understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not download and/or use the Dataset. Any infringement of the terms of this agreement will automatically terminate your rights under this license.

Ownership

The Dataset and the associated materials have been developed by the Spatial Sensing and Computing Lab, Xiamen University, and the Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University (hereinafter the “Licensor”).

Grant

Licensor grants you (Licensee) personally a single-user, non-exclusive, non-transferable, free of charge right:

  • To obtain and install the Dataset on computers owned, leased or otherwise controlled by you and/or your organization;
  • To use the Dataset for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects;

Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes. The Dataset may not be reproduced, modified and/or made available in any form to any third party without Licensor’s prior written permission.

This license prohibits the use of the Dataset to train methods/algorithms/neural networks/etc. for commercial use of any kind. By downloading the Dataset, you agree not to reverse engineer it.

No distribution

The Dataset and the license herein granted shall not be copied, shared, distributed, re-sold, offered for re-sale, transferred or sub-licensed in whole or in part except that you may make one copy for archive purposes only.

No warranty

The authors do not warrant the quality, accuracy, or completeness of any information, data or software provided. Such data and software are provided “AS IS” without warranty or condition of any nature. The authors disclaim all other warranties, expressed or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with respect to the data and any accompanying materials.

Restriction and limitation of liability

In no event shall the authors be liable for any other damages whatsoever arising out of the use of, or inability to use this dataset and its associated software, even if the authors have been advised of the possibility of such damages.

Responsible use

It is YOUR RESPONSIBILITY to ensure that your use of this product complies with these terms and to pay any additional fees or royalties, as may be required, for any use not permitted by or not specified in this agreement.

Acceptance of this agreement

Any use whatsoever of this dataset and its associated software shall constitute your acceptance of the terms of this agreement. By using the dataset and its associated software, you agree to cite the papers of the authors in any publication by you and your collaborators that makes any use of the dataset, in the following format (NOTICE THAT CITING THE DATASET URL INSTEAD OF THE PUBLICATIONS WOULD NOT BE COMPLIANT WITH THIS LICENSE AGREEMENT):

@inproceedings{li2022lidarcap,
  title={LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds},
  author={Li, Jialian and Zhang, Jingyi and Wang, Zhiyong and Shen, Siqi and Wen, Chenglu and Ma, Yuexin and Xu, Lan and Yu, Jingyi and Wang, Cheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20502--20512},
  year={2022}
}

Further information and commercial licensing

For further information, or for commercial licensing, please contact us at the following email address: cwang@xmu.edu.cn

Citation

@inproceedings{li2022lidarcap,
  title={LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds},
  author={Li, Jialian and Zhang, Jingyi and Wang, Zhiyong and Shen, Siqi and Wen, Chenglu and Ma, Yuexin and Xu, Lan and Yu, Jingyi and Wang, Cheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20502--20512},
  year={2022}
}