Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation

1École Polytechnique Fédérale de Lausanne (EPFL), 2TU Darmstadt
*Project Lead

Abstract

Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data.

We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes under various lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with a significantly increased label density, obtained via depth completion.

We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform reasonably well, accurately estimating depth in omnidirectional images remains challenging. To address this, we introduce the necessary adaptations to standard stereo models and achieve improved performance.

Dataset

The Helvipad dataset includes 39,553 labeled frames from indoor and outdoor scenes under various lighting conditions.

[Figure: dataset visualisations]

The equipment setup of our data acquisition includes:

  • 2 Ricoh Theta V cameras, capturing 4K/UHD equirectangular video at a native resolution of 3840 × 1920 pixels and 30 fps, mounted in a top-bottom arrangement with a 19.1 cm baseline between them.
  • Ouster OS1-64 LiDAR sensor, providing 64 beams with a 45° vertical field of view and a measuring range of up to 120 meters at 10 Hz, mounted 45.0 cm below the bottom camera.
  • Nvidia Jetson Xavier, serving as the central processor to manage data capture and ensure synchronization across all devices during data collection.
[Figure: LiDAR to 360° mapping illustration]

Data was extracted from video sequences captured between December 2023 and February 2024. Each sequence is synchronized with its corresponding LiDAR point clouds, which are projected onto the frames to obtain depth and disparity maps.
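
To make this projection step concrete, below is a minimal sketch (Python/NumPy) of how 3D points expressed in the reference camera frame can be mapped to equirectangular pixel coordinates to form a sparse depth map. The spherical-coordinate convention, default resolution, and function names are ours rather than an official toolkit, and the sensor extrinsics are assumed to have been applied beforehand.

import numpy as np

def project_to_equirectangular(points, height=960, width=1920):
    """Project 3D points (N, 3), given in the reference camera frame, onto an
    equirectangular image and return a sparse depth map.

    Assumed convention: x right, y down, z forward; azimuth measured around the
    vertical axis, polar angle measured from straight up. The default size keeps
    the 2:1 equirectangular aspect ratio; adjust it to the actual frame size.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)              # radial distance in metres

    azimuth = np.arctan2(x, z)                          # [-pi, pi), 0 = forward
    polar = np.arccos(np.clip(-y / depth, -1.0, 1.0))   # [0, pi], 0 = zenith

    # Map spherical angles to pixel coordinates.
    u = ((azimuth / (2 * np.pi) + 0.5) * width).astype(int) % width
    v = np.clip((polar / np.pi * height).astype(int), 0, height - 1)

    # Keep the closest LiDAR return when several points fall in the same pixel.
    depth_map = np.full((height, width), np.inf)
    np.minimum.at(depth_map, (v, u), depth)
    depth_map[np.isinf(depth_map)] = 0.0                # 0 = no label

    return depth_map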

[Figure: histograms of depth values for all, indoor, and outdoor scenes]

Depth values range from 0.5 to 225 meters, with averages of 8.1 meters overall, 5.4 meters for indoor scenes, and 9.2 meters for combined day and night outdoor scenes.
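
To relate these depth values to the angular disparities reported in the benchmark below, one can use the epipolar geometry of a vertical-baseline 360° rig: the two camera centres and the observed point form a triangle, and the law of sines links depth, baseline, and the difference in polar angle between the two views. The sketch below uses the 19.1 cm baseline from the setup above; the reference-view and sign conventions are our assumptions, not necessarily the dataset's exact disparity definition.

import numpy as np

BASELINE = 0.191  # metres, top-bottom distance between the two cameras

def depth_to_disparity_deg(depth_m, polar_deg):
    """Angular disparity (degrees) for a point seen at radial distance `depth_m`
    and polar angle `polar_deg` (0 deg = zenith) in the reference view, assuming
    the relation depth = BASELINE * sin(polar + d) / sin(d)."""
    theta = np.radians(polar_deg)
    d = np.arctan2(np.sin(theta), depth_m / BASELINE - np.cos(theta))
    return np.degrees(d)

# Points near the horizon (polar angle ~90 deg) at representative depths:
for r in (0.5, 8.1, 225.0):
    print(f"{r:6.1f} m  ->  {depth_to_disparity_deg(r, 90.0):.3f} deg")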

Benchmark Results

We evaluate the performance of multiple state-of-the-art and popular stereo matching methods for both standard and 360° images. All models are trained on a single NVIDIA A100 GPU with the largest possible batch size to ensure comparable use of computational resources.

Method            Type          Disparity (°)            Depth (m)
                                MAE    RMSE   MARE       MAE    RMSE   MARE
PSMNet            stereo        0.33   0.54   0.20       2.79   6.17   0.29
360SD-Net         360° stereo   0.21   0.42   0.18       2.14   5.12   0.15
IGEV-Stereo       stereo        0.22   0.41   0.17       1.85   4.44   0.15
360-IGEV-Stereo   360° stereo   0.18   0.39   0.15       1.77   4.36   0.14
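
MAE, RMSE, and MARE denote the mean absolute error, root mean squared error, and mean absolute relative error, respectively (lower is better for all). Below is a minimal sketch of how these can be computed over labeled pixels; the masking rule (ignoring pixels without a LiDAR label) and the function name are our assumptions, not the official evaluation code.

import numpy as np

def error_metrics(pred, target):
    """MAE, RMSE and mean absolute relative error (MARE), computed over labeled
    pixels only (target == 0 is assumed to mean "no label")."""
    mask = target > 0
    err = pred[mask] - target[mask]
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MARE": np.mean(np.abs(err) / target[mask]),
    }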

The dataset is also an ideal testbed for assessing the robustness of depth estimation methods to diverse lighting conditions and depth ranges by training and evaluating models on different subsets of the dataset (e.g., indoor vs. outdoor scenes).

[Figure: cross-scene generalization performance]

Download

Use the link below to access the dataset on the Hugging Face Hub.

The dataset is organized into training and testing subsets, whose structure is outlined below:

helvipad/
├── train/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── depth_maps_augmented      # Augmented depth maps using depth completion
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── disparity_maps_augmented  # Augmented disparity maps using depth completion
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   ├── LiDAR_pcd                 # Original LiDAR point cloud data
├── test/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   ├── LiDAR_pcd                 # Original LiDAR point cloud data
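
As a starting point, here is a minimal loading sketch that follows this layout. The frame file name, the image format, and the scale factor used to decode depth and disparity values (we assume 16-bit PNGs here) are placeholders; refer to the dataset card on the Hugging Face Hub for the exact conventions.

from pathlib import Path

import numpy as np
from PIL import Image

ROOT = Path("helvipad")   # dataset root after download
SCALE = 256.0             # assumed decoding factor for 16-bit PNG depth/disparity maps

def load_sample(split="train", frame="0000.png", augmented=False):
    """Load a top/bottom image pair with its depth and disparity maps.
    `frame` is a hypothetical file name; the actual naming may differ."""
    suffix = "_augmented" if (augmented and split == "train") else ""
    top = np.array(Image.open(ROOT / split / "images_top" / frame))
    bottom = np.array(Image.open(ROOT / split / "images_bottom" / frame))
    depth = np.array(Image.open(ROOT / split / f"depth_maps{suffix}" / frame),
                     dtype=np.float32) / SCALE      # metres (assumed encoding)
    disparity = np.array(Image.open(ROOT / split / f"disparity_maps{suffix}" / frame),
                         dtype=np.float32) / SCALE  # degrees (assumed encoding)
    return top, bottom, depth, disparity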

BibTeX

If you use the Helvipad dataset in your research, please cite it using the following BibTeX entry:

@misc{zayene2024helvipad,
  author        = {Zayene, Mehdi and Endres, Jannik and Havolli, Albias and Corbi\`{e}re, Charles and Cherkaoui, Salim and Ben Ahmed Kontouli, Alexandre and Alahi, Alexandre},
  title         = {Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation},
  year          = {2024},
  eprint        = {2403.16999},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}