Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation

1École Polytechnique Fédérale de Lausanne (EPFL), 2TU Darmstadt
*Project Lead

Abstract

Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data.

We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes under various lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with a significantly increased label density, obtained via depth completion.

We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform reasonably well, accurately estimating depth in omnidirectional images remains challenging. To address this, we introduce the necessary adaptations to standard stereo models and achieve improved performance.

Dataset

The Helvipad dataset includes 39,553 labeled frames from indoor and outdoor scenes under various lighting conditions.

[Figure: dataset visualisations]

The equipment setup of our data acquisition includes:

  • 2 Ricoh Theta V cameras, capturing 4K/UHD equirectangular video at a native resolution of 3840 × 1920 pixels and 30 fps, mounted in a top-bottom arrangement with a 19.1 cm baseline between them.
  • Ouster OS1-64 LiDAR sensor, providing 64 beams with a 45° vertical field of view and a measuring range of up to 120 meters at 10 Hz, mounted 45.0 cm below the bottom camera.
  • Nvidia Jetson Xavier, serving as the central processor to manage data capture and ensure synchronization across all devices during data collection.
[Figure: LiDAR to 360° mapping illustration]

Data was extracted from video sequences captured between December 2023 and February 2024. Each sequence is synchronized with its corresponding LiDAR point clouds, which are projected onto the frames to obtain depth and disparity maps.
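
To make this projection step concrete, below is a minimal sketch (Python/NumPy) of how 3D points expressed in the reference camera frame can be mapped to equirectangular pixel coordinates to form a sparse depth map. The spherical-coordinate convention, default resolution, and function names are ours rather than an official toolkit, and the sensor extrinsics are assumed to have been applied beforehand.

import numpy as np

def project_to_equirectangular(points, height=960, width=1920):
    """Project 3D points (N, 3), given in the reference camera frame, onto an
    equirectangular image and return a sparse depth map.

    Assumed convention: x right, y down, z forward; azimuth measured around the
    vertical axis, polar angle measured from straight up. The default size keeps
    the 2:1 equirectangular aspect ratio; adjust it to the actual frame size.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)              # radial distance in metres

    azimuth = np.arctan2(x, z)                          # [-pi, pi), 0 = forward
    polar = np.arccos(np.clip(-y / depth, -1.0, 1.0))   # [0, pi], 0 = zenith

    # Map spherical angles to pixel coordinates.
    u = ((azimuth / (2 * np.pi) + 0.5) * width).astype(int) % width
    v = np.clip((polar / np.pi * height).astype(int), 0, height - 1)

    # Keep the closest LiDAR return when several points fall in the same pixel.
    depth_map = np.full((height, width), np.inf)
    np.minimum.at(depth_map, (v, u), depth)
    depth_map[np.isinf(depth_map)] = 0.0                # 0 = no label

    return depth_map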

[Figure: histograms of depth values for all, indoor, and outdoor scenes]

Depth values range from 0.5 to 225 meters, with averages of 8.1 meters overall, 5.4 meters for indoor scenes, and 9.2 meters for combined day and night outdoor scenes.
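
To relate these depth values to the angular disparities reported in the benchmark below, one can use the epipolar geometry of a vertical-baseline 360° rig: the two camera centres and the observed point form a triangle, and the law of sines links depth, baseline, and the difference in polar angle between the two views. The sketch below uses the 19.1 cm baseline from the setup above; the reference-view and sign conventions are our assumptions, not necessarily the dataset's exact disparity definition.

import numpy as np

BASELINE = 0.191  # metres, top-bottom distance between the two cameras

def depth_to_disparity_deg(depth_m, polar_deg):
    """Angular disparity (degrees) for a point seen at radial distance `depth_m`
    and polar angle `polar_deg` (0 deg = zenith) in the reference view, assuming
    the relation depth = BASELINE * sin(polar + d) / sin(d)."""
    theta = np.radians(polar_deg)
    d = np.arctan2(np.sin(theta), depth_m / BASELINE - np.cos(theta))
    return np.degrees(d)

# Points near the horizon (polar angle ~90 deg) at representative depths:
for r in (0.5, 8.1, 225.0):
    print(f"{r:6.1f} m  ->  {depth_to_disparity_deg(r, 90.0):.3f} deg")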

Benchmark Results

We evaluate the performance of multiple state-of-the-art and popular stereo matching methods for both standard and 360° images. All models are trained on a single NVIDIA A100 GPU with the largest possible batch size to ensure comparable use of computational resources.

Method            Type          Disparity (°)            Depth (m)
                                MAE    RMSE   MARE       MAE    RMSE   MARE
PSMNet            stereo        0.33   0.54   0.20       2.79   6.17   0.29
360SD-Net         360° stereo   0.21   0.42   0.18       2.14   5.12   0.15
IGEV-Stereo       stereo        0.22   0.41   0.17       1.85   4.44   0.15
360-IGEV-Stereo   360° stereo   0.18   0.39   0.15       1.77   4.36   0.14
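
MAE, RMSE, and MARE denote the mean absolute error, root mean squared error, and mean absolute relative error, respectively (lower is better for all). Below is a minimal sketch of how these can be computed over labeled pixels; the masking rule (ignoring pixels without a LiDAR label) and the function name are our assumptions, not the official evaluation code.

import numpy as np

def error_metrics(pred, target):
    """MAE, RMSE and mean absolute relative error (MARE), computed over labeled
    pixels only (target == 0 is assumed to mean "no label")."""
    mask = target > 0
    err = pred[mask] - target[mask]
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MARE": np.mean(np.abs(err) / target[mask]),
    }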

The dataset is also an ideal testbed for assessing the robustness of depth estimation methods to diverse lighting conditions and depth ranges by training and evaluating models on different subsets of the dataset (e.g., indoor vs. outdoor scenes).

[Figure: cross-scene generalization performance]

Download

Use the link below to access the dataset on the Hugging Face Hub.

The dataset is organized into training and testing subsets, whose structure is outlined below:

helvipad/
├── train/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── depth_maps_augmented      # Augmented depth maps using depth completion
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── disparity_maps_augmented  # Augmented disparity maps using depth completion
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   ├── LiDAR_pcd                 # Original LiDAR point cloud data
├── test/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   ├── LiDAR_pcd                 # Original LiDAR point cloud data
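
As a starting point, here is a minimal loading sketch that follows this layout. The frame file name, the image format, and the scale factor used to decode depth and disparity values (we assume 16-bit PNGs here) are placeholders; refer to the dataset card on the Hugging Face Hub for the exact conventions.

from pathlib import Path

import numpy as np
from PIL import Image

ROOT = Path("helvipad")   # dataset root after download
SCALE = 256.0             # assumed decoding factor for 16-bit PNG depth/disparity maps

def load_sample(split="train", frame="0000.png", augmented=False):
    """Load a top/bottom image pair with its depth and disparity maps.
    `frame` is a hypothetical file name; the actual naming may differ."""
    suffix = "_augmented" if (augmented and split == "train") else ""
    top = np.array(Image.open(ROOT / split / "images_top" / frame))
    bottom = np.array(Image.open(ROOT / split / "images_bottom" / frame))
    depth = np.array(Image.open(ROOT / split / f"depth_maps{suffix}" / frame),
                     dtype=np.float32) / SCALE      # metres (assumed encoding)
    disparity = np.array(Image.open(ROOT / split / f"disparity_maps{suffix}" / frame),
                         dtype=np.float32) / SCALE  # degrees (assumed encoding)
    return top, bottom, depth, disparity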

BibTeX

If you use the Helvipad dataset in your research, please cite it using the following BibTeX entry:

@misc{zayene2024helvipad,
  author        = {Zayene, Mehdi and Endres, Jannik and Havolli, Albias and Corbi\`{e}re, Charles and Cherkaoui, Salim and Ben Ahmed Kontouli, Alexandre and Alahi, Alexandre},
  title         = {Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation},
  year          = {2024},
  eprint        = {2403.16999},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}