Learning Spherical Convolution for Fast Features from 360° Imagery

Concept figure

We propose a generic approach for transferring Convolutional Neural Networks (CNNs) trained on perspective images to 360° images. Our solution entails a new form of distillation across camera projection models. Compared to current practices for feature extraction on 360° images, spherical convolution improves efficiency by avoiding repeated perspective projections, and it improves accuracy by adapting the kernels to the distortions of the equirectangular projection.


Preliminaries


Existing strategies for applying off-the-shelf CNNs on 360° images are problematic.

Strategy I: apply the CNN directly to the equirectangular projection. This is fast, but the heavy distortion, especially near the poles, degrades accuracy.

Strategy II: render many perspective projections of the sphere and apply the CNN to each one. This preserves accuracy but is computationally expensive.

Spherical CNN

Many works learn new CNNs directly on spherical data. However, they require annotated training data in a spherical format and cannot exploit existing perspective-image datasets and models, even for the very same task.


Spherical Convolution


Objective

We train the spherical convolutional network to reproduce the exact outputs of the source model on perspective-projected images, while taking the equirectangular projection as input.

Objective
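
As a concrete illustration, here is a minimal PyTorch sketch of this distillation objective. The names `sph_net`, `src_net`, `equirect`, and `persp`, and the pairing of each equirectangular image with its perspective projection, are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(sph_net, src_net, equirect, persp):
    """L2 distance between the spherical network's output on the
    equirectangular image and the frozen source network's output on
    the corresponding perspective projection (a hypothetical sketch)."""
    with torch.no_grad():                 # the source model is fixed
        target = src_net(persp)
    return F.mse_loss(sph_net(equirect), target)
```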

Network Architecture

Because the distortion in the equirectangular projection is location dependent, we untie the kernel weights along the rows of the image. Each kernel thus learns to account for the particular distortion it encounters.

Equirectangular distortion
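
Below is a minimal PyTorch sketch of such a row-untied convolution; the class name and interface are hypothetical, and the full model additionally varies the kernel shape per row, which this sketch omits for brevity.

```python
import torch
import torch.nn as nn

class RowUntiedConv2d(nn.Module):
    """Convolution with weights untied along image rows: output row r
    is produced by its own kernel, letting each row adapt to the
    latitude-dependent distortion of the equirectangular projection."""

    def __init__(self, rows, in_ch, out_ch, ksize=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, ksize, padding=ksize // 2)
            for _ in range(rows)
        )

    def forward(self, x):                  # x: (N, C, rows, W)
        # Run each row's kernel over the input and keep only that row
        # of the result; clear but wasteful -- a real implementation
        # would convolve only the band of input rows that feeds each
        # output row.
        out = [conv(x)[:, :, r : r + 1, :]
               for r, conv in enumerate(self.convs)]
        return torch.cat(out, dim=2)
```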

Layer-wise Training

We propose a layer-wise training procedure to accelerate learning. By requiring the spherical convolutional network to reproduce all intermediate outputs of the source model, both the input and the regression target of every layer are known in advance, so each layer becomes independent of the others and can be trained separately.

Layer-wise training
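
A minimal sketch of this procedure under assumed interfaces: `src_feats[i]` is a hypothetical helper that returns the target feature map of source layer i (projected onto the equirectangular grid) for a batch. Because layer i's input is the target output of layer i-1 rather than the output of the still-training spherical layer beneath it, the layers decouple.

```python
import torch
import torch.nn.functional as F

def train_layerwise(sph_layers, src_feats, loader, steps=1000, lr=1e-2):
    """Fit each spherical layer independently: its input and its
    regression target are both precomputed from the source model."""
    for i, layer in enumerate(sph_layers):
        opt = torch.optim.SGD(layer.parameters(), lr=lr)
        for _, batch in zip(range(steps), loader):
            with torch.no_grad():
                inp = src_feats[i - 1](batch) if i > 0 else batch
                target = src_feats[i](batch)
            loss = F.mse_loss(layer(inp), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```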

Results


To evaluate the method, we apply SphConv to an off-the-shelf Faster R-CNN model. We train on the Pano2Vid 360° video dataset and evaluate on Pano2Vid and a spherical version of the PASCAL VOC 2007 dataset.

Results

Example Outputs

Example detection outputs on images 005976, 006500, 008428, and 009258.

Publication


Yu-Chuan Su and Kristen Grauman. "Learning Spherical Convolution for Fast Features from 360° Imagery." In NIPS, 2017.