Kernel Transformer Networks for Compact Spherical Convolution


In our prior work, we proposed the spherical convolutional neural network, which transfers off-the-shelf CNNs to 360° images for visual recognition. However, the spherical convolutional neural network increases the model size significantly, which makes the model hard to train and deploy. In this work, we propose the Kernel Transformer Network (KTN), which learns a function that transforms a source kernel to account for the distortion introduced by the equirectangular projection of 360° images. This transformation formulation greatly reduces the model size needed for spherical convolution, and the learned transformation can be applied to multiple source CNNs for multiple recognition tasks.


Kernel Transformer Network


Idea

Instead of learning separate spherical convolution kernels for the distorted visual content, we learn a kernel transformation function that accounts for the distortion and generates the desired spherical convolution kernels from the source kernel.

[Figure: KTN idea]

It can be considered a generalization of the convolution operation, where the kernel is a function of both the image location and the source kernel.

[Figure: KTN definition]
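In symbols (our own notation, not necessarily the paper's): let K be a source kernel, θ the polar angle of an image row, g the KTN with parameters Ω, and F^l the layer-l feature map. A minimal sketch of the operation is

    K_\theta = g(K, \theta; \Omega)
    F^{l+1}(\theta, \phi) = \sigma\big( (K_\theta * F^{l})(\theta, \phi) \big)

Because the distortion of the equirectangular projection depends only on the polar angle, one transformed kernel K_\theta can be shared by all locations in the same image row.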

Transferability

Because the KTN takes the source model as input, a single KTN can transfer multiple source CNNs that share the same architecture to 360° images, each for a different visual recognition task.

[Figure: KTN transferability]
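As a rough illustration of this reuse, here is a minimal PyTorch-style sketch (the names spherical_conv and ktn are ours, not the released code, and the row loop is written for clarity rather than speed):

    import torch
    import torch.nn.functional as F

    def spherical_conv(feat, source_kernel, ktn):
        """Convolve equirectangular features with row-specific KTN-generated kernels.

        feat:          (N, C_in, H, W) feature map on the equirectangular grid
        source_kernel: (C_out, C_in, kh, kw) kernel taken from the source CNN
        ktn:           callable (source_kernel, row, H) -> transformed kernel for that row
        """
        n, c, h, w = feat.shape
        rows = []
        for row in range(h):
            k_row = ktn(source_kernel, row, h)                   # distortion depends only on the row
            pad_h, pad_w = k_row.shape[-2] // 2, k_row.shape[-1] // 2
            padded = F.pad(feat, (pad_w, pad_w, pad_h, pad_h))   # zero pad; a real ERP wraps in width
            window = padded[..., row:row + 2 * pad_h + 1, :]     # vertical neighborhood of this row
            rows.append(F.conv2d(window, k_row))                 # -> (N, C_out, 1, W), odd kernel sizes
        return torch.cat(rows, dim=-2)

Since only source_kernel changes between calls, the same ktn could, for example, be applied to kernels from a scene classifier and from an object detector that share the backbone architecture.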

Architecture

[Figure: KTN architecture]
[Figure: training loss]
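The figures above give the exact architecture and training loss; as a hedged paraphrase (an assumption on our part, not a quotation of the formulation), the KTN parameters Ω are trained with a feature-matching objective of roughly the form

    L(\Omega) = \sum_{\theta, \phi} \big\| F^{l+1}_{KTN}(\theta, \phi; \Omega) - F^{l+1}_{target}(\theta, \phi) \big\|_2^2

where F_{target} denotes the responses that the source kernels are meant to reproduce on the sphere.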

Results


[Figures: qualitative results on example 360° frames]

Publication

