Kernel Transformer Networks for Compact Spherical Convolution
In our prior work, we proposed a spherical convolutional neural network that transfers off-the-shelf CNNs to 360° images for visual recognition. However, the spherical convolutional neural network increases the model size significantly, which makes the model hard to train and deploy. In this work, we propose the Kernel Transformer Network (KTN), which learns a function that transforms a kernel to account for the distortion in the equirectangular projection of 360° images. This formulation greatly reduces the model size for spherical convolution and applies to multiple source CNNs across multiple recognition tasks.
Instead of learning spherical convolution kernels directly for the distorted visual content, we learn a transformation function that accounts for the distortion and generates the desired spherical convolution kernels from the source kernel.
This can be viewed as a generalization of the convolution operation in which the kernel is a function of both the location and the source kernel.
Because KTN takes the source model as input, it can transfer multiple source CNNs with the same architecture to 360° images for different visual recognition tasks.
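To make the idea of a kernel that depends on both the location and the source kernel concrete, here is a minimal sketch in plain Python. It is not the learned KTN: the transformation here is a fixed, hand-written stand-in that stretches the source kernel horizontally by roughly 1/cos(latitude), which only approximates the distortion of the equirectangular projection. The names `resample_row`, `transform_kernel`, `spherical_conv` and the `max_width` cap are hypothetical and chosen for illustration only.

```python
import math


def resample_row(values, out_width):
    """Linearly interpolate a 1-D list of weights to out_width samples."""
    n = len(values)
    if n == 1:
        return [values[0]] * out_width
    out = []
    for j in range(out_width):
        pos = j * (n - 1) / (out_width - 1)  # position in the source row
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append((1.0 - t) * values[lo] + t * values[hi])
    return out


def transform_kernel(source_kernel, row, height, max_width=9):
    """Hand-written stand-in for the learned transformation (an assumption).

    Returns a kernel for the given image row by stretching the source
    kernel horizontally; rows near the poles get wider kernels.
    Assumes an odd kernel width and an odd max_width.
    """
    k = len(source_kernel[0])
    # Latitude of the row centre, in (-pi/2, pi/2); 0 is the equator.
    latitude = (0.5 - (row + 0.5) / height) * math.pi
    stretch = 1.0 / max(math.cos(latitude), 1e-6)
    width = int(round(k * stretch))
    if width % 2 == 0:
        width += 1  # keep the kernel centred on a pixel
    width = max(k, min(width, max_width))
    scale = k / width  # keep each row's total weight roughly constant
    return [[w * scale for w in resample_row(r, width)] for r in source_kernel]


def spherical_conv(image, source_kernel):
    """Cross-correlate an equirectangular image with a row-dependent kernel.

    Longitude (x) wraps around; rows beyond the poles are treated as zero.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        kernel = transform_kernel(source_kernel, y, height)
        kh, kw = len(kernel), len(kernel[0])
        for x in range(width):
            acc = 0.0
            for i in range(kh):
                yy = y + i - kh // 2
                if yy < 0 or yy >= height:
                    continue
                for j in range(kw):
                    xx = (x + j - kw // 2) % width
                    acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    # The same transformation is reused for two different source kernels.
    box = [[1.0 / 9.0] * 3 for _ in range(3)]
    sobel_x = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
    image = [[float(x % 4) for x in range(16)] for _ in range(8)]
    for name, src in (("box", box), ("sobel_x", sobel_x)):
        result = spherical_conv(image, src)
        print(name, [len(transform_kernel(src, y, 8)[0]) for y in range(8)])
        print(name, [round(v, 3) for v in result[4][:4]])
```

Because the transformation takes the source kernel as an argument rather than storing per-row kernels, the same function serves both source kernels in the usage example. In the actual method the transformation is learned rather than hand-written as it is here.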