Learning Compressible 360° Video Isomers

We propose to improve 360° video compression by selecting a proper orientation for the cubemap projection. Our key insight is that different cubemap orientations lead to different compression rates for the same 360° video under the same video codec. We perform a detailed analysis of 80 360° videos, 3 hours in total, to verify that the orientation of the cubemap projection matters for the final video size. The results show that rotation can reduce video size by up to 75%, and that the average reduction exceeds 8% across all videos.

360° Video Isomers Analysis

We enumerate cubemap orientations along two rotation axes (yaw and pitch).

The cubemaps are rendered with transform360 and losslessly encoded with x264, x265, and libvpx. We use a fixed 2-second GOP and encode each GOP with an independent orientation. We then define the achievable size reduction through rotation as:

We compute the size reduction over 80 360° videos, 3 hours in total. The videos are crawled from YouTube and are encoded with H.264 High Profile at 4K resolution. The average and range of achievable size reduction are:

Because the compression rate depends on the visual content and the resulting cubemap representation, the distribution of video size with respect to orientation varies from video to video.
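To make the per-GOP analysis concrete, here is a minimal sketch of how a size reduction could be computed once a GOP has been encoded at several orientations. It assumes the reduction is measured relative to the largest size observed across orientations, which is one plausible reading and not necessarily the exact definition used in the analysis. The function name `size_reduction` and the sample sizes are hypothetical.

```python
def size_reduction(sizes_by_orientation):
    """Percent size reduction of the best orientation relative to the worst.

    sizes_by_orientation maps (yaw, pitch) in degrees to the encoded size
    in bytes of one GOP. Measuring against the maximum is an assumption.
    """
    largest = max(sizes_by_orientation.values())
    smallest = min(sizes_by_orientation.values())
    return 100.0 * (largest - smallest) / largest


# Hypothetical encoded sizes (bytes) of a single 2-second GOP.
gop_sizes = {(0, 0): 1000, (45, 0): 900, (0, 45): 800, (45, 45): 750}

reduction = size_reduction(gop_sizes)
best = min(gop_sizes, key=gop_sizes.get)  # orientation with the smallest size
print(f"best orientation {best}, reduction {reduction:.1f}%")
```

Averaging this quantity over all GOPs of all videos would give a dataset-level figure of the kind reported above.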

Predict Compressible Isomer

Enumerating all possible cubemap orientations requires encoding the video repeatedly, which is computationally prohibitive. Instead, we propose to predict the optimal orientation from the video content using a Convolutional Neural Network.

Input: instead of taking raw pixels, the CNN takes 1) segmentation contours and 2) motion vectors as input.
Skip connections: because fine image details are important for video compression algorithms, we use skip connections to pass detail information to the final predictor.
Objective: instead of predicting the optimal orientation directly, we predict the bit-stream size at each candidate orientation and choose the one with the minimum predicted size as the final output.
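The objective above reduces orientation selection to an argmin over predicted sizes. The sketch below illustrates only that selection step. The predictor is a toy stand-in for the CNN, and the candidate grid, `choose_orientation`, and `toy_predictor` are all hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Tuple

Orientation = Tuple[int, int]  # (yaw, pitch) in degrees


def choose_orientation(
    predict_size: Callable[[Orientation], float],
    candidates: Iterable[Orientation],
) -> Orientation:
    """Return the candidate whose predicted bit-stream size is smallest."""
    return min(candidates, key=predict_size)


def toy_predictor(orientation: Orientation) -> float:
    """Stand-in for the CNN: a made-up size curve with its minimum at (30, 15)."""
    yaw, pitch = orientation
    return 1000.0 + (yaw - 30) ** 2 + (pitch - 15) ** 2


# Hypothetical candidate grid: 15-degree steps in yaw and pitch.
candidates = [(yaw, pitch) for yaw in range(0, 90, 15) for pitch in range(0, 90, 15)]
chosen = choose_orientation(toy_predictor, candidates)
print(chosen)
```

Predicting a size per candidate, and not a single orientation label, keeps the relative cost of every orientation available to the selection step.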

Based on the prediction model, we propose a new two-stage compression pipeline for 360° videos.

Dataset

The encoded video sizes are stored in pandas DataFrames. The columns correspond to the YouTube video id and the segment id. Each segment is two seconds long, so segment id covers the interval [2*id, 2*id+2] in seconds. The rows correspond to the two orientation angles (yaw, pitch) of the cubemap. File sizes are given in bytes. The HDF5 file contains three datasets: h264, hevc, and vp9.
[HDF5]
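The following is a minimal sketch of how such a table might be read and queried with pandas. It assumes that each dataset name (`h264`, `hevc`, `vp9`) works as a key for `pandas.read_hdf`, and that rows and columns are two-level indexes; the exact index layout of the released file is an assumption. The helper names and the synthetic table are hypothetical and only mimic the described layout.

```python
import pandas as pd


def load_sizes(path, codec="h264"):
    """Read one codec's size table from the HDF5 file (requires PyTables).

    `path` is wherever the downloaded file was saved. Using the dataset
    name as the HDF5 key is an assumption.
    """
    return pd.read_hdf(path, key=codec)


def best_orientations(sizes):
    """For each (video id, segment id) column, return the row label
    (yaw, pitch) with the smallest encoded size."""
    return sizes.idxmin(axis=0)


def segment_span(segment_id):
    """Start and end time in seconds of a two-second segment."""
    return 2 * segment_id, 2 * segment_id + 2


# Synthetic stand-in with the described layout; sizes are in bytes.
rows = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 45), (45, 0), (45, 45)], names=["yaw", "pitch"]
)
cols = pd.MultiIndex.from_tuples(
    [("vid_a", 0), ("vid_a", 1)], names=["video", "segment"]
)
sizes = pd.DataFrame(
    [[1000, 700], [900, 800], [950, 650], [800, 900]], index=rows, columns=cols
)

best = best_orientations(sizes)
print(best)
print(segment_span(1))
```

With the real file, `load_sizes` would replace the synthetic table, and the same `idxmin` query would apply if the index layout matches this assumption.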

Publication

Yu-Chuan Su, Kristen Grauman, "Learning Compressible 360° Video Isomers," CVPR 2018
[arXiv] [poster] [dataset]
