Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing

Concept figure

In our prior work, we proposed the Pano2Vid problem: given a 360° video, generate normal-field-of-view (NFOV) videos that look as if they were captured by a human videographer. We also proposed the AutoCam algorithm, which solves Pano2Vid by learning from human-captured NFOV videos how to control a virtual camera within the 360° video.

In this work, we propose three improvements over the AutoCam algorithm. First, we generalize the Pano2Vid task to allow changes in the field of view (FOV), i.e., zooming, a commonly used technique in videography. Second, we present a coarse-to-fine trajectory search algorithm that iteratively refines the camera control while shrinking the search space, improving computational efficiency. Finally, we generate a diverse set of output videos for each input 360° video, accounting for the fact that valid Pano2Vid solutions are often multimodal.
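For concreteness, the operation that both AutoCam and this work build on is rendering an NFOV glimpse from the 360° frame at a chosen camera pose. Below is a minimal sketch of that projection, assuming an equirectangular input; the function name, default sizes, and nearest-neighbor sampling are illustrative choices, not the released implementation.

    import numpy as np

    def extract_glimpse(equi, yaw, pitch, fov_deg, out_w=640, out_h=360):
        """Render a rectilinear NFOV view from an equirectangular frame.
        yaw/pitch are in radians; varying fov_deg implements zooming."""
        H, W = equi.shape[:2]
        f = (out_w / 2) / np.tan(np.deg2rad(fov_deg) / 2)  # focal length in pixels

        # Ray directions through each output pixel (camera looks down +z).
        x = np.arange(out_w) - out_w / 2
        y = np.arange(out_h) - out_h / 2
        xx, yy = np.meshgrid(x, y)
        dirs = np.stack([xx, yy, np.full(xx.shape, f)], axis=-1)
        dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

        # Rotate the rays by pitch (about x) then yaw (about y).
        cp, sp = np.cos(pitch), np.sin(pitch)
        cy, sy = np.cos(yaw), np.sin(yaw)
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        dirs = dirs @ (Ry @ Rx).T

        # Map ray directions to equirectangular pixel coordinates.
        lon = np.arctan2(dirs[..., 0], dirs[..., 2])       # [-pi, pi]
        lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))  # [-pi/2, pi/2]
        u = np.round((lon / np.pi + 1) / 2 * (W - 1)).astype(int)
        v = np.round((lat / (np.pi / 2) + 1) / 2 * (H - 1)).astype(int)
        return equi[v, u]  # nearest-neighbor sampling, for brevity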


Improvements over AutoCam


Zoom Lens

The new algorithm enables zooming in the virtual camera control. Zooming not only makes the camera control more natural but also improves the quality of the learned capture-worthiness model.

Zoom Lens
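A hedged sketch of how zooming can enter the trajectory search: the camera state is extended from a viewing direction to a (direction, FOV) pair, and a dynamic program selects the state sequence that maximizes accumulated capture-worthiness minus smoothness penalties on both motion and zoom changes. The grids, weights, and scoring below are illustrative placeholders, not the paper's learned model.

    import itertools
    import numpy as np

    DIRECTIONS = [(yaw, pitch) for yaw in range(0, 360, 20) for pitch in (-15, 0, 15)]
    FOVS = [32, 48, 65, 104]  # candidate horizontal FOVs, i.e. zoom levels
    STATES = list(itertools.product(range(len(DIRECTIONS)), range(len(FOVS))))

    def transition_cost(s, t, motion_w=0.05, zoom_w=0.02):
        """Penalize large camera motion and abrupt zoom changes."""
        (d1, f1), (d2, f2) = s, t
        (yaw1, pitch1), (yaw2, pitch2) = DIRECTIONS[d1], DIRECTIONS[d2]
        dyaw = min(abs(yaw1 - yaw2), 360 - abs(yaw1 - yaw2))
        return motion_w * (dyaw + abs(pitch1 - pitch2)) + zoom_w * abs(FOVS[f1] - FOVS[f2])

    def search_trajectory(scores):
        """scores: T x len(STATES) capture-worthiness per frame and state.
        Returns the state index chosen for each frame (Viterbi-style DP)."""
        T = scores.shape[0]
        back = np.zeros((T, len(STATES)), dtype=int)
        best = scores[0].astype(float).copy()
        for t in range(1, T):
            prev = best.copy()
            for j, sj in enumerate(STATES):
                cand = [prev[i] - transition_cost(si, sj) for i, si in enumerate(STATES)]
                back[t, j] = int(np.argmax(cand))
                best[j] = cand[back[t, j]] + scores[t, j]
        # Backtrack the highest-scoring state sequence.
        path = [int(np.argmax(best))]
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1]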

Coarse-to-Fine Trajectory Search

The AutoCam algorithm searches for camera trajectories over all candidate glimpses, which is computationally intensive because every glimpse must be processed. The new algorithm improves computational efficiency with a two-stage trajectory search that avoids processing all glimpses.

Coarse to Fine Search
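A minimal sketch of the coarse-to-fine idea, under assumed grid sizes: run the search on a coarse grid of directions first, then re-search only a window around the coarse solution at the finer resolution, so most fine-grid glimpses never have to be scored. For brevity the sketch picks the best yaw per frame; the full method would run the trajectory search at each stage.

    import numpy as np

    def coarse_to_fine_search(score_fn, T, coarse_step=40, fine_step=10, window=40):
        """score_fn(t, yaw) -> capture-worthiness of the glimpse at frame t."""
        # Stage 1: coarse grid over all yaw angles.
        coarse_yaws = np.arange(0, 360, coarse_step)
        coarse_traj = [max(coarse_yaws, key=lambda y: score_fn(t, y)) for t in range(T)]
        # Stage 2: fine grid, restricted to a window around the coarse solution.
        fine_traj = []
        for t, cy in enumerate(coarse_traj):
            fine_yaws = np.arange(cy - window, cy + window + 1, fine_step) % 360
            fine_traj.append(max(fine_yaws, key=lambda y: score_fn(t, y)))
        return fine_traj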

Diverse Trajectory Search

The original AutoCam algorithm may generate redundant outputs whose camera trajectories are nearly identical. The new algorithm searches for trajectories iteratively and encourages diversity among the outputs.

Diverse Trajectory Search
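A hedged sketch of diversity-encouraging search: extract trajectories one at a time, and after each extraction suppress the scores of glimpses that overlap the chosen trajectory, so the next search is pushed toward a different part of the viewing sphere. The overlap radius and penalty below are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def angular_dist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)

    def diverse_trajectories(scores, yaws, search_fn, k=3, radius=60, penalty=0.5):
        """scores: T x num_glimpses; yaws[i]: yaw angle of glimpse column i;
        search_fn(scores) -> one column index per frame."""
        scores = scores.astype(float).copy()
        outputs = []
        for _ in range(k):
            traj = search_fn(scores)
            outputs.append(traj)
            # Down-weight glimpses near the selected trajectory at every frame.
            for t, j in enumerate(traj):
                for i, y in enumerate(yaws):
                    if angular_dist(y, yaws[j]) < radius:
                        scores[t, i] -= penalty
        return outputs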

Video Examples


Zooming allows the algorithm to emphasize particular objects and moments in the video.

This example shows why it is important to generate diverse trajectories from the same input video: the two outputs demonstrate different ways to capture the same scene.

These two examples show that zooming helps learn a better capture-worthiness model; the content captured by the new algorithm is more interesting in both cases.

Failure Cases

In this example, the algorithm focuses on the videographers, although the players on the field should be more important. The algorithm does not reason about the relative importance of different objects in the scene, and further information is necessary to resolve such cases.

Annotation Interface

These are two trajectories annotated by the same editor. Note that the orientation of the 360° video is shifted by 180° in the second example. This encourages the editor to annotate different trajectories and avoids the bias introduced by the interface.
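For reference, the 180° shift amounts to a circular shift of the equirectangular frame by half its width; a minimal sketch, assuming equirectangular input:

    import numpy as np

    def shift_orientation(equi_frame):
        """Rotate an equirectangular frame 180° about the vertical axis:
        a circular shift by half the frame width."""
        return np.roll(equi_frame, equi_frame.shape[1] // 2, axis=1)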


Publication

