Researchers from the University of Washington and Facebook used deep learning to convert still images into realistic, animated looping videos.
Their approach, which will be presented at the upcoming Conference on Computer Vision and Pattern Recognition (CVPR), imitates continuous fluid motion — such as flowing water, smoke and clouds — to turn still images into short videos that loop seamlessly.
“What’s special about our method is that it doesn’t require any user input or extra information,” said Aleksander Hołyński, a University of Washington doctoral student in computer science and engineering and lead author on the project. “All you need is a picture. And it produces as output a high-resolution, seamlessly looping video that quite often looks like a real video.”
The team created a method known as “symmetric splatting,” which predicts both the past and the future motion from a still image and combines the two to create a seamless animation.
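As a rough illustration of the idea, a loop can be built by warping the source frame forward along the predicted motion and backward along its reverse, then cross-fading the two so the last frame lines up with the first. The sketch below is only a simplified reading of that concept; the `warp` helper and the other names are placeholders, not the authors' implementation.

```python
# Hedged sketch of symmetric splatting: blend a forward-warped "future" and a
# backward-warped "past" of the same frame with time-dependent weights so the
# sequence wraps around seamlessly. `warp` is a hypothetical helper that moves
# pixels along a displacement field.
import numpy as np

def symmetric_splat(frame, motion_field, num_frames, warp):
    """frame: (H, W, 3) image; motion_field: (H, W, 2) per-step displacement."""
    loop = []
    for t in range(num_frames):
        alpha = t / num_frames                                    # position within the loop
        forward = warp(frame, motion_field * t)                   # frame advanced t steps
        backward = warp(frame, -motion_field * (num_frames - t))  # frame rewound N - t steps
        # At t = 0 the forward warp (the original frame) dominates; near the end
        # the backward warp dominates, so the loop closes back on the first frame.
        loop.append((1 - alpha) * forward + alpha * backward)
    return np.stack(loop)
```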
“When we see a waterfall, we know how the water should behave. The same is true for fire or smoke. These types of motions obey the same set of physical laws, and there are usually cues in the image that tell us how things should be moving,” Hołyński said. “We’d love to extend our work to operate on a wider range of objects, like animating a person’s hair blowing in the wind. I’m hoping that eventually the pictures that we share with our friends and family won’t be static images. Instead, they’ll all be dynamic animations like the ones our method produces.”
To teach their neural network to estimate motion, the team trained the model on more than 1,000 videos of fluid motion such as waterfalls, rivers and oceans. Given only the first frame of the video, the system would predict what should happen in future frames, and compare its prediction with the original video. This comparison helped the model improve its predictions of whether and how each pixel in an image should move.
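A minimal sketch of that training setup, under the assumption of a generic PyTorch motion-prediction model and a frame-warping helper, might look like the following. `MotionNet`-style classes and the `warp` callable are placeholders, not the authors' code.

```python
# Hedged sketch of the described training loop: predict a motion field from the
# first frame of a real fluid-motion clip, warp the frame forward in time, and
# penalize differences from the real later frames.
import torch
import torch.nn.functional as F

def train_step(motion_net, warp, clip, optimizer):
    """clip: tensor of shape (T, 3, H, W) — one real fluid-motion video."""
    first_frame = clip[0:1]                    # the only input the model sees
    motion_field = motion_net(first_frame)     # (1, 2, H, W) per-pixel displacement

    loss = 0.0
    for t in range(1, clip.shape[0]):
        # Integrate the per-step motion t times and warp the first frame forward.
        predicted = warp(first_frame, motion_field * t)
        # Comparing against the real frame at time t teaches the network
        # whether and how each pixel should move.
        loss = loss + F.l1_loss(predicted, clip[t:t + 1])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```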
The researchers used the NVIDIA Pix2PixHD GAN architecture to train their motion estimation network, along with the FlowNet2 and PWC-Net optical flow networks. NVIDIA GPUs were used for both training and inference. The training data comprised 1,196 unique videos: 1,096 for training, 50 for validation and 50 for testing.
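For reference, a split along the lines described above could be expressed as in the short sketch below; the directory layout, file format, and shuffling are assumptions, not details from the paper.

```python
# Hedged sketch of the reported dataset split (1,196 clips: 1,096 / 50 / 50).
# The "fluid_clips" directory and .mp4 extension are assumed for illustration.
import random
from pathlib import Path

clips = sorted(Path("fluid_clips").glob("*.mp4"))
random.seed(0)
random.shuffle(clips)

train, val, test = clips[:1096], clips[1096:1146], clips[1146:1196]
print(len(train), len(val), len(test))  # expected: 1096 50 50
```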
Read the University of Washington news release for more >>
The researchers’ paper is available here.