Using AI to make videos POP

Robert MacWha
Published in Nerd For Tech
May 8, 2021 · 5 min read


Frame interpolation is the process of increasing the framerate of a video after it’s been recorded by guessing what intermediary frames would have looked like. This, if done correctly, produces a smoother video with few to no artifacts.

How does it work?

While there are many different methods for frame interpolation, I'll focus on just three: frame averaging, motion estimation, and AI-powered interpolation.

Frame Averaging

Frame averaging is probably what you'd first think of when trying to design a frame interpolation algorithm. It works by taking two neighbouring frames and blending them together with equal weight. While easy to implement, this algorithm isn't generally worth using since it produces jarring results.

When used on a video without much motion this algorithm won’t do much, only creating slightly blurry interpolated frames. Videos with more motion, on the other hand, will suffer from ghost frames, resulting in a blurry and jittery mess — not exactly ideal when the goal is to create smoother footage.
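A minimal sketch of frame averaging in NumPy (the function name and the toy frames are my own, for illustration):

```python
import numpy as np

def average_frames(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Interpolate a middle frame by averaging two neighbouring frames."""
    # Work in a wider dtype so the sum doesn't overflow uint8.
    summed = frame_a.astype(np.uint16) + frame_b.astype(np.uint16)
    return (summed // 2).astype(np.uint8)

# Two tiny 2x2 grayscale "frames": a bright pixel moves one cell to the right.
a = np.array([[255, 0], [0, 0]], dtype=np.uint8)
b = np.array([[0, 255], [0, 0]], dtype=np.uint8)
mid = average_frames(a, b)
# Instead of the pixel appearing halfway between the two positions, the
# interpolated frame contains two half-bright "ghosts" - the blur and
# ghosting described above.
```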

Example of Frame Averaging — Center Frame is Interpolated

Motion Estimation

Motion estimation is a much more sophisticated frame interpolation algorithm that works by:

  1. Breaking up the frame into a collection of discrete blocks and
  2. Estimating how those blocks move in-between frames.

Let’s look at those steps in a little more depth, shall we?

Visualization of Motion Estimation, Wikipedia

The first part of the algorithm is to break up the frame into a collection of smaller blocks. Using fewer blocks will speed up the algorithm but will also distort detailed movements. Therefore, implementations of this algorithm generally use dynamically sized blocks that change depending on the difference between the two frames.

Once a frame is divided up, a velocity vector is calculated for each block. This velocity represents how a given block moved between the two frames. In-between frames can then be constructed by distorting the image such that the blocks move partway towards their next locations.
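The two steps can be sketched with a naive exhaustive block search. All names here are illustrative; real implementations use dynamically sized blocks and far faster search strategies than brute force:

```python
import numpy as np

def block_motion_vectors(prev, nxt, block=4, search=2):
    """Estimate one (dy, dx) velocity vector per block via exhaustive search."""
    h, w = prev.shape
    vectors = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ref = prev[by:by + block, bx:bx + block].astype(int)
            best, best_err = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y and y + block <= h and 0 <= x and x + block <= w:
                        # Sum of absolute differences against the candidate block.
                        err = np.abs(nxt[y:y + block, x:x + block].astype(int) - ref).sum()
                        if err < best_err:
                            best, best_err = (dy, dx), err
            vectors[(by, bx)] = best
    return vectors

def interpolate_midframe(prev, vectors, block=4):
    """Build the in-between frame by moving each block halfway along its vector."""
    mid = np.zeros_like(prev)
    for (by, bx), (dy, dx) in vectors.items():
        y, x = by + dy // 2, bx + dx // 2
        mid[y:y + block, x:x + block] = prev[by:by + block, bx:bx + block]
    return mid

# Example: a 4x4 bright square moves two pixels to the right between frames.
prev = np.zeros((8, 8), dtype=np.uint8); prev[0:4, 0:4] = 255
nxt = np.zeros((8, 8), dtype=np.uint8); nxt[0:4, 2:6] = 255
vectors = block_motion_vectors(prev, nxt)
mid = interpolate_midframe(prev, vectors)  # the square lands one pixel right
```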

This method will, in general, produce much more accurate results than frame averaging. On shots with smooth motion it will produce images that are nearly indistinguishable from the ground truth. However, on videos with more complex motion, this algorithm tends to struggle. First of all, since the in-between frames are constructed by distorting the previous frame, areas surrounding moving elements can be distorted. Secondly, accelerating objects will stutter because the velocity vectors assume linear motion. Finally, very fast-moving objects can be missed entirely and create ghost images.

AI-Powered Interpolation

AI-powered interpolation is much more powerful than the previous two methods because machine learning models are able to more accurately understand the video's motion. This mitigates many of the problems that plague the previous methods, such as image ghosting, distortion, and stutter. Most importantly, AI models are able to distinguish between foreground and background elements, which lets them logically reconstruct newly revealed background, something the previous methods often struggle with.

Example of missing background information — highlighted in purple

Reconstructing background information is one of the most important parts of frame interpolation. Since videos almost always contain shifts in perspective, new information will need to be generated. Filling in these areas incorrectly will lead to obvious differences between the real and interpolated frames, ruining the illusion.

Luckily, when using AI interpolation, this problem is largely handled for us. The AI naturally learns how to reconstruct missing background information in a realistic manner. This is generally achieved by giving the AI not just the two frames immediately surrounding the interpolated image but up to ten neighbouring frames. From these images, the AI can predict what the background looks like and fill it in when it becomes visible.
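A toy illustration of why the extra neighbouring frames help. This is not a real network; a per-pixel median stands in for whatever a trained model would learn, and all names are my own:

```python
import numpy as np

def gather_context(frames: np.ndarray, t: float, n_neighbours: int = 4) -> np.ndarray:
    """Stack the n_neighbours frames closest to time t as the model's input.

    A real interpolation network would consume this stack as channels; the
    point is that pixels occluded at time t are usually visible in at least
    one of the neighbouring frames.
    """
    centre = int(round(t))
    half = n_neighbours // 2
    start = max(0, min(centre - half, len(frames) - n_neighbours))
    return frames[start:start + n_neighbours]

# Toy stand-in for a trained model: recover the occluded background by
# taking a per-pixel median over the context stack.
def toy_background(context: np.ndarray) -> np.ndarray:
    return np.median(context, axis=0)

# Example: a bright foreground pixel sweeps across a flat background (value 10).
frames = np.full((5, 1, 5), 10, dtype=np.uint8)
for i in range(5):
    frames[i, 0, i] = 255
context = gather_context(frames, t=2.0, n_neighbours=5)
background = toy_background(context)  # the occluder vanishes: all 10s
```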

Use Cases for Frame Interpolation

Aside from the obvious use case of just increasing frame rates for fun, why bother learning about frame interpolation? Well, here are a few practical uses.

Virtual Reality

Photo by Minh Pham on Unsplash

Virtual reality is extremely computationally demanding. Modern headsets have over 7 million pixels across their displays, nearly as many as a 4K monitor. Furthermore, to maintain a fluid user experience, VR headsets need to update their image up to 120 times per second. This means that your poor GPU is generating over 800 million pixels per second.

Given that so many frames are generated per second, the difference between consecutive frames is very small. This is exactly what frame interpolation is good at! Using AI-powered frame interpolation could roughly halve the computation required to generate these frames.
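The arithmetic works out as follows (figures taken from the text; real headsets vary):

```python
# Rough pixel-throughput estimate for a modern VR headset.
pixels_per_frame = 7_000_000   # combined across both eyes
refresh_rate = 120             # Hz, upper end for current headsets

pixels_per_second = pixels_per_frame * refresh_rate  # 840 million

# Interpolating every other frame roughly halves the rendering load.
rendered_per_second = pixels_per_second // 2
```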

Video Compression

Current video compression techniques work by using a mix of interframe and intraframe compression. While these algorithms are effective, there’s a limit to how much you can compress a single image before it becomes unrecognizable.
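A naive sketch of the interframe idea: store one keyframe plus per-frame differences. Real codecs use motion-compensated prediction and entropy coding on top of this, so treat it only as an illustration of why small frame-to-frame differences compress well:

```python
import numpy as np

def delta_encode(frames):
    """Interframe sketch: keep the first frame, store the rest as deltas."""
    key = frames[0]
    deltas = [frames[i].astype(np.int16) - frames[i - 1].astype(np.int16)
              for i in range(1, len(frames))]
    return key, deltas

def delta_decode(key, deltas):
    """Rebuild the sequence by accumulating deltas onto the keyframe."""
    frames = [key]
    for d in deltas:
        frames.append((frames[-1].astype(np.int16) + d).astype(np.uint8))
    return frames

# Example: two nearly identical frames produce a tiny, highly compressible delta.
f0 = np.arange(4, dtype=np.uint8).reshape(2, 2)
f1 = (f0 + 1).astype(np.uint8)
key, deltas = delta_encode([f0, f1])
decoded = delta_decode(key, deltas)
```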

This is where NVIDIA's recent paper on image synthesis for video conferencing comes in. The paper covers a lot of ground, but I'll focus on just one part: video compression. By first sending a full, uncompressed image over the network, NVIDIA was able to construct subsequent frames from relatively minuscule amounts of information. An implementation of this approach would allow people with slow or limited internet connections to participate in HD web conferences.

TL;DR

  • Frame interpolation is the process of creating in-between frames to increase fluidity.
  • Frame interpolation can be accomplished by either hard-coded or AI-driven algorithms.
  • Frame interpolation has a number of uses including video compression and quality augmentation.

Thanks for reading my article! Feel free to check out my portfolio, message me on LinkedIn if you have anything to say, or follow me on Medium to get notified when I post another article!


ML Developer, Robotics enthusiast, and activator at TKS.