Nvidia creates super slow-motion video that is even smoother than a 300K fps camera

Cal Jeffrey

Forward-looking: Nvidia has developed a technique that uses neural networks to create smooth slow-motion video from standard footage. Variable-length multi-frame interpolation uses machine learning to "hallucinate" transitions between frames of film, then inserts these artificially created images between the originals to seamlessly slow down the final footage.

I'm not sure why, but people just love to watch slow-motion videos. In fact, the genre is so popular that Gavin Free and Dan Gruchy have a YouTube channel wholly devoted to it called The Slow Mo Guys, with almost 1.5 billion views and over 11 million subscribers. Free saw a niche to fill, since creating slow-motion video is not practical for most people: the equipment is extremely expensive, and with footage shot at over 300,000 fps, storage quickly becomes a problem.

Filters exist that convert regular video to slow motion, but the result is somewhat choppy since they simply intersperse duplicate frames to stretch the footage. However, Nvidia researchers think they have developed a way to create slow-motion video that is even smoother than footage from high-speed cameras like the ones Free and Gruchy use on their channel.

According to VentureBeat, “Scientists from Nvidia, the University of Massachusetts Amherst, and the University of California, Merced engineered an unsupervised, end-to-end neural network that can generate an arbitrary number of intermediate frames to create smooth slow-motion footage.”

The technique has been dubbed “variable-length multi-frame interpolation,” and it uses machine learning to fill in the gaps between frames of a video to create smooth-running, slow-motion versions.

“You can slow it down by a factor of eight or 15 — there’s no upper limit,” said Nvidia’s Senior Director of Visual Computing and Machine Learning Research Jan Kautz.

The technique uses two convolutional neural networks (CNNs) in tandem. The first makes both forward and backward estimates of the optical flow between frames. It then generates what is called a "flow field," a 2D field of motion vectors predicting where pixels will be in the frames to be inserted.

“A second CNN then interpolates the optical flow, refining the approximated flow field and predicting visibility maps in order to exclude pixels occluded by objects in the frame and subsequently reduce artifacts in and around objects in motion. Finally, the visibility map is applied to the two input images, and the intermediate optical flow field is used to warp (distort) them in such a way that one frame transitions smoothly to the next.”
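To make the second stage concrete, here is a minimal NumPy sketch of the idea: combine the bidirectional flow estimates into flows anchored at an intermediate time t, warp both input frames with them, and blend the results with a visibility map. The quadratic flow combination follows the team's paper; the function names, the nearest-neighbor warp, and the hand-made uniform flows (standing in for CNN predictions) are my own simplifications.

```python
import numpy as np

def intermediate_flows(f01, f10, t):
    # Approximate flow from time t back to frame 0 and forward to frame 1,
    # combining the bidirectional flows as in the paper
    ft0 = -(1 - t) * t * f01 + t * t * f10
    ft1 = (1 - t) ** 2 * f01 - t * (1 - t) * f10
    return ft0, ft1

def backward_warp(img, flow):
    # Sample img at each pixel's displaced position (nearest neighbor here;
    # a real implementation would use bilinear sampling, e.g. grid_sample)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[sy, sx]

def interpolate_frame(i0, i1, f01, f10, t, v0=None):
    # Blend the two warped frames; v0 is the visibility map for frame 0
    # (uniform 0.5 here, i.e. no occlusions assumed)
    ft0, ft1 = intermediate_flows(f01, f10, t)
    w0, w1 = backward_warp(i0, ft0), backward_warp(i1, ft1)
    if v0 is None:
        v0 = np.full(i0.shape, 0.5)
    a, b = (1 - t) * v0, t * (1 - v0)
    return (a * w0 + b * w1) / (a + b)

# A bright dot moving 2 px to the right between frames lands halfway at t = 0.5
i0 = np.zeros((8, 8)); i0[4, 3] = 1.0
i1 = np.zeros((8, 8)); i1[4, 5] = 1.0
flow = np.zeros((8, 8, 2)); flow[..., 0] = 2.0   # uniform rightward motion
mid = interpolate_frame(i0, i1, flow, -flow, 0.5)
```

With the dot moving two pixels between frames, the t = 0.5 output places it exactly halfway, which is the advantage of a flow-based warp over naive frame duplication.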

The results are remarkable, as you can see in the video above. Even footage shot at 300K fps by the Slow Mo Guys was slowed down further and looks smoother than the original.

The technique runs on Nvidia Tesla V100 GPUs and the cuDNN-accelerated PyTorch deep learning framework. As such, don't expect to see a commercial version released anytime soon.

According to Kautz, the system needs a lot of optimization before it can run in real time. He also says that even when it is commercialized, most of the processing will have to be done in the cloud due to hardware limitations in the devices where the filter would likely be used.

If you are into the technical details, the team has a paper outlining it at the Cornell University Library.


 
Cool, but it would be interesting to see how it compares to real slo-mo at the same frame rates. If the action is not too chaotic, then it might be relatively easy to predict the motion. I get that they used the process on real slo-mo footage as a basis, but I would think that in some cases real footage would be more desirable than predicted footage.
 
It's pretty cool, but there is that one camera advertised on YouTube that goes for about $3,000 a copy and boasts speeds well above this. Wish I could recall the name so I could post it here.
 
For a small university lab, a small independent R&D company, or a client, you usually choose either megapixels or fps. With this, they could save a few bucks on the speed of the camera and get higher resolution instead, then send the footage through Nvidia's technique for a super-slow-mo result.
 
The Slo Mo Guys are great. Slow motion photography opens up the world in much the same way that microscopes and telescopes and IR and UV photography do.

Humans are unable to discern, as separate, events that happen less than 1/20 of a second apart. Stretching out an action reveals what could never be seen otherwise. Because the Slo Mo Guys are essentially acting as scientists, presenting us with what authentically happens, they may not want to use this new technique, which, while it has the potential to produce beautiful videos, might not be up to the scientific-method standard of verifiability.
 
Provably false; you're arguing that humans can only see at about 20Hz.
No, I'm not arguing that humans can only see at 20Hz. I'm arguing just what I said. And it's provably true. If you watch a movie at 24fps you will perceive it as continuous motion. The flicker is there, but because it happens in less than 1/20 of a second it's integrated with the image.
 

That's only partially true. You perceive it as smooth motion *only* when on-screen movement and camera pans are carefully crafted. There is a reason filmmakers stick to a specific maximum panning speed: to maintain a certain look and to avoid bothering viewers.

Higher framerates have plenty of uses, especially outside of film. In gaming, the difference between 24fps and 60fps is the difference between being able to play and becoming nauseous for many people. The difference between 60fps and 144fps can be the difference between a win and a loss. If a character runs across your field of view at 30fps, you only get a quarter as many frames as you do at 120fps, making aiming and targeting significantly harder. You also tend to pan your view vastly quicker than a filmmaker ever would. The same pan speed that produces a blobby blur at 24-30fps would be tack sharp at 144fps: at 30fps you likely wouldn't see anything, while at 144fps you could easily make out enemies or other important objects even at high horizontal pan speeds.

I'm not a big fan of the statement "humans can't see faster than XX speed" because of what I was talking about above. It might be technically true when carefully qualified, but it can be misleading (and can often cause people to believe incorrect things, like that a high refresh rate monitor is useless, or that pushing a game past 30fps is pointless).
 
It's pretty convincing, but I see it mainly being used in entertainment-focused areas, not informational or scientific ones. Notice how it made her hair look unnatural when she spun? And it won't be useful for the extremely fast events that require a true high-speed camera (capturing a bullet in flight, the wavefront of an explosion, shattering glass, etc.), since the actual event happens so quickly you might only get a single frame (or none at all) even at 1,000-5,000fps.

But where it will be extremely useful (and a game changer) is in sports video. It did a great job on the car and the hockey players, and both of those would be difficult and expensive to shoot continuously at high speed. In addition, this method has some advantages over "real" high-speed photography, mainly in lighting flexibility. Shooting at 240fps requires *8 times* as much light as shooting at 30fps (the shutter can stay open only an eighth as long per frame), something hard to manage in some lighting conditions. Since this is done after the fact, say for a replay, the action could be shot normally and then slowed down in post.
 
I've noticed that 60fps YouTube videos are remarkably more lifelike.

It's not like the viewer doesn't see the time gaps (and thus action gaps) in video, but if the gaps are less than 50 milliseconds then the frames run together in the "dynamic memory buffer" that we employ in seeing, along with the gaps. If, on the other hand, the frame rate is 15fps, then the gaps are 67 milliseconds and each frame can be perceived as a separate event, and it is overtly herky-jerky.

Remember that the "real world" has no frame rates -- it's all continuous. Yet, if you run your finger quickly across your field of vision you will see a blur because your visual apparatus is always clumping the last 50 ms together.
 

Here's a simple test I did a few years ago on a 240Hz monitor. I created a simple program that painted the screen red at 240 FPS (and yes, the display/refresh were synced). Every 1 out of 240 frames was randomly replaced with an all black frame.

I was able to perceive every single instance of the black frame being inserted into the all red sample, which indicates the human eye is capable of perceiving images at least up to 240 Hz.
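As a side note, the schedule of such a test is simple to reproduce. Below is a hypothetical sketch (not the original program, which would also need a display loop synced to the 240Hz refresh) that builds one second of frames with a single randomly placed black one:

```python
import random

def one_second_schedule(refresh_hz=240, rng=None):
    # One second of frames: all red, except one black frame in a random slot
    rng = rng or random.Random()
    frames = ["red"] * refresh_hz
    frames[rng.randrange(refresh_hz)] = "black"
    return frames

schedule = one_second_schedule(240, random.Random(0))
```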
 
Isn't CNN fake news? And you're going to use TWO of them? Nothing but trouble from the Clinton News Network. Maybe try the TNT network - Trump Never Trump
 
Here's a simple test I did a few years ago on a 240Hz monitor. I created a simple program that painted the screen red at 240 FPS (and yes, the display/refresh were synced). Every 1 out of 240 frames was randomly replaced with an all black frame.

I was able to perceive every single instance of the black frame being inserted into the all red sample, which indicates the human eye is capable of perceiving images at least up to 240 Hz.
Neat experiment.

The thing is, human vision has no "frame rate". It's not like video recording (and, hence, playback) where you "capture" successive scenes and there's always necessarily a temporal gap between each one, between each frame. Human sight is continuous -- we don't miss a thing. As long as there's a photon's worth of oscillation we perceive it.

You were able to perceive the black frame because once each second the brightness of the red dimmed momentarily by about 1/12. If you were to repeat the experiment and this time always put a red frame that was twice as bright within at most 12 frames from the black, you would probably have a much harder time detecting any difference. If you were to repeat your original experiment at, say, 4000 Hz, the brightness of the red would only dim by 1/200, and that might not be detectable.
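The arithmetic behind that argument is worth spelling out. If the visual system averages over a window of roughly 50 ms (the commenter's assumption), one all-black frame dims that window's average brightness by one part in (refresh rate × 0.05):

```python
def brightness_dip(refresh_hz, window_s=0.05):
    # A single black frame inside the ~50 ms integration window reduces
    # the window's average brightness by 1 / (frames per window)
    frames_in_window = refresh_hz * window_s
    return 1.0 / frames_in_window

dip_240 = brightness_dip(240)    # about 1/12
dip_4000 = brightness_dip(4000)  # about 1/200
```

At 240Hz the dip is about 8%, plausibly visible; at 4000Hz it falls to 0.5%, which supports the guess that it might go undetected.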
 

Good stuff there.

The way I have always thought of it (and this of course is putting it in simple terms) is that humans and animals hear and see in analog. That is to say, as Reachable puts it, vision and hearing are "continuous." We don't see in frame rates like a camera. Vision is the complete picture. Saying that we only perceive at an arbitrary frequency like 20Hz or a million Hz is meaningless because my eye didn't miss a thing. No matter how fast the camera, there will be bits still missing from the total picture. Not so with an eye.

The question becomes: What can our brains process? General consensus might be 20Hz but as others have pointed out, a lot of it depends on the context and how something is being viewed. Just because a race car zips by at 200mph so that all I see is a blur does not mean that my mind did not pick up details like the number, the sponsors, the driver's face. I might not be able to consciously recall these things because all my mind could do was process a blur. However, I firmly believe that the subconscious works much faster than that and picks up more than just the blur our eye sees. However, I have to temper that with admitting I'm not an expert in the field so I could be off my @$$.

Fascinating discussion though.
 
Isn't CNN fake news? And you're going to use TWO of them? Nothing but trouble from the Clinton News Network. Maybe try the TNT network - Trump Never Trump
That's really, really off topic, even by my rather liberal (some would say "nonexistent") standards.

You could have probably left the wordplay at "isn't CNN the news network", tacked on a couple of these... :confused::confused: ...and left it at that.

Other types of topics lend themselves much more readily to working in your hatred of our president, but in this post it is simply a big enough stretch to be, well, bizarre.
 
Spin a spoke wheel and when the spokes appear to stop spinning, you have found the frequency of your vision.
A massive oversimplification, but yeah, that would work.

With the wheel spinning at a measured and constant speed, comparative measurements could be made between individuals to determine whose eyeball nerve transmission speed was faster.

Unfortunately, PC gamers would have an agenda, though: to put off calling the onset of blur, as an affirmation that faster monitor refresh rates are desperately needed.
 