Researchers utilize GPUs for object detection in 4K/8K videos

midian182

In brief: Thanks to machine learning, object detection has come a long way in recent years, but most models still perform best on low-resolution video images. Now, researchers at Carnegie Mellon University have developed a new system that uses GPUs to quickly and accurately detect objects in 4K and 8K video.

As explained to TechXplore by researcher Vít Růžička: "While plenty of data sources record in high resolution, current state-of-the-art object detection models, such as YOLO, Faster RCNN, SSD, etc., work with images that have a relatively low resolution of approximately 608 x 608 px."

The majority of current models use these images for three reasons: they are sufficient for the task; processing low-resolution images is more time efficient; and many publicly available datasets used to train the models are made up of low-res images.

The problem with low-res footage, of course, is that it doesn’t capture much detail. And with the number of 4K and even 8K cameras on the rise, a new model is needed to analyze their output. That’s where the researchers’ ‘attention pipeline’ comes in.

The method, which is the work of Růžička and his colleague Franz Franchetti, divides the task of object detection into two stages. Both stages subdivide the original image by overlaying it with a regular grid and then apply the YOLO v2 model for fast object detection.

"We create many small rectangular crops, which can be processed by YOLO v2 on several server workers, in a parallel manner," Růžička explained. "The first stage looks at the image downscaled into lower resolution and performs a fast object detection to get rough bounding boxes. The second stage uses these bounding boxes as an attention map to decide where we need to check the image under high-resolution. Therefore, when some areas of the image don't contain any object of interest, we can save on processing them under high resolution."

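To make the second-stage idea concrete, here is a minimal Python sketch of how rough bounding boxes could act as an attention map over a regular grid. It is an illustration under stated assumptions, not the researchers' code: the function names (grid_crops, intersects, active_crops) are hypothetical, the 608-pixel crop size is borrowed from the model input resolution quoted above, and the detector itself is left out, so the rough boxes are simply given as input.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in full-resolution pixel coordinates
Box = Tuple[float, float, float, float]


def grid_crops(width: int, height: int, crop: int) -> List[Box]:
    """Cover a width x height image with a regular grid of crop-sized cells.

    Cells on the right and bottom edges are clipped to the image bounds.
    """
    cells = []
    for y in range(0, height, crop):
        for x in range(0, width, crop):
            cells.append((x, y, min(x + crop, width), min(y + crop, height)))
    return cells


def intersects(a: Box, b: Box) -> bool:
    """Return True when two axis-aligned boxes overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def active_crops(cells: List[Box], rough_boxes: List[Box]) -> List[Box]:
    """Keep only the grid cells touched by at least one rough first-stage box.

    Only these cells would be passed to the detector at full resolution;
    the remaining cells are skipped entirely.
    """
    return [c for c in cells if any(intersects(c, b) for b in rough_boxes)]


# Example: a 4K frame (3840 x 2160) cut into 608-pixel cells.
cells = grid_crops(3840, 2160, 608)

# Hypothetical rough detections from the low-resolution first stage,
# already scaled back to full-resolution coordinates.
rough = [(100, 100, 300, 400), (600, 600, 700, 700)]

selected = active_crops(cells, rough)
print(len(cells), "cells in the grid,", len(selected), "selected for high-res detection")
```

In this example only the cells overlapping a rough detection are kept, which is where the saving described in the quote comes from: empty regions of the frame never reach the high-resolution pass.
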
The researchers' implementation distributes the work across GPUs. They managed to maintain high accuracy while reaching an average performance of three to six fps on 4K videos and two fps on 8K videos. Compared with the plain YOLO v2 approach of downscaling images to low resolution, the method improved the average precision score from 33.6 AP50 to 74.3 AP50.

"Our method reduced the time necessary to process high-resolution images by approximately 20 percent, compared to processing every part of the original image under high resolution," Růžička said. "The practical implication of this is that near real-time 4K video processing is feasible. Our method also requires a lower number of server workers to complete this task."

Růžička and Franchetti say they are looking at ways to improve their model further, since overlaying the grid onto the images can sometimes result in objects being cut in half. You can learn more about the process here.


 
Růžička
- I'd never think that's a name. Some eastern-European hieroglyphs? There is a reason why Asians, for example, write their names with Latin letters these days, you know.
 
It's used by Slovaks & Czechs. Guy is Czech. Also Central European*, we don't like being called Eastern Europeans. Although by British logic, anything east of Germany is Eastern EU.
 
When 'the rise of the machines' occurs, we are so screwed. A rogue terminator (it's a guarantee defence departments will make these) will get a head shot, for example, on every person in the red box in the photo above with the efficiency of the M1 Abrams taking on Iraqi armour (for the uninitiated, think one-shot kills at 2 km range while the M1 is on the move). Night will not be your friend either; it'll see in the dark better than you do.
 