Detecting Blurry Frames in Video
I’ve been trying to use structure from motion (SfM) to create a pointcloud map of an unstructured and visually obstructed (=forest) area. To gather the data, I’ve been recording video while walking around in the area. However, this rather brute-force approach has led to two problems:
- Some of the frames are of poor quality due to motion blur.
- The number of frames is too high to solve the reconstruction problem within a sensible timeframe.
To address both problems, I decided to implement a blur-aware downsampling process, effectively killing two birds with one stone. Code is available on my GitHub. I’m aware there are far more elaborate methods for preprocessing large-scale image sets for SfM (e.g. vocabulary trees in COLMAP), but I figured the presented approach is a convenient and simple fix for my use case.
Blur detection with variance of the Laplacian
Blur detection can be achieved by computing the Laplacian of the given image and determining the variance of the response throughout the image. I took inspiration from here, with the technical details broken down here. The basic idea is that the Laplacian kernel responds strongly to high-frequency detail (=rapid intensity changes) in the image. A strong, widespread Laplacian response therefore indicates that the image is sharp. The variance of the Laplacian thus gives us a single scalar value indicating whether the image is sharp (high variance) or blurry (low variance). OpenCV provides a convenient one-liner for computing the Laplacian of an image.
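As a rough sketch, here is what this looks like in Python (the `sharpness` helper name is just illustrative):

```python
import cv2

def sharpness(image):
    """Variance of the Laplacian: higher means sharper."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # blur is a luminance effect
    return cv2.Laplacian(gray, cv2.CV_64F).var()    # CV_64F keeps negative responses
```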
A key drawback of this blur detection method is that the absolute magnitude of the variance of the Laplacian is inherently scene dependent, as the visible contents affect the response. General thresholds for “sharp” and “blurry” classifications are therefore difficult to derive. However, since I use the variance of the Laplacian for downsampling, I get the natural benefit of not needing threshold values at all. For a chosen sub-sequence of images, I simply pick the image with the highest sharpness value (variance of the Laplacian). Given a video with a reasonably high frame rate, the images within a sub-sequence should mostly depict the same scene, so the image with the highest sharpness value should present that scene in the highest quality.
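In code, the selection is just an argmax over each sub-sequence; a sketch reusing the `sharpness` helper from above:

```python
def pick_sharpest(frames):
    """Return the frame with the highest variance-of-Laplacian score."""
    return max(frames, key=sharpness)
```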
Experiments
I used a single, approximately 7 min (24 fps) video with a resolution of 3840x2160 to test out the method. The distribution of the computed sharpness values across all frames is presented below.
The below image comparison highlights the effectiveness of the method in detecting blurry frames. Both images have been tagged with their respective sharpness values. The images are of the same scene under the same illumination, and the sharpness value clearly indicates which frame is of higher quality.
To downsample the video data, I split the video into chunks of 10 frames and chose the frame with the highest sharpness value from each chunk. A visualisation of this process and its result is shown below. Each frame in the 10-frame sequence is tagged with its sharpness value. However, because the images were downscaled for the webpage, the blur in the later frames is not clearly visible…
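A sketch of how this chunked selection could look with OpenCV’s `VideoCapture`, reusing the helpers above (the chunk size default and output naming scheme are illustrative, not the exact code from the repo):

```python
def downsample_video(path, out_pattern="frame_{:05d}.png", chunk_size=10):
    """Save the sharpest frame of every chunk_size-frame chunk."""
    cap = cv2.VideoCapture(path)
    chunk, kept = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        chunk.append(frame)
        if len(chunk) == chunk_size:
            cv2.imwrite(out_pattern.format(kept), pick_sharpest(chunk))
            kept += 1
            chunk = []
    if chunk:  # flush the final, possibly shorter chunk
        cv2.imwrite(out_pattern.format(kept), pick_sharpest(chunk))
    cap.release()
```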
To analyse the overall impact of the presented downsampling method, below is a comparison of the blur-aware downsampling and “naive” downsampling (simply taking every 10th frame). The resulting sharpness value distributions are presented for both techniques. The blur-aware downsampling shifts the distribution notably to the right: its mode sits at around 200, whereas the mode of the naive downsampling distribution is below 100.
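The comparison itself is cheap to reproduce from the per-frame scores; a sketch, assuming `values` is a list holding one sharpness score per frame of the original video:

```python
import matplotlib.pyplot as plt

naive = values[::10]                                                # every 10th frame
aware = [max(values[i:i + 10]) for i in range(0, len(values), 10)]  # best per chunk

plt.hist(naive, bins=50, alpha=0.5, label="naive")
plt.hist(aware, bins=50, alpha=0.5, label="blur-aware")
plt.xlabel("sharpness (variance of the Laplacian)")
plt.legend()
plt.show()
```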
Assuming that the sharpness value (variance of the Laplacian) correlates with image quality, we should now have a downsampled video with much higher-quality frames!