Stable Diffusion: weird for visual arts, a boon for image compression?

Alfonso Maruccia

Posts: 99   +50
Staff
In a nutshell: Stable Diffusion is a striking illustration of the saying that a picture is worth a thousand words. In fact, by cutting out the image-generation text prompt altogether, the visual AI could be used to produce a highly compressed, high-quality image file.

Stable Diffusion is a machine learning algorithm capable of generating weirdly complex and (somewhat) believable images just from interpreting natural language descriptions. The text-to-image AI model is incredibly popular among users despite the fact that online art communities have started to reject AI-based images.

Other than being a controversial example of machine-assisted visual expression, Stable Diffusion could have a future as a powerful image compression algorithm. Matthias Bühlmann, a self-described "software engineer, entrepreneur, inventor and philosopher" from Switzerland, recently explored the opportunity to employ the machine learning algorithm for a completely different kind of graphics data manipulation.

In its traditional mode, Stable Diffusion 1.4 can generate artwork thanks to its acquired ability to make relevant statistical associations between images and related words. The algorithm has been trained by feeding millions of Internet images to the "AI monster," and it needs a roughly 4GB model file holding the learned weights. Internally, the model works on compressed, much smaller mathematical representations of images, which can be decoded back into full pictures.

In Bühlmann's experiment, the text prompt was bypassed altogether to put Stable Diffusion's image encoder to work. The encoder takes small source images (512×512 pixels) and turns them into an even smaller (64×64) representation. The compressed images are then decoded back to their original resolution, with pretty interesting results.
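To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes an uncompressed 8-bit RGB source and a latent with 4 values per position (the layout used by the Stable Diffusion 1.x encoder). The function names and the 8-bit storage option are illustrative assumptions, not details taken from Bühlmann's code.

```python
def raw_rgb_bytes(width: int, height: int) -> int:
    """Size of an uncompressed 8-bit RGB image in bytes."""
    return width * height * 3


def latent_bytes(width: int, height: int, channels: int = 4,
                 downscale: int = 8, bits_per_value: int = 32) -> int:
    """Size of the latent representation in bytes.

    Assumes the encoder shrinks each side by `downscale` and emits
    `channels` values per latent position.
    """
    values = (width // downscale) * (height // downscale) * channels
    return values * bits_per_value // 8


raw = raw_rgb_bytes(512, 512)                        # 786432 bytes
as_float32 = latent_bytes(512, 512)                  # 65536 bytes
as_8bit = latent_bytes(512, 512, bits_per_value=8)   # 16384 bytes (assumed quantization)

print(f"raw RGB:        {raw} bytes")
print(f"float32 latent: {as_float32} bytes ({raw // as_float32}x smaller)")
print(f"8-bit latent:   {as_8bit} bytes ({raw // as_8bit}x smaller)")
```

Under these assumptions the 64×64 latent is about 12 times smaller than the raw pixels when stored as 32-bit floats, and about 48 times smaller if each value is squeezed into a single byte. Any further savings would have to come from additional steps not sketched here.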

The developer highlighted how SD-compressed images had a "vastly superior image quality" at a smaller file size when compared to JPG or WebP formats. The Stable Diffusion images were smaller and exhibited more defined details, showing fewer compression artifacts than the ones generated by standard compression algorithms.

Could Stable Diffusion have a future as a higher quality algorithm for lossy compression of images on the Internet and elsewhere? The method used by Bühlmann (for which there's a code sample online) still has some limitations, as it doesn't work so well with text or faces and it can sometimes generate additional details that were not present in the source image. The need for a 4GB model file and the time-consuming decoding process are a pretty substantial burden as well.


defaultluser

Posts: 512   +390
Well sure, this is the leap in analysis compute power that will teach an old dog new tricks.

Much like LAME's alt-preset standard suddenly made MP3 competitive with Apple AAC, this compute breakthrough shows there are still new tricks to be learned. I mean, the reason JPEG 2000 failed was because we didn't have new cutting-edge image analysis to go with it!
 

nvidiagreed

Posts: 12   +4
I have been using Stable Diffusion for about 2 weeks now and have generated 2000+ images. I can say with absolute certainty that, in its current state, it has far too many artifacts and errors to be of use for this.

Nvidia's DLSS, on the other hand, or Gigapixel AI are much better candidates for AI upscaling.
 

kinetix

Posts: 57   +46
Well, the Stable Diffusion neural network is a convolutional autoencoder (CAE), in the form of a U-Net with skip connections and some additional modifications (the DLSS CNN is also a U-Net CAE with skip connections, but much, much smaller).

And one of the recognized applications of CAEs has always been data compression. The CAE reduces the scale of the input (such as an image) while keeping the most relevant features, until it reaches the bottleneck, which contains the "compressed" (lossy) representation. This data is then expanded again in the expansion layers (deconvolution), recomposing and rebuilding the features until the output (which does not have to be an image) is reached.

It is a simple type of network that can be used for super-resolution tasks (that's how I used it), but also for segmentation, scene depth estimation, etc.

The imaging software could ship with an internal CAE (both halves, the compressor and the decompressor), and the compressed image files could hold the data extracted from the network bottleneck. Then, when one of these image files is opened, it is passed only through the decompressing half of the CAE to obtain the visual image.
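To make that pipeline concrete, here is a toy sketch in plain Python. It is not a trained CAE: a fixed 2×2 average pooling stands in for the contracting half and simple value repetition stands in for the expanding half, just to show the encode, store bottleneck, decode flow. All function names and the tiny file layout are made up for illustration.

```python
import struct


def encode(image, factor=2):
    """Contracting half (stand-in): average each factor x factor block.

    Assumes the image height and width are multiples of `factor`.
    """
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx]
                for dy in range(factor) for dx in range(factor))
            / (factor * factor)
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


def decode(latent, factor=2):
    """Expanding half (stand-in): repeat each value over a factor x factor block."""
    out = []
    for row in latent:
        expanded = [v for v in row for _ in range(factor)]
        out.extend(list(expanded) for _ in range(factor))
    return out


def pack(latent):
    """Hypothetical file layout: 2-byte height, 2-byte width, one byte per value."""
    h, w = len(latent), len(latent[0])
    flat = [min(255, max(0, round(v))) for row in latent for v in row]
    return struct.pack(f"<HH{h * w}B", h, w, *flat)


def unpack(blob):
    """Read the bottleneck values back out of the hypothetical file layout."""
    h, w = struct.unpack_from("<HH", blob)
    flat = struct.unpack_from(f"<{h * w}B", blob, 4)
    return [list(flat[r * w:(r + 1) * w]) for r in range(h)]


# A 4x4 grayscale "image" with values in 0..255.
image = [
    [10, 12, 200, 202],
    [14, 16, 204, 206],
    [50, 50, 90, 94],
    [50, 50, 98, 102],
]

blob = pack(encode(image))       # what would be written to disk
restored = decode(unpack(blob))  # what the viewer would reconstruct

print(len(blob), "bytes stored")
print(restored)
```

Only the bottleneck values end up in the stored blob, and opening the file only needs the decoding half. The reconstruction is lossy, since the fine detail inside each 2×2 block is gone. A real CAE would replace both stand-ins with learned convolution layers.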