Qualcomm ran a complete Stable Diffusion AI model on an Android phone
Optimizations led to a completely offline image generation experience
By Alfonso Maruccia
Forward-looking: Stable Diffusion is a deep learning model capable of turning words into eerie, distinctly artificial images. The machine learning network usually runs in the cloud and it can also be installed on a beefy PC to work offline. With further optimizations, the model can be efficiently run on Android smartphones as well.
Qualcomm was able to adapt the image creation capabilities of Stable Diffusion to a single Android smartphone powered by a Snapdragon 8 Gen 2 SoC. It is a remarkable result which, according to the San Diego-based company, is just the beginning for AI applications running on edge computing devices. No internet connection is required, Qualcomm assures.
As explained on Qualcomm's corporate blog, Stable Diffusion is a large foundation model employing a neural network trained on a vast quantity of data at scale. The text-to-image generative AI contains one billion parameters, and it has mostly been "confined" in the cloud (or on a traditional x86 computer equipped with a recent GPU).
Qualcomm AI Research employed "full-stack AI optimizations" to deploy Stable Diffusion on an Android smartphone for the very first time, at least with the kind of performance described by the company. Full-stack AI means that Qualcomm had to tailor the application, the neural network model, the algorithms, the software and even the hardware, even though some compromises were clearly required to get the job done.
First and foremost, Qualcomm had to convert the single-precision floating-point data format (FP32) used by Stable Diffusion to the lower-precision INT8 data type. By using the post-training quantization feature of its newly created AI Model Efficiency Toolkit (AIMET), the company was able to greatly increase performance while also saving power and maintaining model accuracy at this lower precision, with no need for costly re-training.
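To give a rough sense of what moving from FP32 to INT8 involves, here is a minimal Python sketch of symmetric, per-tensor post-training quantization. It is an illustration only and not AIMET's actual method, which the article does not describe; the function names and sample weights are made up, and the storage estimate simply assumes one billion parameters at 4 bytes (FP32) versus 1 byte (INT8) each.

```python
# Illustrative symmetric per-tensor quantization; NOT AIMET's algorithm.

def quantize_int8(weights):
    """Map float weights to integer codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

# Hypothetical sample weights, chosen only for demonstration.
weights = [0.82, -1.27, 0.003, 0.5, -0.64]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Back-of-the-envelope weight storage, assuming one billion parameters.
params = 1_000_000_000
fp32_gb = params * 4 / 1e9  # 4 bytes per FP32 value
int8_gb = params * 1 / 1e9  # 1 byte per INT8 value

print(codes, scale, max_error)
print(fp32_gb, "GB in FP32 vs", int8_gb, "GB in INT8")
```

Under these assumptions the weights alone shrink to a quarter of their FP32 size, which helps explain why quantization matters on a memory-constrained phone. Keeping accuracy at INT8 on a real model takes more careful calibration than this single shared scale.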
The result of this full-stack optimization was the ability to run Stable Diffusion on a phone, generating a 512 x 512 pixel image in under 15 seconds for 20 inference steps. This is the fastest inference on a smartphone and "comparable to cloud latency," Qualcomm stated, while user input for the textual prompt remains "completely unconstrained."
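As a quick sanity check on those figures, the arithmetic below derives an average time budget per inference step. It assumes, purely for illustration, that the whole 15 seconds is spread evenly across the 20 steps; in practice part of that time would go to other stages, such as processing the text prompt, so the real per-step figure is not given by the article.

```python
# Upper bound quoted in the article: under 15 seconds for 20 inference steps.
total_seconds = 15.0
steps = 20

# Average ceiling per step, assuming the time is split evenly (an assumption).
seconds_per_step = total_seconds / steps
print(seconds_per_step)
```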
Running Stable Diffusion on a phone is just the beginning, Qualcomm said, as the ability to run large AI models on edge devices provides many benefits such as reliability, latency, privacy, efficiency, and cost. Furthermore, full-stack optimizations for AI-based hardware accelerators can easily be used for other platforms such as laptops, XR headsets and "virtually any other device powered by Qualcomm Technologies."