Riffusion: AI-based music generation where Beethoven meets Radiohead

Alfonso Maruccia

Forward-looking: First they came for our art, then they came for our text and essays. Now they're coming for music, with a "new" machine learning algorithm that adapts image generation to create, interpolate, and loop novel music clips and genres.

Seth Forsgren and Hayk Martiros adapted the Stable Diffusion (SD) algorithm to music, creating a new kind of weird "music machine" as a result. Riffusion works on the same principle as SD, turning a text prompt into new, AI-generated content. The main difference is that the algorithm has been specifically trained with sonograms, which can depict music and audio in visual form.

As explained on the Riffusion website, a sonogram (or a spectrogram for audio frequencies) is a visual way to represent the frequency content of a sound clip. The X-axis represents time, while the Y-axis represents frequency. The color of each pixel gives the amplitude of the audio at the frequency and time given by its row and column.
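To make that description concrete, here is a minimal sketch of how such a spectrogram image can be computed with torchaudio. The window size, hop length, and normalization below are illustrative assumptions, not Riffusion's actual parameters.

import torchaudio

# Load a clip; torchaudio returns a (channels, samples) tensor and the sample rate.
waveform, sample_rate = torchaudio.load("clip.wav")

# STFT magnitude spectrogram: columns are time frames (the X-axis), rows are
# frequency bins (the Y-axis). These window parameters are illustrative only.
spec = torchaudio.transforms.Spectrogram(n_fft=2048, hop_length=512, power=1)(waveform)

# Compress amplitudes to a log scale so quiet detail survives, then normalize
# to [0, 1] so each value can be stored as a pixel intensity.
db = torchaudio.transforms.AmplitudeToDB(stype="magnitude")(spec)
image = (db - db.min()) / (db.max() - db.min())
print(image.shape)  # (channels, n_fft // 2 + 1, time_frames)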

Riffusion adapts v1.5 of the Stable Diffusion visual algorithm "with no modifications," just some fine-tuning to better process images of sonograms/audio spectrograms paired with text. Audio processing happens downstream of the model, and the algorithm can generate infinite variations of a prompt simply by varying the seed.
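As a rough illustration of the seed-variation idea, this sketch drives a Stable Diffusion pipeline from the diffusers library with one prompt and several seeds. The checkpoint name is an assumption standing in for whatever fine-tuned spectrogram model is available.

import torch
from diffusers import StableDiffusionPipeline

# The model ID below is an assumption; substitute the fine-tuned
# spectrogram checkpoint you actually have.
pipe = StableDiffusionPipeline.from_pretrained(
    "riffusion/riffusion-model-v1", torch_dtype=torch.float16
).to("cuda")

prompt = "Beethoven meets Radiohead"

# Changing only the seed changes the initial latent noise, so the same prompt
# yields a different spectrogram image (and thus a different clip) every time.
for seed in range(3):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"spectrogram_{seed}.png")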

After generating a new sonogram, Riffusion turns the image into sound with Torchaudio. Because the AI was trained on spectrograms depicting sounds, songs, and genres, it can generate new sound clips from all kinds of textual prompts. Something like "Beethoven meets Radiohead," for instance, which is a nice example of just how otherworldly and uncanny machine learning algorithms can be.
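Conceptually, that image-to-sound step can be sketched with torchaudio's Griffin-Lim transform, which iteratively estimates the phase information a spectrogram image does not store. The dB range and window parameters below are assumptions matching the earlier sketch, not Riffusion's actual post-processing.

import numpy as np
import torch
import torchaudio
from PIL import Image

# Read a generated spectrogram back as a grayscale array in [0, 1]; the image
# is assumed to have n_fft // 2 + 1 frequency rows, as in the earlier sketch.
pixels = np.asarray(Image.open("spectrogram_0.png").convert("L"), dtype=np.float32) / 255.0

# Undo the normalization: map [0, 1] back to an assumed [-80, 0] dB range,
# then convert decibels back to linear magnitude.
magnitude = torch.from_numpy(10.0 ** ((pixels * 80.0 - 80.0) / 20.0))

# Griffin-Lim estimates the missing phase, then inverts the STFT to a waveform.
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=2048, hop_length=512, power=1)
waveform = griffin_lim(magnitude)

torchaudio.save("clip_out.wav", waveform.unsqueeze(0), 44100)  # sample rate assumed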

After working out the theory, Forsgren and Martiros put it all together in an interactive web app where users can experiment with the AI. Riffusion takes text prompts and "infinitely generates interpolated content in real time, while visualizing the spectrogram timeline in 3D." The audio transitions smoothly from one clip to another; if there is no new prompt, the app interpolates between different seeds of the same prompt.
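A common technique for this kind of smooth transition is spherical interpolation (slerp) between the initial latent noise tensors of two seeds. The sketch below illustrates the general idea and is not Riffusion's actual code.

import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Straight-line blending changes the norm of Gaussian noise, which
    # diffusion models are sensitive to; slerp keeps every intermediate
    # point on the same hypersphere, so each step still looks like valid
    # starting noise.
    a_n, b_n = a / a.norm(), b / b.norm()
    omega = torch.arccos((a_n * b_n).sum().clamp(-1, 1))
    so = torch.sin(omega)
    return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

# Two seeds -> two initial latents for the same prompt (Stable Diffusion v1.5's
# default latent shape for 512x512 images is 4 x 64 x 64).
shape = (1, 4, 64, 64)
lat_a = torch.randn(shape, generator=torch.Generator().manual_seed(0))
lat_b = torch.randn(shape, generator=torch.Generator().manual_seed(1))

frames = [slerp(t, lat_a, lat_b) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Feeding each latent to the pipeline, e.g. pipe(prompt, latents=frames[i]),
# would morph one variation of the clip smoothly into the next.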

Riffusion is built on many open source projects, namely Next.js, React, TypeScript, three.js, Tailwind, and Vercel. The app's code has its own GitHub repository as well.

Far from being the first audio-generating AI, Riffusion is yet another offspring of the ML renaissance that has already produced Dance Diffusion, OpenAI's Jukebox, Soundraw, and others. It won't be the last, either.


 
This whole AI creation trend is pretty exciting, even speaking from an artist's perspective. It's a paradigm shift that makes us question our understandings and practices of art - which is exactly what "movements" within art history are.

Digital photography is a good precedent for flipping the table on what we all assumed a medium meant. It didn't kill photography as an art - rather, it led to an explosion of new forms of expression.

One could argue that a lot of the middle-of-the-road work being created commercially is just iterative (think pop idol stuff) and pretty much mathematical. During the '80s and '90s there was a joke going around that Stock Aitken Waterman weren't real people, but a mainframe inside a music label's office.
 
AI and digital art are a completely different animal. Music is something that AI will have a very difficult time with. There are certain things that only a human being can create. AI replicates and deviates from a set of programs. How many people are on earth? Each individual interprets things differently.

How could AI ever produce a Mozart piece without there ever being a Mozart to begin with?
 
I think AI will eventually be much better at composing music than humans. In the not-so-distant future we'll be able to ask an AI to generate new music just for us. It will respect our musical taste and generate music in the style we like, but something completely new, so that we (its glorious masters) won't be bored listening to the same song over and over on auto-repeat, like we do today.

And this future will come pretty fast. Assuming, of course, that the plans of the global rulers fail and we're not sent back to Amish levels by 2030. In which case the only sounds we'll hear will be goats, sheep, and a shepherd's flute.
 
I think posting an article about music generation without including any of the generated music is not ideal. You even reference "Beethoven meets Radiohead," yet there's no audio clip of what it sounds like, nor an obvious link to one.
 
As explained on the Riffusion website, a sonogram (or a spectrogram for audio frequencies) is a visual way to represent the frequency content of a sound clip.
Click the link in that sentence.
 