Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio.[1] It was created by fine-tuning Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms.[4] Riffusion therefore belongs to a subset of AI models known as text-to-music generators.
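The approach hinges on the fact that a spectrogram image can be converted back into audio. A magnitude spectrogram discards phase, so reconstruction typically uses an iterative phase-recovery method such as the Griffin-Lim algorithm. The sketch below is a minimal, self-contained illustration of that idea using scipy's STFT routines; it is not Riffusion's actual pipeline, and the parameter choices (`nperseg`, `noverlap`, iteration count) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=32, nperseg=512, noverlap=384):
    """Recover a waveform from a magnitude-only spectrogram by
    iteratively re-estimating phase (Griffin-Lim).

    `magnitude` has shape (freq_bins, time_frames), matching the
    layout returned by scipy.signal.stft.
    """
    # Start from random phase; the iterations pull it toward a
    # phase that is consistent with the given magnitudes.
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Invert the current complex spectrogram to a waveform...
        _, signal = istft(magnitude * phase,
                          nperseg=nperseg, noverlap=noverlap)
        # ...then re-analyze it and keep only the new phase,
        # discarding its magnitudes in favor of the target ones.
        _, _, spec = stft(signal, nperseg=nperseg, noverlap=noverlap)
        phase = np.exp(1j * np.angle(spec))
    _, signal = istft(magnitude * phase,
                      nperseg=nperseg, noverlap=noverlap)
    return signal
```

In a Riffusion-style pipeline, the diffusion model's output image would first be mapped from pixel intensities back to spectrogram magnitudes before a reconstruction step like this one.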
In December 2022, Mubert[6] similarly used Stable Diffusion to turn descriptive text into music loops.
In January 2023, Google published a paper on its own text-to-music generator, MusicLM.