Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio.[1] It was created by fine-tuning Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms.[4] Riffusion therefore belongs to a subset of AI models known as text-to-music generators.
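The approach hinges on the fact that a spectrogram image can be converted back into audio. A magnitude spectrogram discards phase, so reconstruction typically uses an iterative phase-recovery method such as the Griffin-Lim algorithm. The sketch below is a minimal, self-contained illustration of that idea using scipy's STFT routines; it is not Riffusion's actual pipeline, and the parameter choices (`nperseg`, `noverlap`, iteration count) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=32, nperseg=512, noverlap=384):
    """Recover a waveform from a magnitude-only spectrogram by
    iteratively re-estimating phase (Griffin-Lim).

    `magnitude` has shape (freq_bins, time_frames), matching the
    layout returned by scipy.signal.stft.
    """
    # Start from random phase; the iterations pull it toward a
    # phase that is consistent with the given magnitudes.
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Invert the current complex spectrogram to a waveform...
        _, signal = istft(magnitude * phase,
                          nperseg=nperseg, noverlap=noverlap)
        # ...then re-analyze it and keep only the new phase,
        # discarding its magnitudes in favor of the target ones.
        _, _, spec = stft(signal, nperseg=nperseg, noverlap=noverlap)
        phase = np.exp(1j * np.angle(spec))
    _, signal = istft(magnitude * phase,
                      nperseg=nperseg, noverlap=noverlap)
    return signal
```

In a Riffusion-style pipeline, the diffusion model's output image would first be mapped from pixel intensities back to spectrogram magnitudes before a reconstruction step like this one.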
In December 2022, Mubert[6] similarly used Stable Diffusion to turn descriptive text into music loops.
In January 2023, Google published a paper on its own text-to-music generator, MusicLM.