arXiv:2503.02823

A Multimodal Symphony: Integrating Taste and Sound through Generative AI

Published on Mar 4 · Submitted by matteospanio on Mar 5
Abstract

In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perceptions. This article explores multimodal generative models capable of converting taste information into music, building on this foundational research. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according to the participants' (n=111) evaluation, the fine-tuned model produces music that more coherently reflects the input taste descriptions compared to the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code, and pre-trained model at: https://osf.io/xs5jy/.

Community

Paper author · Paper submitter

Generative AI has been making waves in creative domains, from text and image generation to music composition. However, one sensory modality has remained largely unexplored in the realm of AI-driven creativity: taste. In A Multimodal Symphony: Integrating Taste and Sound through Generative AI, we investigate how AI can bridge the gap between taste and sound, generating music that embodies the essence of different flavors.

The Science Behind Taste-Sound Associations

Neuroscientific and psychological research has shown that certain auditory characteristics influence how we perceive taste. High-pitched sounds, for instance, are often linked to sweetness, while low-pitched, resonant tones can evoke bitterness. These crossmodal correspondences form the foundation for our study, where we fine-tuned a generative music model to produce compositions aligned with specific taste descriptions.
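To make these correspondences concrete, here is a minimal sketch of how such taste-to-music priors might be encoded as prompt templates for a text-conditioned music model. The specific attribute strings are illustrative assumptions distilled from the correspondences described above, not the paper's actual prompt set.

```python
# Illustrative taste-to-music priors suggested by the crossmodal-correspondence
# literature. These values are assumptions for demonstration, not the paper's.
TASTE_PRIORS = {
    "sweet":  "high-pitched, consonant, soft timbre, flowing melody",
    "bitter": "low-pitched, resonant, rough timbre, slow tempo",
    "sour":   "high-pitched, dissonant, sharp attacks, fast tempo",
    "salty":  "mid-range, staccato, percussive, moderate tempo",
}

def taste_prompt(taste: str) -> str:
    """Build a text-conditioning prompt for a music model from a taste label."""
    return f"A piece of music that evokes a {taste} taste: {TASTE_PRIORS[taste]}."

print(taste_prompt("sweet"))
```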

Fine-Tuning MusicGEN for Taste-Based Composition

For our experiment, we fine-tuned MusicGEN, an open-source music generation model, on a dataset enriched with taste and emotional descriptors. Using the Taste & Affect Music Database, we trained the model to associate musical elements—such as tempo, timbre, and harmony—with specific taste profiles (sweet, sour, bitter, and salty). The goal was to determine whether this fine-tuned model could generate music that listeners perceive as more representative of the given taste prompts compared to its non-fine-tuned counterpart.
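To illustrate the inference side, the sketch below shows how text-conditioned generation works with the publicly available MusicGen checkpoints via the Hugging Face transformers library. The facebook/musicgen-small checkpoint stands in for the paper's fine-tuned weights (released at the OSF link above), and the prompt wording is an assumption for demonstration.

```python
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Base checkpoint used as a stand-in; the paper's fine-tuned weights are
# distributed via OSF and would be loaded the same way once downloaded.
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# A taste-conditioned text prompt (wording is illustrative, not the paper's).
inputs = processor(
    text=["A piece of music that evokes a sweet taste: high-pitched, consonant, soft timbre"],
    padding=True,
    return_tensors="pt",
)

# Roughly 5 seconds of audio at MusicGen's 50 Hz token frame rate.
audio = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=256)

rate = model.config.audio_encoder.sampling_rate  # 32 kHz for MusicGen
scipy.io.wavfile.write("sweet.wav", rate=rate, data=audio[0, 0].numpy())
```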

Evaluating the Generated Music

To validate our approach, we conducted an online survey with 111 participants, who listened to audio clips and rated their coherence with corresponding taste descriptions. The results were promising: our fine-tuned model generated music that was significantly more aligned with the intended taste attributes, particularly for sweet, bitter, and sour prompts. However, representations of saltiness proved more challenging, indicating a need for further refinement in dataset composition and model training.
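For readers who want to run this kind of comparison themselves, a minimal analysis sketch follows. The CSV layout, column names, and the choice of a one-sided Mann-Whitney U test are assumptions for illustration; the released OSF materials contain the actual data and analysis code.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical layout: one row per rating, with the model condition,
# the taste prompt, and a coherence rating on a Likert scale.
ratings = pd.read_csv("ratings.csv")  # columns: condition, taste, coherence

for taste, group in ratings.groupby("taste"):
    finetuned = group.loc[group["condition"] == "finetuned", "coherence"]
    baseline = group.loc[group["condition"] == "baseline", "coherence"]
    # Ordinal Likert ratings: a rank-based test avoids normality assumptions.
    stat, p = mannwhitneyu(finetuned, baseline, alternative="greater")
    print(f"{taste}: U={stat:.1f}, p={p:.4f}, "
          f"median diff={finetuned.median() - baseline.median():.2f}")
```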

