Science and technology
Science at the service of creativity

AI, a high-potential tool for music creation

Gaël Richard, Professor at Télécom Paris (IP Paris) and Scientific Co-director of the Hi! PARIS interdisciplinary center for artificial intelligence
September 3rd, 2024
5 min reading time
Key takeaways
  • AI applied to sound makes it possible to analyse, transform and synthesise sound signals.
  • The applications are numerous, ranging from predictive maintenance to virtual reality enhancement and personal assistance.
  • AI algorithms applied to sound require specific methods due to the temporal and voluminous nature of sound data.
  • The challenges associated with sound AI include its ecological impact, copyright issues, ethical concerns, and the need for an appropriate legal framework.
  • The HI-Audio project combines machine learning and human knowledge to create AI models that are more interpretable and controllable.

For more than 20 years, researchers have been using artificial intelligence (AI) on sound signals. These sound signals can be speech, music, or environmental sounds. Recent advances in algorithms are opening the door to new fields of research and new applications.

How can artificial intelligence be used to process sound signals?

Firstly, AI can be used for sound analysis. In other words, from a recording, the machine can recognise the sounds (which instrument is playing, which machine or object is generating which noise, etc.) and the recording conditions (live, studio, outside, etc.). For example, Shazam is a fairly simple but very well-known music recognition AI.

AI can also be used to transform sound. For example, this involves separating the different sources of a sound recording so that they can be remixed differently (as with karaoke applications). It is also possible to transfer the musical style of a given sound recording, or to change the acoustic conditions of the recording (for example, by removing the reverberation while keeping the content intact). Finally, the third major area of sound processing using generative AI is synthesis. Given a musical extract or certain instructions, the machine can generate music in the style of the extract. It can also be asked to generate music in relation to a text or an image.

I’m currently working on a major research project funded by the European Research Council (ERC) called HI-Audio, or “Hybrid and Interpretable Deep neural audio machines”. The term “hybrid” implies that instead of learning solely from large quantities of data, we incorporate a priori information deduced from our knowledge into our learning models. We already have certain knowledge about sound: the type of musical instruments present, the level of reverberation in a room, etc. The idea is to use this knowledge as the basis for relatively simple models that describe these phenomena. We then insert them into neural networks and more complex models that allow us to learn and describe what we don’t know. The result is models that combine interpretability and controllability.
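The hybrid idea can be sketched in a few lines of code: a simple, interpretable model explains the part of the signal we already understand, leaving only a residual for a data-driven component to learn. This is purely an illustrative sketch, not the HI-Audio architecture; the damped sinusoid standing in for a plucked note, and all its parameter values, are invented for the example.

```python
import numpy as np

# Illustrative sketch only: a hand-written, interpretable source model
# (a damped sinusoid standing in for a plucked 440 Hz note) explains the
# predictable part of a signal; a learned component would then only have
# to model the residual. All parameter values are hypothetical.

sr = 16_000                  # sample rate in Hz (assumed)
t = np.arange(sr) / sr       # one second of time stamps

def damped_sine(f0, decay, amp, t):
    """Interpretable prior: pitch, decay and amplitude are physical quantities."""
    return amp * np.exp(-decay * t) * np.sin(2 * np.pi * f0 * t)

# "Observed" signal: the note plus a component the prior cannot explain
rng = np.random.default_rng(0)
observed = damped_sine(440.0, 3.0, 1.0, t) + 0.05 * rng.standard_normal(t.size)

explained = damped_sine(440.0, 3.0, 1.0, t)   # what the prior accounts for
residual = observed - explained               # what remains to be learned

# The prior removes most of the energy, so the learning problem shrinks
print(float(np.std(observed)), float(np.std(residual)))
```

Because the parametric part carries the physics, its parameters stay readable (a pitch, a decay rate), which is where the interpretability and controllability mentioned above come from.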

What are the specific features of AI algorithms applied to sound?

A sound signal is a temporal signal (a sequence of data ordered in time) that can be more or less periodic, and each sound signal has its own specific characteristics. Recognising the instruments and notes in a musical recording requires advanced source separation techniques, making it possible to distinguish and isolate each sound element. Unlike speech, where a single instrument (the voice) conveys a linguistic message, musical analysis must manage the simultaneity and harmony of the instruments.
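As a toy illustration of the source separation principle (not of the advanced techniques the text refers to), the sketch below mixes two synthetic “instruments” and recovers one of them with an ideal ratio mask in the time-frequency plane. Real separators must estimate such a mask from the mixture alone; here the isolated sources are used as an oracle, and all frequencies and settings are arbitrary choices for the example.

```python
import numpy as np
from scipy.signal import stft, istft

# Oracle source separation sketch: with access to the isolated sources,
# an ideal ratio mask in the time-frequency plane recovers each one from
# the mixture. Real systems must estimate this mask; the frequencies and
# STFT settings here are illustrative choices.

sr = 8_000
t = np.arange(2 * sr) / sr
source_a = np.sin(2 * np.pi * 440.0 * t)          # "instrument" A
source_b = 0.5 * np.sin(2 * np.pi * 1319.0 * t)   # "instrument" B
mixture = source_a + source_b

_, _, A = stft(source_a, fs=sr, nperseg=512)      # time-frequency views
_, _, B = stft(source_b, fs=sr, nperseg=512)
_, _, X = stft(mixture, fs=sr, nperseg=512)

mask_a = np.abs(A) / (np.abs(A) + np.abs(B) + 1e-12)   # ideal ratio mask
_, estimate_a = istft(mask_a * X, fs=sr, nperseg=512)  # back to a waveform

mse = float(np.mean((estimate_a[: source_a.size] - source_a) ** 2))
print(mse)   # small here, because the two sources barely overlap in frequency
```

Two pure tones barely overlap in frequency, so the mask separates them almost perfectly; real instruments share harmonics and reverberation, which is exactly what makes the estimation problem hard.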

Another specificity of music is the length of the recordings. In principle, this type of AI is trained in much the same way as for images or text. But unlike an image, a sound signal is a series of numbers, positive or negative, that vary over time around a reference value. With a CD-quality recording, there are 44,100 values per second of music. For one minute of recording, that makes 2,646,000 values (44,100 x 60 seconds). Data volumes are therefore very high for a short period of time. Processing them requires AI methods specific to sound, as well as very powerful computing resources.
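The arithmetic above is easy to check: at CD quality the signal is sampled 44,100 times per second (per channel), so the values add up quickly.

```python
# Reproducing the data-volume arithmetic from the text: CD-quality audio
# is sampled 44,100 times per second (per channel).

sample_rate = 44_100            # samples per second at CD quality

one_second = sample_rate        # values in one second of mono audio
one_minute = sample_rate * 60   # values in one minute

print(one_second)   # 44100
print(one_minute)   # 2646000
```

A stereo recording doubles these figures, and an hour-long concert runs to hundreds of millions of values, which is why dedicated methods and substantial computing resources are needed.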

Which application sectors could benefit from these developments in sound processing?

Sound signal processing, or more generally AI applied to sound, is already used in a variety of fields. First of all, there are industrial applications. Speech is very sensitive to reverberation, which can quickly affect intelligibility, so it is necessary to “clean” the sound signal of environmental noise, particularly for telephone communications. Another area not to be overlooked is the usefulness of synthesised sound environments in the audiovisual industry. Recreating ambient sound allows you to suggest what is off-screen. Let’s imagine a film scene on a café terrace. We probably won’t know where the café is located: in the town centre, in a residential area, near a park, etc. Depending on the direction taken, sound can help immerse the viewer in a richer atmosphere. The same applies to video games and virtual reality. Hearing is one of the five senses, and we are very sensitive to sound, so adding sound enhancement increases realism and immersion in a virtual environment.

With the development of AI applied to sound, new fields of application can be envisaged. I’m thinking particularly of predictive maintenance, meaning that we could use sound to detect when an object is starting to malfunction. Understanding the sound environment could also be useful in the development of self-driving cars. In addition to the information captured by its cameras, the car could steer itself according to the surrounding noise: bicycle bells, pedestrians’ reactions, and so on.

Let’s not forget that processing sound signals can become a tool for helping people. In the future, we can imagine an AI translating the sound environment into another modality, enabling deaf people to “hear” the world around them. Sound analysis could also help to protect people at home, by detecting and characterising normal, abnormal, and alarming noises in the home. And that’s just a non-exhaustive list of possible applications!

What are the main challenges and issues linked to the development and use of AI in general and more specifically in the field of sound?

One of the main challenges is the ecological impact of such systems. The performance of generative AI in general is correlated with the amount of data ingested and the computing power used. Although so-called “frugal” approaches exist, the environmental and economic repercussions of these tools are non-negligible. This is where my research project comes in, as it explores a more frugal alternative based on hybrid AI.

Another concern for sound processing is access to music databases, because of copyright issues. Overall, regulations can be an obstacle to the development of AI in France. In the United States, the notion of fair use allows a degree of flexibility in the use of copyrighted works. In Europe, we are juggling several approaches. All the same, there are a few public databases that contain royalty-free compositions written specifically for research purposes. Sometimes we work with companies like Deezer, which offer restricted access to their catalogues for specific projects.

AI applied to sound also poses certain specific ethical problems. In particular, there is the question of music generated by the machine and the potential for plagiarism, since the machine may have been trained on well-known, protected music. Who owns the copyright to the music generated by the machine? What is the price of this automatically generated music? How transparent is the music creation process? Finally, there is the question of the controllability of AI or, more precisely, its explainability. We need to be able to explain the decisions taken by the machine. Let’s go back to our example of the autonomous car: we need to be able to determine why it chooses to turn at a given moment. “It was the most likely action” is not a sufficient answer, particularly in the event of an accident. In my opinion, it is vital to integrate human knowledge into these AI systems and to ensure transparency in their use.

More generally, we need to build a legal framework for these constantly evolving technologies. But France and Europe sometimes tend to overregulate, hampering innovation and our international competitiveness. We need to identify and protect ourselves against the risks of abuse and the ethical risks of AI, which are real, but we also need to avoid overregulation.

Do you think AI will have an impact on musicians and the sound industry?

AI will have an impact everywhere: in all professions, all companies and all environments, including jobs in the music sector. Yes, it raises concerns and questions, for instance among musicians and film sound engineers who fear being replaced. Some jobs may disappear, but others will be created.

In my view, AI is more a tool than a threat. It will open up a new range of possibilities. By making it possible to play together remotely, AI will be able to bring together communities of musicians across the planet. It can also help to democratise music learning, by creating fun, personalised remote “training courses”. It is also a fairly sophisticated composition tool that can stimulate artists’ creativity.

AI in itself is not creative. It reproduces and reshapes, but creates nothing. Similarly, in my opinion, AI does not make art. It’s almost conceptually impossible for a machine to make art. Art, even if it’s not clearly defined, is personified; it’s a form of human communication. Today, AI, particularly AI applied to sound processing, is not capable of that.

Interview by Loraine Odot
