[dupe] Making music using new sounds generated with machine learning (blog.google)
91 points by gk1 on March 15, 2018 | 47 comments




Weird, usually HN will show an error or redirect you when you submit something that's already been posted, but that didn't happen this time.


Same article but at a different URL; both are Google sites.


Aargh, come on Google, how about doing some research with long-standing audio research institutions first? Cross-synthesis isn't going to sound particularly new; it's been around for decades.

Case in point: AudioSculpt manual, IRCAM 1996 (p62) (Second edition!)

http://homepages.gold.ac.uk/ems/pdf/AudioSculpt.pdf

Machine learning is gonna give us some awesome new sounds but you've got to do your research first on what's been done.


Yep. Simply put, what makes traditional instruments popular is the richness of their sounds, and how they enable the musician's expressiveness. Great traditional musicians don't switch between instruments, they mine the sonic possibilities of the one they're holding. That personal thing touches the audience, and reputation is built on it.

Of all the computer synths I tried out back in the day, Absynth was a standout. BUT it's hard to 'get physical' with a computer keyboard ... let alone a finger on a tablet. When the technology gets in the way, musicians run away.


Seems like Google goes way beyond what AudioSculpt was able to do [0], which looks largely like using the exact waveform of one input to modulate the other in a number of ways. Feels a world apart from what Google did with Magenta.

[0] https://youtu.be/w24YKVsS-fA?t=98


It's super cool, super smart, super fresh, but it still sounds like those synthesizers from the 90s. I was really expecting something new when I opened the link...


Yeah, it seems like a lot of work and technical know-how to make something that doesn't really sound new.

Seems more like a hipster-ish take on making music. "I only make music on instruments that I can build myself."


Hey now, us synth-diy guys have been around a lot longer than hipsters! :P

Seriously though, this announcement from Google does reek of "invented in ignorance"-style NIH'erism .. but then again, it may be an effective marketing ploy, given that there are so, so many kids getting into synthesisers who don't have the foggiest clue what decades of technology have produced in terms of the state of the art.

It's a common thing in synthesisers though - someone comes up with some 'amazing new technique nobody has ever thought of before', and ends up producing sounds just like the 303s and DXs before them ..


Well, you have Pure Data, Max/MSP, Reaktor, etc. for that. These ML experiments can approximate the waveforms produced by particular instruments where those waveforms are very complex, but beyond that there are tons of software tools that can help you design any sound you like. Machine learning is like the "micro" thing in the 70s and 80s: companies try to put it everywhere they can, even when it adds nothing over existing, convenient products/solutions, or they don't incorporate ML at all but still slap the label on the product. E.g., quoted from Wikipedia: "Microwaves are a form of electromagnetic radiation with wavelengths ranging from one meter to one millimeter." I see no 10^-6 meter waves being used in our common kitchen appliance.


There are any number of ways to approximate the soundwaves generated by particular instruments that work better than this.

There are also any number of ways to make cool new sounds, even using old tech. (My current favourite is Aparillo by SugarBytes, which uses FM but sounds absolutely amazing.)

This Google project seems to be a technology demonstrator made by people who are curiously unaware of the domain they're trying to work in.

There is a lot of competition in this space and this project is doing nothing remotely new, including the ML element.

All of which might be forgivable if the sound was unbelievably awesome... but it isn't.


> Seems more like a hipster-ish take on making music. "I only make music on instruments that I can build myself."

I wish people would stop using this tired, bland description of people's motives for enjoying things that are fresh, new, creative, and DIY. It seems pretty intellectually shallow to me.


> It seems pretty intellectually shallow to me.

So does spending a crazy amount of time and tech to implement something that could be done in other existing synthesizers with a fraction of the time/effort, just so that you can say "I made this with ML."


I disagree, I think it's a perfectly interesting project and worthy of experimentation regardless of the outcome.


If I could produce the same sounds with cheaper equipment in a fraction of the time, what makes this experiment worthy?

Spoiler alert, it's the buzz around ML.


Okay, fine. You think ML is pointless. I still posit that calling something "hipster" is dated and shallow and says more about the person saying it than about the thing they're describing.


Yeah, I'm also disappointed. 'Fnure' sounds like an 8-bit sound effect, but even the input samples of the flute and snare sound bad. Garbage in, garbage out.

The device is a nice toy, but even they say it's unpredictable. Usually you need about 4 samples per octave and 4 samples at different velocities (note loudness) for professional use. Would NSynth generate all those samples in a uniform way so they can work together as a single instrument?


Yeah, the sample rate is really low (I assume to make it more performant) so they all sound bitcrushed.


Agreed. I am not impressed by the sound samples. I was expecting an AI version of Audio Modeling's SWAM engine.


This is FFT-based resynthesis, and a not very good-sounding implementation at that. It's understandable that this technique would be chosen, since the format of the data lends itself to averaging/blending. However, after the bombastic headlines, I am truly disappointed.


Why do you say it's FFT-based? The paper https://arxiv.org/pdf/1704.01279.pdf appears to be generating temporal samples; the FFT-based model is their baseline comparison.
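The distinction matters. A rough toy sketch of the time-domain (WaveNet-style) approach the paper describes, where `predict_next` is just a hypothetical stand-in for the trained network, not the actual Magenta API:

    import numpy as np

    def toy_autoregressive_decoder(predict_next, embedding, n_samples=16000, context=1024):
        """Generate a waveform one sample at a time, conditioned on an embedding.

        `predict_next` stands in for the trained model: it maps
        (recent samples, embedding) -> the next time-domain sample.
        """
        audio = np.zeros(n_samples, dtype=np.float32)
        for t in range(n_samples):
            recent = audio[max(0, t - context):t]        # causal context only
            audio[t] = predict_next(recent, embedding)   # one new time-domain sample
        return audio

No FFT anywhere in the generation loop; the spectral model is only the baseline they compare against.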


Did we even hear it, though? If you're shy in a demo, maybe the product's not all that good. Just get a real musician to play it and confirm its merit.


I have it on very good authority (a fine music-engineer friend) that this ten-year-old project remains the standard for an AI music generator. And look at the people who used it: https://en.wikipedia.org/wiki/Hartmann_Neuron


I once had a long discussion with Axel about the Neuron, which went like this, around and around in circles:

Me: "So .. how does it work?"

Axel: "It doesn't matter how it works, musicians don't care, they just want to hear something when they do something.."

Me: "Sure, okay .. but how does it work if I am a musician who does want to know.."

Axel: "Real musicians don't want to know."

Me: ...

I'm honestly not convinced he knew how it works, either! :P


Somewhat related (released in 2002): the Hartmann Neuron. http://www.vintagesynth.com/misc/neuron.php Never had the "chance" to get my hands on it.


Despite the requisite HN cynicism, this is really, really cool.

This is a bland marketing demo; of course they're not going to get to the really interesting parts. Oh, this combo of instruments didn't really sound that good? Yeah, that's because they chose an arbitrary combination to demonstrate that they could, not because they're claiming that every new sound will be amazing. An important aspect of any new medium is the ability to make novel mistakes.

Music synthesis has been a thing for a long time now, so of course there are going to be sounds that you can get through some other technique. That doesn't mean there can be no surprises using a similar (but still novel) approach.

Andrew Huang did a video a while back about nsynth (just playing with the algorithm, before the nsynth super) and lo, despite nsynth's absolute dearth of musical merit, he actually manages to make something cool.

Edit: link to that video:

https://youtu.be/AaALLWQmCdI


> This is a bland marketing demo; of course they're not going to get to the really interesting parts.

Isn't the point of marketing to emphasize the interesting parts? If the demo isn't compelling, who will take the time to make something that is?


This is such a cool project, especially the principles behind it (open hardware!).

As with anything, there's prior art, so everyone should _chill_ :)

Being passionate about synthesis, I have to mention vector synthesis, which has a similar vibe in terms of interaction. It's more like simple mixing between sounds, but nonetheless really neat and powerful (rough sketch of the idea after the links below).

https://en.wikipedia.org/wiki/Vector_synthesis

Yamaha TG-33: https://www.youtube.com/watch?v=8DK7K5sFqWg

SCI Prophet VS: https://www.youtube.com/watch?v=1lJL3blZKVM

Korg Wavestation: https://www.youtube.com/watch?v=i1fokDelaxM
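Purely illustrative sketch of the core idea: four sources mixed with weights driven by a 2-D (joystick-style) position. The bilinear weighting and toy waveforms are my own assumptions, not how the TG-33/Prophet VS/Wavestation actually do it:

    import numpy as np

    def vector_mix(sources, x, y):
        """sources: four equal-length waveforms (A, B, C, D); x, y in [0, 1]."""
        a, b, c, d = sources
        w = np.array([(1 - x) * (1 - y),   # A: bottom-left
                      x * (1 - y),         # B: bottom-right
                      (1 - x) * y,         # C: top-left
                      x * y])              # D: top-right (bilinear weights)
        return w[0] * a + w[1] * b + w[2] * c + w[3] * d

    t = np.linspace(0, 1, 16000, endpoint=False)
    saw = 2 * (t * 110 % 1) - 1
    sine = np.sin(2 * np.pi * 220 * t)
    square = np.sign(np.sin(2 * np.pi * 110 * t))
    noise = np.random.uniform(-1, 1, t.size)
    blend = vector_mix((saw, sine, square, noise), x=0.3, y=0.7)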


Clickbait? Where is that music?

Didn't hear one single interesting sound out of that box that could be used to create music. Most producers and musicians have already gone back to analog because it sounds so much better.

I would be really impressed if some machine-learning algorithm could generate even just a kick that sounds better than the original analog TR-808/909.


I studied audio engineering and music synthesis in college and accumulated racks of synths. I rarely composed a complete piece of music but rather became consumed with 'sculpting' sound. After graduating I sold all the gear and bought an acoustic guitar and microcassette recorder. Constrained to just voice and guitar I was MUCH more productive.

Perhaps I'm naive or 'old school' but I'm struggling to believe we've exhausted all possibilities of what can be done with just 12 notes (and their octaves) and rhythm subdivided into 32nds or 64ths.

In other words I'm more interested in how ML can help us discover new combinations of notes (melodically and harmonically) and rhythms rather than new kinds of sounds.


Even when it comes to electronic music, most of the stunning new sounds we come across are achieved by hacking or tweaking simple systems: overuse of compressors, overuse of LFOs for dubstep basses, bitcrushing, half-defective samplers producing slightly out-of-tune sounds, prepared pianos...
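A bitcrusher, for instance, is only a few lines. Minimal illustrative sketch, with the bit depth and hold factor picked arbitrarily:

    import numpy as np

    def bitcrush(audio, bits=4, downsample=8):
        """Quantize amplitude to `bits` bits and hold values to fake a lower sample rate."""
        levels = 2 ** bits
        crushed = np.round((audio * 0.5 + 0.5) * (levels - 1)) / (levels - 1) * 2 - 1
        # Sample-and-hold every `downsample`-th value to emulate a reduced sample rate.
        return crushed[::downsample].repeat(downsample)[:audio.size]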


With regard to tonality: you might enjoy some microtonal works, traditional Turkish music [0], electronic music [1], and many other non-European musical traditions (Indian, Arab).

For rhythm there's a wide range for experimentation: from really slow tempos [2] to really fast and (seemingly) chaotic ones [3]. I like this study by Conlon Nancarrow [4], where the tempos of the two voices evolve in time in coordination with each other.

So this 'different' music exists, and has existed for a long time in human history; it's just not very popular. E.g., in [1], apart from Wendy Carlos's invented microtonal scales, there are some songs that borrow from other cultures [5] (see how the 7th track is similar to [3], interestingly enough).

And it actually makes sense that it isn't popular, given how human ears perceive chords as really fast polyrhythms: the simpler the ratio, the more harmonic they sound. [6]

0: https://www.youtube.com/watch?v=JxWUgqduqYM

1: https://soundcloud.com/roberto-la-forgia/sets/beauty-in-the-...

2: https://www.youtube.com/watch?v=wEiRKpflgQA

3: https://www.youtube.com/watch?v=Zg1sgrw1PGM (1:50 onwards)

4: https://www.youtube.com/watch?v=f2gVhBxwRqg

5: https://en.wikipedia.org/wiki/Beauty_in_the_Beast

6: https://www.youtube.com/watch?v=JiNKlhspdKg


I'm with you on this. I'd like to see ML applied to the compositions of a single artist (MIDI files) to try to emulate their style. Then you could create a blend of Santana and Gilmour. Even better, the model should also work in real time to supplement a live performance with virtual band members. Is anybody working on this?
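A trivial stand-in for the idea, just to show the data flow: learn note-transition statistics from one artist's MIDI note lists and sample new phrases. A real system would use something like an RNN (e.g. Magenta's MelodyRNN); the note sequences below are made up:

    import random
    from collections import defaultdict

    def train(note_sequences):
        transitions = defaultdict(list)
        for seq in note_sequences:               # each seq: a list of MIDI note numbers
            for prev, nxt in zip(seq, seq[1:]):
                transitions[prev].append(nxt)
        return transitions

    def generate(transitions, start, length=16):
        notes = [start]
        for _ in range(length - 1):
            candidates = transitions.get(notes[-1])
            if not candidates:
                break
            notes.append(random.choice(candidates))
        return notes

    # Made-up toy "corpus"; in practice you'd parse the artist's MIDI files.
    style = train([[64, 67, 69, 67, 64, 62], [62, 64, 67, 69, 71, 69]])
    print(generate(style, start=64))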


I'm working on something just like that. IDE : Windows Notepad = my thing : normal DAWs. Sign up here if you want to be notified when it's ready: https://docs.google.com/forms/d/1-aQzVbkbGwv2BMQsvuoneOUPgyr...


What I'd like to see is a deep-learning project which uses a brain sensor like an EEG. It could learn what kind of music the listener likes (and under what circumstances). This could be the first step, and a useful tool in itself. Then in the next step it could learn to generate new music which the listener likes, using the previously generated model and/or using a feedback loop involving the user.


Do you really think a single simple metric like EEG would tell you much about musical preferences (never mind what type of music to synthesize), given the ~100B neurons in an average brain? For the first part you could try an RNN in an hour; not sure it would give you any meaningful results, though.
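The "RNN in an hour" part would look roughly like this, assuming 8-channel EEG windows labeled like/dislike (all the shapes and labels here are assumptions, and real EEG would need heavy preprocessing):

    import torch
    import torch.nn as nn

    class EEGPreferenceRNN(nn.Module):
        def __init__(self, n_channels=8, hidden=64):
            super().__init__()
            self.gru = nn.GRU(n_channels, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)          # 2 classes: dislike / like

        def forward(self, x):                          # x: (batch, time, channels)
            _, h = self.gru(x)
            return self.head(h[-1])

    model = EEGPreferenceRNN()
    windows = torch.randn(16, 250, 8)                  # 16 one-second windows @ 250 Hz (dummy data)
    labels = torch.randint(0, 2, (16,))
    loss = nn.functional.cross_entropy(model(windows), labels)
    loss.backward()                                    # one toy training step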


Well, I'm guessing that a simple metric could tell whether the user likes or dislikes the music they are currently listening to. Advanced brain scans can show activity in the pleasure center of the brain, so perhaps an EEG can do something similar based on a learnable multivariate function.

You could view it as supervised learning, but with the user having only a passive (listening) role.


There is a tendency to judge and predict the commercial success of this (and every) music project.

As a long-time player and designer of digital music instruments in academic laboratory and underground contexts, I can say it's almost impossible to produce a hit directly from a laboratory. It has to go through a commercial layer first, then an artistic layer.

That's not a dig at anyone or anything. Music is a projection of life's experiences into sound... A lab can inject some tech into the music world, but we won't know for approximately 1-2 decades whether any instrument or technique is a hit or not.

FM synthesis, formalized by Chowning at Stanford, comes to mind. His initial implementations were idiosyncratic, and his math was an organization of community knowledge. It took Yamaha's need to make sound cards and keyboards to direct all that effort and those sounds into the chips and instruments we ended up buying and loving in the 80s. And it took artists and producers to bring out the best of the techniques. Also, you'll find that when the tech passes through the artistic layer, there are many more mundane variables than the technology itself, such as resale value and the ability to get the instruments repaired.
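For reference, the core of Chowning-style FM really is just one sine modulating the phase of another; a minimal sketch (the DX-series operators add envelopes, frequency ratios, and operator algorithms on top):

    import numpy as np

    def fm_tone(fc=440.0, fm=220.0, index=3.0, dur=1.0, sr=16000):
        """Carrier at fc whose phase is modulated by a sine at fm with depth `index`."""
        t = np.linspace(0, dur, int(sr * dur), endpoint=False)
        return np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))

    tone = fm_tone()   # a bright, bell-ish FM timbre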

For example, if you think these sounds are non-compelling, I dare you to look up compositions by Chowning himself! [1] They are nothing like Samantha Fox's "Touch Me," [2] a famous example of the Yamaha DX-7 FM synthesizer.

The point of research like this is not really to inspire music fans so much as to inspire the next generation of commercial instrument designers and eventually artists and producers.

[1] https://www.youtube.com/watch?v=988jPjs1gao

[2] https://www.youtube.com/watch?v=W1btg3mpEOc


I appreciate that there's a sense of logic behind the sounds this thing produces.

But, in every practical sense, the sounds are seemingly random to the listener.

Abstract expressionist art can be enjoyed regardless of the fact it's visually just a bunch of random scribbles.

But the sound this machine produces has to then be sculpted - lest we listen to a droning tone.

Would be cooler if you could describe the timbre of a sound you want to hear, and it produces that...

"I want a woody.. tinny.. percussive sound" (out pops some kind of glockenspiel marimba hybrid) or "I want a breathy sounding noise that sounds synthesized with the tone of tenor vocalist but a punchy distorted entry".


Does the hardware do anything interesting, or does it only process MIDI input and provide an interface to mix between generated sounds? From their instructions on creating new sounds, it looks like they're pre-computing the generated sounds: https://github.com/googlecreativelab/open-nsynth-super/tree/...
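That matches what pre-computing buys you: the expensive part (encode each source, interpolate the embeddings, decode back to audio with the WaveNet model) happens offline on a GPU, and the device only mixes the resulting WAV grid. A conceptual sketch, with `encode`/`decode` standing in for the actual model calls and the grid size just an assumption:

    import numpy as np

    def precompute_blend_grid(sound_a, sound_b, encode, decode, steps=11):
        """encode/decode are hypothetical stand-ins for the NSynth model calls."""
        z_a, z_b = encode(sound_a), encode(sound_b)      # latent embeddings of the two sources
        grid = []
        for w in np.linspace(0.0, 1.0, steps):
            z_mix = (1 - w) * z_a + w * z_b              # linear interpolation in latent space
            grid.append(decode(z_mix))                   # slow WaveNet synthesis, done once, offline
        return grid                                      # the device later just plays/crossfades these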


The 'fnure' reminds me of an orchestral stab: https://youtu.be/w0qnBU7fWKo


Does anyone know how to compile the open-nsynth algorithm into a VST that could be used in a DAW? I couldn't find anything on this yet.

source: https://github.com/googlecreativelab/open-nsynth-super/tree/...


About real new sounds: I get the feeling all sounds have already been generated. This AI might generate variations of sounds we already know, but could there be any genuinely new sounds we've never heard before?

I get the feeling that from around 1920 until now every sound has been generated, and we don't experience any 'new' sounds anymore.

Maybe AI could be used to explore sounds we really have never heard.


You can combine sine waves in an infinite number of ways, so I think it's safe to assume there are plenty of sounds nobody has ever heard before.

To some degree, it's about how you group things. If you believe a guitar and a banjo are similar enough to qualify as the "same" sound, then no, there's probably not much left to discover. But if small subtleties in timbre are fair game, then there's basically a limitless supply of new sounds.
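Put differently, even plain additive synthesis with randomly weighted, independently decaying partials gives you a timbre nobody has sampled before. Minimal sketch:

    import numpy as np

    def random_additive_tone(f0=220.0, n_partials=12, dur=2.0, sr=16000, seed=None):
        rng = np.random.default_rng(seed)
        t = np.linspace(0, dur, int(sr * dur), endpoint=False)
        tone = np.zeros_like(t)
        for k in range(1, n_partials + 1):
            amp = rng.uniform(0, 1) / k                  # roughly 1/k spectral rolloff
            decay = rng.uniform(0.5, 6.0)                # each partial fades at its own rate
            tone += amp * np.exp(-decay * t) * np.sin(2 * np.pi * k * f0 * t)
        return tone / np.max(np.abs(tone))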


You know what would be a really useful audio tool that might be amenable to a machine-learning approach? An AI mastering algorithm. Training set: pre- and post-mastered audio files. IMO the biggest thing holding back indie producers is that final step that turns a home recording into radio-ready music with the requisite "loudness".
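Even a non-ML baseline is instructive as a sanity check before training on (pre, post) pairs: push the rough mix to the loudness of a reference master and tame the peaks. A crude sketch; a learned model would obviously do far more (EQ matching, multiband compression):

    import numpy as np

    def quick_master(mix, reference):
        target_rms = np.sqrt(np.mean(reference ** 2))
        current_rms = np.sqrt(np.mean(mix ** 2)) + 1e-12
        loud = mix * (target_rms / current_rms)   # match overall loudness to the reference
        return np.tanh(loud)                      # soft-clip so the pushed peaks stay in [-1, 1]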


Google's self-assembly guide for the “Open Nsynth Super” is here:

https://github.com/googlecreativelab/open-nsynth-super


I was really hoping the buzzword kamikaze would leave synthesis and music alone for at least a few more years.



