Show HN: Sonauto API – Generative music for developers
sonauto.ai

Hello again HN,
Since our launch ten months ago, my cofounder and I have continued to improve our music model significantly. You can listen to some cool Staff Picks songs from the latest version here https://sonauto.ai/ , listen to an acapella song I made for my housemate here https://sonauto.ai/song/8a20210c-563e-491b-bb11-f8c6db92ee9b , or try the free and unlimited generations yourself.
However, given there are only two of us right now competing in the "best model and average user UI" race, we haven't had the time to build some of the really neat ideas our users and pro musicians have been dreaming up (e.g., DAW plugins, live performance transition generators, etc.). The hacker musician community has a rich history of taking new tech and doing really cool and unexpected stuff with it, too.
As such, we're opening up an API that gives full access to the features of our underlying diffusion model (e.g., generation, inpainting, extensions, transition generation, inverse sampling). Here are some things our early test users are already doing with it:
- A cool singing-to-video model by our friends at Lemon Slice: https://x.com/LemonSliceAI/status/1894084856889430147 (try it yourself here https://lemonslice.com/studio)
- Open source wrapper written by one of our musician users: https://github.com/OlaFosheimGrostad/networkmusic
- You can also play with all the API features via our consumer UI here: https://sonauto.ai/create
We also have some examples written in Python here: https://github.com/Sonauto/sonauto-api-examples
- Generate a rock song: https://github.com/Sonauto/sonauto-api-examples/blob/main/ro...
- Download two songs from YouTube (e.g., Smash Mouth to Rick Astley) and generate a transition between them: https://github.com/Sonauto/sonauto-api-examples/blob/main/tr...
- Generate a singing telegram video (powered by ours and also Lemon Slice's API): https://github.com/Sonauto/sonauto-api-examples/blob/main/si...
You can check out the full docs/get your key here: https://sonauto.ai/developers
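If it helps to see the general shape of a call before opening the docs, here's a rough sketch in Python. To be clear, the base URL, endpoint path, request fields, and response fields below are illustrative placeholders, not the documented API; the developers page above is the real reference.

    # Illustrative sketch only: the base URL, endpoint, and field names below
    # are placeholders, not the documented Sonauto API.
    import time
    import requests

    API_KEY = "YOUR_SONAUTO_API_KEY"  # from https://sonauto.ai/developers
    BASE = "https://api.sonauto.ai/v1"  # placeholder base URL
    HEADERS = {"Authorization": f"Bearer {API_KEY}"}

    # Kick off a generation from a text prompt (placeholder request body)
    resp = requests.post(
        f"{BASE}/generations",
        headers=HEADERS,
        json={"prompt": "upbeat indie rock song about debugging at 3am"},
    )
    resp.raise_for_status()
    task_id = resp.json()["task_id"]  # assumed field name

    # Poll until the song is ready, then download the audio
    while True:
        status = requests.get(f"{BASE}/generations/{task_id}", headers=HEADERS).json()
        if status.get("status") == "SUCCESS":  # assumed status value
            audio_url = status["song_urls"][0]  # assumed field name
            open("song.ogg", "wb").write(requests.get(audio_url).content)
            break
        time.sleep(5)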
We'd love to hear what you think, and are open to answering any tech questions about our model too! It's still a latent diffusion model, but much larger and with a much better GAN decoder.
I'm not going to comment on the technical side of things, which is way beyond my technical comprehension skills, and I'm sure it required a considerable amount of brains, time, and energy to reach these results.
But music production and distribution is (actually, was) my home turf, so here's my two cents on the topic:
I've already heard music qualitatively on par with the tracks available on your demo page. I've heard it far more than I ever wanted or needed to, at least once a day for years, while tracking hundreds of albums you've never heard of on Pro Tools, in studios in France and LA.
It was made by people with the best intentions, coming from all sorts of walks of life, and yet it was obvious from the first note they played that they were condemned to oblivion, their music destined to be heard by basically no one.
And this has been done every day, multiple times a day, in every studio around the world, since the '60s.
20% of Spotify music has never been played once. IIRC less than 40% has been played more than once.
There's a genuinely humbling scene in the 2002 documentary "Scratch" where DJ Shadow, a world-renowned DJ and producer, wades through stacks of EPs from a record store in NY that have never, ever been played once[1], which perfectly captures how little of the recorded musical output we actually get to listen to.
Making music is very easy. Making music people want to listen to is hard, mind-bogglingly so. For every white-bread pop track you've heard on the radio, there are thousands of similar tracks that were discarded by an A&R, a radio DJ, a label, or simply by the audience.
I'm saying this with no ill feelings towards you or your work, but I can't conceive of even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
[1]https://www.youtube.com/watch?v=1gpKYnRdf0A&t=6s
> I'm saying this with no ill feelings towards you or your work.
I can. It’s predatory behavior, performed by people looking to steal and cash in on something they have neither the skill, understanding, nor love to make on their own.
> I can't conceive of even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
Easy: Independent/single-dev operations needing some quick background music for a project (game, whatever)
This is already easily solved by using royalty-free music or by licensing pre-made music from numerous publicly available sound libraries online -- with the added benefit of supporting actual musicians instead of plagiarist tech middlemen.
On one hand this is impressive, and I've been wondering when something like this would appear. On the other hand, I am -- like others here have expressed -- saddened by the impact this has on real musicians. Music is human, music theory is deeply mathematical and fascinating -- "solving" it with a big hammer like generative AI is rather unsatisfying.
The other very real aspect here is that the "training data" has to come from somewhere, and the copyright implications of this are far from solved.
In the past I worked on real algorithmic music composition: algorithmic sequencer, paired with hardware- or soft- synthesizers. I could give it feedback and it'd evolve the composition, all without training data. It was computationally cheap, didn't infringe anyone's copyright, and a human still had very real creative influence (which instruments, scale, tempo, etc.). Message me if anyone's still interested in "dumb" AI like that. :-)
Computer-assisted music is nothing new, but taking away the creativity completely is turning music into noise -- noise that sounds like music.
So if I make a song using this API, who owns the copyright? Is it me or Sonauto?
I'm not sure to what extent AI music is copyrightable (I think it depends on a case-by-case amount of human influence) but our TOS assigns any rights we may have to the user.
From their terms (https://sonauto.ai/tos):
8. OUTPUT As between You and the Services, and to the extent permitted by applicable law, You own any right, title, or interest that may exist in the musical and/or audio content that You generate using the Services ("Outputs"). We hereby assign to You all our right, title, and interest, if any, in and to Your Outputs. This assignment does not extend to other users' Outputs, regardless of similarity between Your Outputs and their Outputs. You grant to us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide license to use Your Output to provide, maintain, develop, and improve the Services, to comply with applicable law, and/or to enforce our terms and policies. You are solely responsible for Outputs and Your use of Outputs, including ensuring that Outputs and Your use thereof do not violate any applicable law or these terms of service. We make no warranties or representations regarding the Outputs, including as to their copyrightability or legality. By using the Services, You warrant that You will use Outputs only for legal purposes.
You own the rights, but Sonauto is granted the rights to use it as well.
One thing I've been thinking about is how to do a better hobbyist plan system. It would be cool to do a flat rate unlimited plan, but we wouldn't want that to then be abused by larger customers/companies. Are there existing API providers you think solve this particularly well?
I don't think it meets your ask of "solve this particularly well", but the unlimited plans in video that I am familiar with have a fast/slow queue system. This effectively limits the plan. It seems, as well, that these kinds of queue systems are tiered: you can have N fast-queued items, X items in a tier-one slow queue, Y items in a tier-two slow queue, etc. On the backend this is probably just some kind of weighted priority queue where the number of requests in some time window determines a weight scaling factor.
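A minimal sketch of that weighted-priority-queue idea, in Python (the fast-lane limit and window length are made-up illustration values, not anything a specific provider uses):

    # Sketch of a tiered fast/slow queue: requests beyond a user's recent
    # fast-lane quota get pushed to progressively lower-priority tiers.
    import heapq
    import time
    from collections import defaultdict

    FAST_LIMIT = 10        # fast-lane requests allowed per window (made up)
    WINDOW_SECONDS = 3600  # sliding window for counting recent requests

    class TieredQueue:
        def __init__(self):
            self._heap = []                   # (tier, enqueue_time, job)
            self._recent = defaultdict(list)  # user_id -> recent request times

        def submit(self, user_id, job):
            now = time.time()
            # Forget request timestamps that fell out of the window
            self._recent[user_id] = [t for t in self._recent[user_id]
                                     if now - t < WINDOW_SECONDS]
            self._recent[user_id].append(now)
            # More recent requests -> larger tier number -> slower queue
            tier = len(self._recent[user_id]) // FAST_LIMIT
            heapq.heappush(self._heap, (tier, now, job))

        def next_job(self):
            # Workers pull the lowest tier first, oldest first within a tier
            return heapq.heappop(self._heap)[2] if self._heap else None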
I think this is a good start: X high-speed queries per hour, then unlimited low-priority ones after. Do you know of any specific companies that do this we could take a look at?
runway.ai (video gen) is what I was thinking when I suggested this.
Why would a hobbyist need an unlimited plan?
E.g., in the case of a future "LibreMusic" open source UI or an integration into their DAW they work with on the weekends. I'd get pretty annoyed if I had to keep putting a coin in the machine to adjust Logic Pro effects.
The transition between two songs demo is super cool! I often need to do this when editing videos but used to have no way to do it.
Not to mention that now you can have playlists that transition seamlessly between songs. Low-cost party DJ?
Okay. I know these guys IRL. BUT, I genuinely think they have the best music model out there. Hands down. The songs are just more unique and have a wider range of musical variation. With Suno/Udio, the songs just sound the same after a while (just with different lyrics).
That could just be me though. I am curious what users of Udio/Suno think?
Quality has improved so much too, I tried it a few months ago at Demo Day and I’m blown away by how good it is now.
Congrats on the API launch (from SkyPilot)!
Thanks! We used SkyPilot (an open source cloud GPU worker management tool) to help out with both our small (single node) and large (many node) training runs.
I'm familiar with video and image diffusion model architectures, but know almost nothing about music models.
Are there any good papers or writeups on them?
Are there any open source implementations to play with?
There are!
Audio models are actually quite similar to image models, but there are a few key differences. First, the autoencoder needs to be designed much more carefully, as human hearing is insanely good and music requires orders of magnitude more compression (image AEs do 8x8 downsampling; audio AEs need to downsample by thousands of times). Second, the model itself needs to be really good at placing lyrics/beats (similar to placing text in image diffusion): a sixth finger in an image model is fine, but a missed beat can ruin a song. That's why language-model approaches (which have a stronger sequential inductive bias than diffusion models, which is good for rhythm and lyric placement) have been really popular in audio.
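To make the compression gap concrete, here's some back-of-the-envelope arithmetic in Python; the latent rates are ballpark figures for illustration, not our actual configuration:

    # Rough comparison of autoencoder compression factors (illustrative numbers).

    # Image latent diffusion: 512x512 image -> 64x64 latent (8x8 downsampling)
    image_pixels = 512 * 512
    image_latent_positions = 64 * 64
    print(image_pixels // image_latent_positions)  # 64x fewer spatial positions

    # Audio: 3 minutes of 44.1 kHz audio -> a latent running at, say, ~20 Hz
    audio_samples = 180 * 44_100
    latent_frames = 180 * 20
    print(audio_samples // latent_frames)  # ~2205x temporal downsampling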
If you're interested in papers (IMO not good for new people as they make everything seem more complicated than it is):
Stable Audio (similar to our architecture): https://arxiv.org/abs/2402.04825 (code: https://github.com/Stability-AI/stable-audio-tools)
MusicGen (Suno-style architecture): https://arxiv.org/abs/2306.05284 (code: https://github.com/facebookresearch/audiocraft/tree/main)
Not related to this post, but I was wondering about AI music generators and I don't have experience with their capabilities. The ones I know seem catered to making entire songs.
I was having a discussion with a friend who writes a lot of guitar music but can also play bass and sing. However, getting good drums is a problem. What he'd like is a service to upload his songs in some form (just guitar, or a mixed version with bass and vocals) and get an output that layers a drum track without altering the input. Ideally with appropriate fills, etc. I mean, just getting an in-time drum stem would probably be even better.
Is there any GenAI service to do this kind of incremental additive drums?
There's work in that area, it's sometimes called "accompaniment generation."
https://arxiv.org/abs/2301.12662
https://fastsag.github.io/
Not sure about GenAI, but Logic Pro has the ability to add a Session Drummer which can be set to track a given bass stem and produce passable drums for a song.
how did you create this without committing grand theft musica
The first 80s song I heard was a literal copy of Phil Collins. But there are no emotions attached to it (for me), and the lyrics are random. It’s more like supermarket background music IMHO, not something I would pay for, especially when we have centuries of music to discover already. Why make fake stuff like that?
Edit: I have just heard the funniest most ridiculous metal song ever without a touch of metal inside. Breathe of Death, it’s like a bad joke.
If that's the future of anything, I’m going back to plain C (code) when I retire and I’ll never approach the internet ever again.
In my opinion, training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs, which would be bad and useless anyway). I think an alternative legal interpretation, where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner), would be kinda depressing. If the above is true, we might as well shut down OpenAI and delete all LLM weights while we're at it, losing massive value to humanity.
It’s intellectual property laundering. A company selling a button that launders the blood sweat and tears of generations of artists is not the same as a person being inspired and dedicating themselves to mastery.
Humans create value. AI consumes and commoditizes that value, stealing it from the people and selling it back to their customers.
It’s unethical and will be detrimental in the long run. All profit should be distributed to all artists in the training set.
I’m skeptical about how much value AI art is going to really contribute to humanity but as a lifelong opponent of copyright I have to roll my eyes when I see people arguing against it on behalf of real artists, all of whom are thieves in the best case and imitators in the worst.
Yeah every musician has a story of writing a new song, bringing it to the band, and they say "oh, this sounds just like [song]." It's almost impossible to make something truly novel.
> almost impossible to make something truly novel
But beyond the originality !== novelty discussion, I'm not sure how we've come to equate 'creativity' (and the rights to retaining it) to a sort of fingerprint encoding one's work. As if a band, artist or creator should stick to a certain brand once invented, and we can sufficiently capture that brand in dense legalese or increasingly, stylistic prompts.
How many of today's artists just 'riffing' off existing motifs will remain, if the end result of their creative endeavours will be absorbed into generative tools in some manner? What's the incentive for indies to distribute digitally, beyond the guarantee their works will provide the (auditory) fingerprints for the next content generation system?
I have written and performed many songs over many bands. At no point did anybody compare my work to any other artist's work, because it is genuinely unique.
The difference being that a musician being influenced by other musicians still has to work to develop the skills necessary to distill those influences into a final product, and colors that output with their own subjective experiences and taste. This feels like a conveniently naive interpretation to justify stealing artists' work and using it to create derivative generative slop. The final line in your comment is pretty telling of how seriously you take this issue (which is near-universally decried by artists) -- some other massive company is doing a bad thing, so why shouldn't I?
edit: I have to add how disingenuous I find calling out corporations owning "all of humanity's musical knowledge and history" as if generative AI music trained on unlicensed work from artists is somehow a moral good. At least the contracts artists make with these corporations are consensual and have the potential to yield the artist some benefit which is more than you can say for these gen-AI music apps.
I don't see how the amount of work that went into it changes the core fact that all art is influenced by that which came before, and we don't call that stealing (unless you truly believe that "all art is theft").
My point re: LLMs wasn't meant to exclusively be a "they're doing it" one, the hope was to give an example of something many people would agree is super useful and valuable (I work much faster and learned so much more in college thanks to LLMs) that would be impossible in the proposed strict interpretation of copyright.
edit responding to your edit:
Re: moral good: I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good. Music production software costs >$200 and studios cost thousands and majoring in music costs hundreds of thousands, but we can make getting started so much easier.
Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market? To be clear, artists absolutely have a right to benefit from reproduction of their recordings. I just don't think anyone should have rights to the knowledge built into those creations, since in most cases it wasn't theirs to begin with (if their right to this knowledge were affirmed, every new song someone creates could hypothetically have a conga line of lawyer teams clamoring for "their cut" of that chord progression/instrument sample/effect/lyrical theme/style).
I think there are a few fallacies at play here:
1. Anthropomorphizing the kind of “influence” and “learning” these tools are doing, which is quite unrelated to the human process
2. Underrepresenting the massive differences in scale when comparing the human process of learning vs. the massive data centers training the AI models
3. Ignoring that this isn’t just about influence, it’s about the fact that the models would not exist at all, if not for the work of the artists it was trained on
I think we intuitively allow for artists to derive and interpolate from their influences because of a baseline understanding that A) it is impossible to create art without influence and B) that there is an inherent value in a human creating art and expressing themselves. How that relates to someone using unlicensed music from actual humans to train an AI model in order to profit off of the collective work of thousands of actual human artists, I have no idea.
edit:
> I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good
Generative AI music isn't in any way accomplishing this goal. A free Spotify account with ads accomplishes this goal -- being able to generate a passable tune using a mish-mash of existing human works isn't bringing musical knowledge to the masses, it's just enabling end users to entertain themselves and you to profit from that.
> Is it really consent for those artists signing to labels
Yes? Ignoring the fact that there are independent labels outside the ownership of the Big Three you mention, artists enter into contracts with labels consensually because of the benefits the label can offer them. You train your model on these artists' output without their consent, credit or notification, profit off of it and offer nothing in return to the artists.
A) Agreed! B) So I guess the argument here is that this doesn't apply to AI music. I think that if someone really pours their soul into the lyrics of a song and regenerates/experiments with prompts until it's just right, and maybe even contributes a melody or starting point that's still a human creating art and expressing themselves. It's definitely not as difficult as creating a song from scratch, but I've been told similar arguments were made regarding whether photography was art when that became a thing.
btw, if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
> if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
Am I understanding right that the point here is that while you are able to get away with using copyrighted material to turn a profit, your end users cannot, so no worries?
> Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market?
This premise is false. I have made plenty of money busking on the street, for example. Or selling audio recordings at shows.
> To be clear, artists absolutely have a right to benefit from reproduction of their recordings.
This is correct. Artists benefit when you pay them for the right to reproduce. When you don't (like what you are doing), you get sued. Here's a YouTube video covering 9 examples:
https://www.youtube.com/watch?v=IIVSt8Y1zeQ
> I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with
What?
> I have made plenty of money busking on the street
That's why I specified mass market. However, given a choice between literally being on the street and working with a record label I'd probably choose the label, though I don't know about others.
> pay them for the right to reproduce
My point is learning patterns/styles does not equate to reproducing their recordings. If someone wants to listen to "Hey Jude" they cannot do so with our model, they must go to Spotify. There are cases where models from our competitors were trained for too long on too small a dataset and were able to recite songs, but that's a bug they admit is wrong and are fighting against, not a feature.
> in most cases it wasn't theirs to begin with
In most cases they did not invent the chord progression they're using or instruments they're playing or style they're using or even the lyrical themes they're singing. All are based on what came before and the musicians that come after them are able to use any new knowledge they contribute freely. It's all a fork of a fork of a fork of a fork, and if everyone along the line decided they were entitled to a cut we'd have disaster.
Megacorporations owning copyrights to the majority of IPs (music, games, etc.) is a capitalism/monopoly problem. How does getting rid of copyright and allowing your company to profit off other people's work in any way solve that issue?
No one can actually explain the value OpenAI adds to humanity. What massive loss? What have we gained from this entity other than another billionaire riding a hype cycle?
These high-quality music models require pirating many, many terabytes of music. Torrents are the main way to do it, but they likely scraped sites like Bandcamp, Soundcloud and YouTube.
AI music is a weird business model. They hope that there's enough money peddling music slop after paying off the labels (and maybe eventually the independent music platforms) whose music you stole. Meanwhile, not even Spotify can figure out how to be reliably profitable, serving music people want to hear.
How is this better or different from Suno, besides the API? I'm assuming that since you are smaller, the quality is not as good and the depth not as wide.
Suno's RVQ-token-based language model is tuned to give you an acceptable song that most of their userbase would prefer every single time, but it isn't very diverse. Our diffusion model is much more diverse and has higher vocal audio quality, but the results aren't always consistent (just like Flux et al.). However, since we have unlimited generations, this can be worked around. We're also never going to preference-tune our model, because I think the stuff that is lost in that process is valuable.
I use both. Sonauto sounds more "real" and varied than what I can get with Suno.
What is the point of generating this low quality AI slop music, what real use case do you have in mind?
For the consumer stuff: It's fun, and IMO that's enough. Not every song has to be peak artistic quality pushing the world forward, sometimes it's enough to bring a smile to a friend's face by making a song about them. If you think their art is slop you shouldn't have to listen to it (IMO Spotify et al should have an optional "no AI music" filter for now).
For the API: I think this could be integrated into artists workflows in lots of ways we can't even imagine right now as it gets better. One example I gave above was generating transitions between songs.
The reason a song from a friend makes you happy is directly related to the effort behind it; this is totally meaningless.
The reason anything makes anyone happy is completely subjective, as evidenced by the many people who have told us our app made them and/or their friends and family happy.
I made little gift songs for friends for a while. It was nice and fun. Making a road-trip theme song for friends on a vacation is way fun, and kind of locks in the moment.
I also used it when I was living in New Orleans to help a friend come up with a riff for a live set he had, which had some unusual constraints (only a singer, a drummer, and a trombone, in an echoey space). He used the generated song hook as inspiration for that night's arrangement.
There's lots of stuff, and some of it supports artists who have tight timelines and want creative support.
There's so much real independent music out there that actually has meaning. I hope you didn't tell your friend you wrote the song, because if someone tricked me into listening to generated not-art and I found out afterwards, I would consider them a liar.
What your friend did, using generation for inspiration for real music he creates is fine. But if someone gifted me an AI generated song I would ask why they didn't pay a few dollars -- honestly not much more -- to a real artist to do the same.
Ten years ago a friend of mine did that, hired a real person, and it cost less than $20 to write a ditty. That's comparable to the cost in tokens for an AI except you could support a real human artist instead of megalomaniac Yarvinists Sam Altman and friends.
And the song would have real meaning. You gave your friend a non-gift. The Let Me Google That For You of gifts. Honestly if one of my friends did that I'd wonder if they even like me.
The problem with AI music, and in fact AI in general, is that we've spent the last few decades aggressively attacking the idea that art should get paid for at all, and yet people still do it, because they love it. So musicians work for pennies, and yet people still need to replace them with a machine.
So even if you just pay someone else to make you a song, it's not really any more expensive than this. Same with painting. What does this AI bring to the table, at all? It grosses me out.
People on this site should go pick up a guitar and write a 3-chord song about someone; it'll take you a day, if that. It's not hard! It's fun!
The problem with real music is that it requires a hefty number of musicians to establish a genre. This number could be somewhere in the range of 100 to 1000 musicians.
When this critical number is not amassed, the genre effectively dies.
With A.I. we can resurrect dead genres, but not only that, we can combine genres together, popular genres with one another, also popular and unpopular genres or popular and dead genres.
Using A.I. for music is easier and much faster than traditional means, and this could greatly reduce the critical mass of musicians needed to support a genre. It could be reduced as much as 10 times, or 100 times, like one person creating 10000 songs or something similar.
By trying to compare A.I. music to traditional music, you are comparing the 10 songs a real band makes with the 10000 songs an A.I. (human) musician makes. It's an apples-and-oranges comparison.
I don't see why human music cannot be a genre of its own, the best of all genres but still just one, alongside an innumerable number of A.I. genres which may not be as good, but are infinite.
The real human music genre might be the best forever or just for the next 3 years, but so what? Let there be more genres, some good, some bad. No one is going to listen to a cheap copy of an already existing song in an already existing genre, but songs already in existence should be used to train A.I. weights.
Regarding A.I. weights, smaller models forget much of the information they are trained on, and they are cheaper, faster, and easier to fine-tune, and probably also easier to apply RL reasoning to. That way, A.I. musicians (or real musicians) could run the model on their own computers and use it as an instrument instead of relying on companies with big, slow, and expensive models.
And sometimes big and inefficient models copy text/code/music verbatim from the training data. But this is a bug; when small models become competitive enough, most people are going to use those. They might even carry them around, like a personal band always ready to make melodies for them.
Some kind of Dadaist movement, I guess. Listen to Breathe of Death; it's hilarious and then you cry.
Signed up with Gmail, and I get 'Generation Failed' with every attempt. Please don't email me or add me to your marketing list.
There was a single unhealthy worker that didn't get caught; we just killed it.
[flagged]
Over the years I've seen people get a lot of hate for things they've poured their souls into who turn around and post snarky/insulting responses that ended up getting them into even more hot water. I always wondered why they didn't respond with their clearly well thought out reasoning behind what they built instead of the snark, even if the original comment wasn't in good faith either. I understand them a little better now.
It's def valid to ask about the value of projects like this, but I think "Please delete this project, as you are actively making the world worse." isn't the right way to start that discussion if that was your intent. I also detailed my thoughts about the whole industry a little further down so I'll avoid duplicating that.
I agree more times over than I can count. It's pointless, borderline offensive, will not enrich anyone, and makes us all worse off.
888888, I swear to God.