Like the infinite monkeys typing Shakespeare, but with audio instead.
If there were a program that created a series of sounds at random intervals, pitches, amplitudes, etc., how long would it take to produce an output that sounds like music, some sort of recognisable recording (e.g. a bell ringing, a dog barking), or perhaps even a human voice?
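A program like that is easy to sketch. The Python below uses only the standard library. It strings together sine tones with random pitch, amplitude and duration, separated by random silent gaps. The sample rate, the value ranges and the function names `random_tones` / `write_wav` are all my own illustrative assumptions, not a standard tool:

```python
import math
import random
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (CD quality; an assumption)

def random_tones(n_events, seed=None):
    """Build a mono signal of n_events sine tones with random pitch,
    amplitude, duration and silent gaps. Returns a list of int16 samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_events):
        gap = rng.uniform(0.0, 0.3)        # silence before the tone, seconds
        samples.extend([0] * int(gap * SAMPLE_RATE))
        freq = rng.uniform(100.0, 2000.0)  # pitch in Hz
        amp = rng.uniform(0.1, 1.0)        # fraction of full scale
        dur = rng.uniform(0.05, 0.5)       # tone length, seconds
        for i in range(int(dur * SAMPLE_RATE)):
            value = amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            samples.append(int(value * 32767))  # scale to the 16-bit range
    return samples

def write_wav(path, samples):
    """Write 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Running something like `write_wav("monkeys.wav", random_tones(20))` gives you a file to listen to. This only varies a handful of parameters, though. A "monkey" that picks every individual sample at random would be even less likely to land on anything recognisable.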


Quite similar to the Library of Babel, which contains every possible page-long combination of letters. I don’t have an exact answer to your question, but you can try exploring the Library of Babel to get an idea:
https://libraryofbabel.info/
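To put the comparison in rough numbers, here is a small Python sketch. It assumes CD-quality mono audio (44.1 kHz, 16-bit) and, for the library, pages of 3,200 characters drawn from a 29-symbol alphabet. Treat that page format as an assumption to check against the site. The guessing rate of 10^9 tries per second is purely hypothetical. Everything is computed as a base-10 logarithm because the raw counts are far too large to print:

```python
import math

def log10_possible_clips(seconds, sample_rate=44_100, bit_depth=16):
    """log10 of how many distinct mono PCM clips of this length exist."""
    n_samples = int(seconds * sample_rate)
    # Each sample takes one of 2**bit_depth values, so there are
    # (2**bit_depth)**n_samples clips; use logs to avoid a gigantic integer.
    return n_samples * bit_depth * math.log10(2)

def log10_possible_pages(chars_per_page=3200, alphabet_size=29):
    """log10 of how many distinct pages exist (assumed page format)."""
    return chars_per_page * math.log10(alphabet_size)

def log10_years_to_hit(log10_targets, tries_per_second=1e9):
    """log10 of the expected years for uniform random guessing to hit one
    specific target. With N equally likely outcomes the expected number of
    tries is N (mean of a geometric distribution with p = 1/N)."""
    seconds_per_year = 365.25 * 24 * 3600
    return log10_targets - math.log10(tries_per_second) - math.log10(seconds_per_year)

if __name__ == "__main__":
    clips = log10_possible_clips(1.0)
    pages = log10_possible_pages()
    print(f"1 s of 16-bit 44.1 kHz audio: about 10^{clips:.0f} clips")
    print(f"one 3200-character page: about 10^{pages:.0f} pages")
    print(f"years at 1e9 tries/s for one exact clip: about 10^{log10_years_to_hit(clips):.0f}")
```

Under those assumptions, one second of audio has roughly 10^212,407 possible clips, compared with roughly 10^4,680 possible pages. This only counts exact matches. "Sounds like a dog barking" covers an enormous set of clips, so these figures show the scale of the search space rather than a real answer to "how long".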