Like the infinite monkeys typing Shakespeare, but with audio instead.
If there were a program that created a series of sounds at random intervals, pitches, amplitudes, etc., how long would it take to produce output that sounds like music, some sort of recognisable recording (e.g. a bell ringing, a dog barking), or perhaps even a human voice?


I think it is safe to say that OP’s question was lay speak for “what is the mean time to get to a result”. Other than that I don’t think you actually addressed the question.
Let me try to get it started:
Randomly generating music might be akin to password cracking. Cracking a short or simple password can be very fast, while cracking a long or complex one can take a very long time. The rate of guessing also affects the time to get a result.
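To put rough numbers on the password analogy, here is a minimal sketch of the usual brute-force estimate: on average you search about half the space before hitting the target. The function name, the 26-letter alphabet and the one-million-guesses-per-second rate are all made up for illustration, not figures from this thread.

```python
def expected_seconds_to_crack(alphabet_size: int, length: int,
                              guesses_per_second: float) -> float:
    """Average brute-force time: about half the search space / guess rate."""
    search_space = alphabet_size ** length  # every possible string of this length
    return (search_space / 2) / guesses_per_second


# Hypothetical inputs: lowercase letters only, one million guesses per second.
short_pw = expected_seconds_to_crack(26, 4, 1_000_000.0)
long_pw = expected_seconds_to_crack(26, 12, 1_000_000.0)

print(f"4 letters:  {short_pw:.3f} seconds")
print(f"12 letters: {long_pw / (3600 * 24 * 365):.0f} years")
```

Tripling the length takes the estimate from a fraction of a second to over a thousand years, which is the same exponential blow-up a random audio generator runs into as the target clip gets longer.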
To calculate an answer, we need the following information:
You might be able to take a genre of music, and decompose the songs within to get some answers… I don’t have the time for that. Anyone want to take a stab at estimating the calculation?
OP’s question is just about any audio that strikes the listener as being a “real” sound. It doesn’t have to be long. It doesn’t have to be a song.
Because it just has to be “a” “real sound”, I think there is an inherent measure of subjectivity. I might think a clip sounds like something when you don’t.
I think I’d approach this differently. I’d just pick a short time frame (maybe 0.5 s) and generate noise at 64 kbit/s (a PCM bitrate), which comes to 32,000 random bits per clip.
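A minimal sketch of that experiment, assuming 64 kbit/s means 8-bit samples at 8 kHz (the telephone-style PCM format; the post doesn't say which sample rate or bit depth it has in mind). The names `random_clip` and `save_wav` are made up for illustration.

```python
import math
import os
import wave

SAMPLE_RATE_HZ = 8000   # assumption: 8 kHz sampling
BITS_PER_SAMPLE = 8     # assumption: 8-bit samples -> 64 kbit/s
DURATION_S = 0.5


def random_clip(duration_s: float = DURATION_S) -> bytes:
    """Return uniformly random 8-bit PCM samples for the given duration."""
    n_samples = int(SAMPLE_RATE_HZ * duration_s)
    return os.urandom(n_samples * BITS_PER_SAMPLE // 8)


def save_wav(path: str, pcm: bytes) -> None:
    """Write the samples as a mono 8-bit WAV file (8-bit WAV is unsigned)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(BITS_PER_SAMPLE // 8)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)


clip = random_clip()
n_bits = len(clip) * 8
# There are 2**n_bits distinct clips; count the decimal digits of that
# number via logarithms rather than printing the integer itself.
n_digits = math.floor(n_bits * math.log10(2)) + 1
print(f"{n_bits} bits per clip, 2**{n_bits} possible clips "
      f"(a number with {n_digits} decimal digits)")
```

Calling `save_wav("noise.wav", clip)` lets you listen to one draw. Each clip is a single uniform pick from that enormous space, so the interesting question is what fraction of the space sounds like anything.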
What percentage of those clips would have waveforms with any shape whatsoever within the domain of human perception? (That is, what percentage of random noise could represent a limited physical system interacting with the atmosphere in a way the human ear could perceive?)
Then, of those, what percentage subjectively “sound like a thing”?
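Those two percentages are enough for a back-of-the-envelope answer. If each random clip is an independent draw, the number of clips you generate before the first hit follows a geometric distribution, whose mean is one over the per-clip success probability. A sketch, with a made-up function name and purely placeholder probabilities (nobody in this thread has measured them):

```python
def expected_trials(p_perceivable: float, p_recognisable: float) -> float:
    """Mean number of random clips until one 'sounds like a thing'.

    p_perceivable:  fraction of clips with any perceivable structure
    p_recognisable: fraction of *those* that a listener would recognise
    """
    p_hit = p_perceivable * p_recognisable
    if p_hit <= 0:
        return float("inf")  # no chance of a hit, so the wait never ends
    return 1.0 / p_hit  # mean of a geometric distribution


# Placeholder values only; the real fractions are unknown and probably
# far smaller than these.
trials = expected_trials(1e-3, 1e-6)
clips_per_second = 2.0  # listening to 0.5 s clips back to back
print(f"{trials:.0e} clips, about "
      f"{trials / clips_per_second / (3600 * 24 * 365):.1f} years of listening")
```

Multiply the expected number of clips by the time per clip and you get the “mean time to get to a result” that OP asked about. The hard part is estimating the two fractions, and the second one is subjective.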