If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.), how’s it working for you?
Given there are a number of knobs you can turn, what do you use and what works well? The main ones (there’s a rough sketch of how they fit together after the list):
- Wake word model. There are the default models, plus custom ones
- Conversation agent and model
- Speech-to-text models (e.g. Speech-to-Phrase or Whisper)
- Text-to-speech models
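To make the question concrete, here’s how I picture the stages fitting together. This is purely a conceptual sketch in Python, not Home Assistant’s actual internals; every function here is made up just to show which stage each knob controls:

```python
# Conceptual sketch of an Assist-style voice pipeline (all functions
# are invented for illustration; none are real Home Assistant APIs).

def wake_word_detected(audio: str) -> bool:
    # Knob 1: wake word model (a default one, or a custom-trained one).
    return audio.startswith("okay nabu")

def speech_to_text(audio: str) -> str:
    # Knob 3: STT model (Speech-to-Phrase, Whisper, or a cloud service).
    return audio.removeprefix("okay nabu").strip()

def conversation_agent(text: str) -> str:
    # Knob 2: conversation agent, i.e. local intent matching, possibly
    # with an LLM fallback for sentences the templates don't cover.
    return f"Okay: {text}"

def text_to_speech(reply: str) -> None:
    # Knob 4: TTS model renders the reply as audio (printed here).
    print(f"[speaker] {reply}")

# Fake "audio" as plain text, just to show the end-to-end flow.
utterance = "okay nabu turn off the kitchen lights"
if wake_word_detected(utterance):
    text_to_speech(conversation_agent(speech_to_text(utterance)))
```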


I agree that it’s not production ready, and they know that too, hence the name. But on your points: I plugged in an external speaker, because the built-in one really isn’t that great a speaker at all.
For the wake word, at some point they did an update to add a sensitivity setting so you can make it more sensitive. You could also try donating your voice to the training: https://ohf-voice.github.io/wake-word-collective/
But all in all you’re spot on with the challenges. I’d add a couple more.
With OpenAI as the conversation agent I find it can outperform other voice assistants in certain areas. Without it, you run into weird issues: my wife always says “set timer 2 minutes”, and it runs off to OpenAI to work out what that means. If you say “set a timer for 2 minutes” it understands immediately, because that phrasing matches a local intent template.
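My mental model of what’s going on (a toy regex sketch; the real matcher, hassil, uses proper sentence templates with optional words, so this is only an approximation):

```python
import re

# Toy stand-in for a local intent template along the lines of
# "set a timer for <N> minute(s)". If the transcript doesn't match,
# the pipeline falls back to the (much slower) LLM.
TIMER_TEMPLATE = re.compile(r"^set a timer for (\d+) minutes?$")

def match_locally(text: str) -> str | None:
    m = TIMER_TEMPLATE.match(text)
    return f"start {m.group(1)}-minute timer" if m else None

for phrase in ("set a timer for 2 minutes", "set timer 2 minutes"):
    print(phrase, "->", match_locally(phrase) or "no match, ask the LLM")
```

If template coverage is the problem, adding extra sentences via Home Assistant’s custom_sentences YAML files might cover the shorter phrasing, though I haven’t tried that for timers.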
What I wish for is the ability to rewrite requests. Local voice recognition can’t understand my accent, so I use the proxied Azure speech-to-text via Home Assistant Cloud, and it regularly thinks I’m saying “Cortana” (I’m NEVER saying Cortana!).
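The hook I’m imagining would sit between STT and intent matching and apply per-user substitutions. Nothing like this exists in Home Assistant as far as I know; this is just what I’d want it to do:

```python
# Hypothetical transcript-rewrite hook: fix known STT mishears before
# the text reaches intent matching. The mapping below is invented;
# you'd fill it with whatever your STT keeps getting wrong.
REWRITES = {
    "cortana": "curtain",            # a guess at what I might be saying
    "set timer": "set a timer for",  # normalise my wife's phrasing too
}

def rewrite_transcript(text: str) -> str:
    out = text.lower()
    for heard, meant in REWRITES.items():
        out = out.replace(heard, meant)
    return out

print(rewrite_transcript("close the Cortana"))    # -> close the curtain
print(rewrite_transcript("set timer 2 minutes"))  # -> set a timer for 2 minutes
```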
Oh, and I wish it could do streaming voice recognition instead of waiting for you to finish talking and then waiting for a pause before trying anything. My in-laws have a Google Home, and if you say something like “set a timer for 2 minutes” it responds immediately, because it was converting speech to text as it went and knew that nothing more was coming after a command like that. HAVP has perhaps a 1-second delay between you finishing speaking and it replying, assuming it doesn’t need another 5 seconds to go off to OpenAI. And you have to be quiet in that 1 second, otherwise it thinks you’re still talking (a problem in a busy room).
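The Google behaviour sounds like streaming STT plus early endpointing: check each partial transcript as it arrives, and stop listening the moment a known command is complete. A rough sketch of the idea (hypothetical, this is not how the HA pipeline currently works):

```python
import re

# Early-endpointing sketch: inspect partial transcripts as the STT
# engine streams them, and respond as soon as one forms a complete
# command instead of waiting for trailing silence.
COMPLETE = [re.compile(r"^set a timer for \d+ (minute|second)s?$")]

def respond_early(partials: list[str]) -> str | None:
    for partial in partials:  # partials arrive while the user speaks
        if any(pat.match(partial) for pat in COMPLETE):
            return partial    # complete command: no pause needed
    return None

stream = ["set a", "set a timer", "set a timer for 2",
          "set a timer for 2 minutes"]
print(respond_early(stream))  # fires on the last partial, not after a silence
```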