I'm likely to lose the use of my hands in the next few years, so for a few years now I've been approaching this from the user's perspective (on Linux), trying to set up and get used to the tools I'll need later in life.
I've been using Almond, but it's really not good. I don't know how I might help, but I'm definitely interested in the results. If I could use a high-quality microphone to open a program, select menus, and type accurately (with commands to press the arrow keys), I think I'd be all set. I could do anything I wanted, even if it took a bunch of steps.
I remember Dragon NaturallySpeaking being basically capable of all of this around 1995; I could completely control a computer with speech back then, and now I can't. That's extremely strange after 26 years of development.
It's as if all the tools try to be so clever that they stop assuming the user can learn new tricks; to me it should be like learning to type or use a mouse. Yes, I used to have to say "backspace backspace period space capital while" to get the fine details right, but at least it was possible, and I could even select things with voice commands. I just hope we don't lose sight of the value of voice recognition as a general input device in the search for whichever model performs best on accuracy alone.
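The kind of literal editing grammar described above can be sketched in a few lines. This is a hypothetical toy, not the grammar Dragon, Dragonfly, or Talon actually use: each spoken token either maps to a character, edits what came before, or modifies the next word.

```python
# A toy literal dictation grammar: spoken tokens become characters,
# edits, or modifiers for the next word. Token names are hypothetical.

TOKEN_MAP = {
    "period": ".",
    "comma": ",",
    "space": " ",
}

def transcribe(utterance: str) -> str:
    """Apply spoken editing tokens to produce literal text."""
    out = []                      # one character per list entry
    capitalize_next = False
    for token in utterance.split():
        if token == "capital":    # capitalize the next literal word
            capitalize_next = True
        elif token == "backspace":  # delete the last emitted character
            if out:
                out.pop()
        elif token in TOKEN_MAP:
            out.append(TOKEN_MAP[token])
        else:                     # any other token is literal text
            word = token.capitalize() if capitalize_next else token
            capitalize_next = False
            out.extend(word)
    return "".join(out)

print(transcribe("i backspace capital i space went space home period"))
# → "I went home."
```

Real command grammars layer far more on top (continuous dictation, selection, application commands), but the core idea is the same: a small, learnable vocabulary the user can compose, exactly like keys on a keyboard.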
I am sorry to hear this. I think there are many people in a similar boat, and quite a few people working on command-and-dictation computing. Although my tool _may_ help you find out which speech systems work well for your voice/accent/mic/vocabulary, it might also be worth trying one of the specialist libraries built specifically for dictation and controlling the computer.
I've not heard of Almond, but I have seen the following projects which might be helpful:
Far-field audio is usually harder for any speech system to get right, so a good-quality mic used close by will _usually_ improve transcription quality. As a long-time Linux user, I would love to see it get some more powerful voice tools; I really hope this opens up over the next few years. Feel free to drop me an email (on my profile); happy to help with setup on any of the above.
I think the current issue is that lots of people are intellectually excited by the framework stuff: libraries, that Python project for implementing commands, and so on. I totally get that; I find it more interesting too.
What would help much more as an end user would be integrating things nicely into window managers. I'm optimistic that it's on a roadmap, but I don't really see how all the pieces fit together. I hope that on Linux it doesn't end up requiring every application to implement support individually; it seems like a clever HID driver could do it.
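The "clever HID driver" point is worth unpacking: on Linux, a user-space virtual keyboard built on the kernel's uinput interface emits ordinary key events at the input layer, so every application sees normal keystrokes with no per-app integration. A rough sketch of the translation step, assuming a hypothetical set of command names (the key codes themselves come from `linux/input-event-codes.h`):

```python
# Map spoken command names (hypothetical) to Linux input-event key codes
# from linux/input-event-codes.h.
KEYCODES = {
    "backspace": 14,   # KEY_BACKSPACE
    "enter": 28,       # KEY_ENTER
    "space": 57,       # KEY_SPACE
    "up": 103,         # KEY_UP
    "left": 105,       # KEY_LEFT
    "right": 106,      # KEY_RIGHT
    "down": 108,       # KEY_DOWN
}

def key_events(command: str) -> list[tuple[int, int]]:
    """Translate a spoken command into (keycode, value) pairs,
    where value 1 = press and 0 = release."""
    code = KEYCODES[command]
    return [(code, 1), (code, 0)]

# Injecting these for real needs a uinput device, e.g. with python-evdev:
#
#   from evdev import UInput, ecodes
#   with UInput() as ui:
#       for code, value in key_events("left"):
#           ui.write(ecodes.EV_KEY, code, value)
#       ui.syn()
#
# That requires write access to /dev/uinput (root or an input/uinput
# group), which is why it is only sketched in a comment here.

print(key_events("left"))   # [(105, 1), (105, 0)]
```

Because the events surface at the kernel input layer, the desktop, the terminal, and every GUI app receive them identically, which is exactly the property that would spare individual applications from implementing voice support.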
Unfortunately Dragon development has mostly stalled for the last 5 years (Dragon 15 was a leap forward but that was quite some time ago now).
You can still make use of it via Dragonfly (see also Caster[0]) as mentioned by a sibling comment or by using Talon[1] or Vocola.
Having used a computer 90% hands-free for about a year and a half back in 2019, I chose Dragonfly then, but would probably choose Talon nowadays: less futzing about, and it has alternative speech-engine options.
I also recommend looking into eye tracking: the Tobii gaming products[2] work well for general computer mousing with some software like Talon or Precision Gaze[3] - well enough for me to make a hands free mod[4] for Factorio, for example.