Rhasspy – Offline private voice assistant for many human languages (github.com/rhasspy)
195 points by rft on Nov 22, 2022 | 22 comments



Discussed 3 years ago [1]. See also the docs [2] and board [3].

I wanted to post this because Alexa and Mycroft are currently being discussed, but I have not seen this project mentioned. It is offline-first, whereas Mycroft relies on cloud services by default, even though it seems you can run it completely offline.

To share my personal experience with it, I am impressed with how well it can handle the basic commands I need (lights on/off, music control). It also features a reliable and easy satellite integration, so I can do the heavy lifting (STT and TTS) on my home server and only keep wakeword and sound input/output on the RPi. Unlike Alexa, you define each command ("sentence") yourself, which seems to really help with the recognition accuracy. It also means you never have the problem of discovering which commands are available.

I have it integrated with Home Assistant for some basic automation. Essentially each sentence sends off a command to Home Assistant with the recognized text and an ID. You can also send back text that is then spoken by Rhasspy. For reference, I am using Porcupine as wake word engine ("Computer"), Kaldi as STT, Fsticuffs as intent recognition, and Larynx as TTS (voice "harvard"). The home server is a desktop PC, the satellite currently runs on an RPi 4 with about 2-10% load.

[1] https://news.ycombinator.com/item?id=21926027 [2] https://rhasspy.readthedocs.io/en/latest/ [3] https://community.rhasspy.org/


I am very pleased to see more FOSS voice assistant solutions out there.

Related and also cool: While exploring PeerTube last week, I stumbled across someone using a new voice assistant, Numen, on their PinePhone running SXMO. The code repository link is in the video description: https://peertube.plasmatrap.com/w/dYo8QcVVFFpzS7PwddFPMV


Do you have a write up anywhere? This is exactly the setup I'm looking for. I have HAOS on a server, and I have a bunch of Raspberry Pis I'd like to run as satellites.

Ideally I can find a HAT with mics and speakers with sufficient quality/volume that I can build self-contained I/O devices for each room; the key really being that I want no more than a single power plug, and no more bulk than ~ a Pi + Housing.


Sadly there is no write-up. I essentially followed the tutorial [1] on an RPi first, then split it into main and satellite after I got it to work reliably on the single node. The HASS integration (Intent Handling) is fully supported in Rhasspy, just enter the URL and the access token. I am using intents (not events, I can't tell you right now what the difference is).

I do not use the Rhasspy integration in HASS; I run the main node in a Docker container. Rhasspy (main and satellite) is connected to the Mosquitto broker exposed by HAOS. The satellite pushes the STT, Intent Recognition and TTS via the Hermes MQTT handler to the main node over this connection. The acronym overload should become clearer when you start tweaking things in the Rhasspy web interface, which is very well done for the amount of customization you get.
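As a rough illustration of the Hermes MQTT traffic described above, here is a minimal sketch of how a script might decode an intent message once it has been received from the broker. The topic prefix `hermes/intent/` and the payload fields (`siteId`, `slots`, `slotName`, `value`) are assumptions based on the Hermes protocol conventions, not something spelled out in this thread, and `parse_intent_message` is a hypothetical helper name. Connecting to Mosquitto is left out on purpose.

```python
import json

# Assumed Hermes convention: intents are published on hermes/intent/<intentName>.
INTENT_TOPIC_PREFIX = "hermes/intent/"


def parse_intent_message(topic, payload):
    """Return (intent_name, site_id, slots) for an intent message, else None."""
    if not topic.startswith(INTENT_TOPIC_PREFIX):
        return None  # some other Hermes message (ASR, TTS, wake word, ...)
    message = json.loads(payload.decode("utf-8"))
    intent_name = topic[len(INTENT_TOPIC_PREFIX):]
    # siteId identifies which satellite (or the main node) heard the command.
    site_id = message.get("siteId", "default")
    slots = {}
    for slot in message.get("slots", []):
        # Assumed slot shape: {"slotName": ..., "value": {"value": ...}}
        slots[slot.get("slotName")] = slot.get("value", {}).get("value")
    return intent_name, site_id, slots


if __name__ == "__main__":
    # Hand-written sample payload, not captured from a real broker.
    sample = json.dumps(
        {
            "siteId": "livingroom",
            "slots": [{"slotName": "name", "value": {"value": "floor lamp"}}],
        }
    ).encode("utf-8")
    print(parse_intent_message("hermes/intent/HassTurnOff", sample))
```

With a real MQTT client library, a function like this would be called from the message callback with the topic and raw payload of each received message.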

The HASS side is done via YAML scripting. You can find my mpd.yaml at [2]. I select music to play via an ampd [3] interface hosted on the RPi connected to my HiFi (not the satellite, a dedicated RPi). The sentences I use are at [4]. The names for HassTurnOff are the same as in HASS; this gets handled by built-in intents in HASS. I am considering trying out Node-RED, because I am a bit unhappy with the YAML.

I am also looking for a case. Currently I use [5] just hooked up to the ReSpeaker HAT 3.5mm and USB on the RPi, with a chaos of cables on my desk. I also took apart [6] (smaller than you would expect, does not go back together nicely) and it might fit an RPi0W2 with HAT, if you take off its top. Both speakers are USB powered, so you can plug them directly into the RPi. I would prefer to stuff everything in the case of [5] and hook up the blue LED to a GPIO, but it is quite cramped in there; maybe it can "fit" by making a hole in the back.

[1] https://rhasspy.readthedocs.io/en/latest/tutorials/ [2] https://pastebin.com/M7KmQSiM [3] https://github.com/rain0r/ampd [4] https://pastebin.com/tsUqWwpy [5] https://www.amazon.de/gp/product/B07DDK3W5D [6] https://www.amazon.de/gp/product/B08GC8K8ZR


Amazing, thanks for writing that up with links. I need to save this comment.

Most likely I'll do this the exact way you have, not using the HASS integration (I've found it limiting in several ways; particularly when using Frigate, as the HASS Appliance doesn't provide a way to mount storage).

I use Node-RED for a couple of things now and it's a really nice interface for creating complex interactions that would just be a bit of a nightmare to write up in YAML.

I've tried a couple of things so far, like the M5 Stack Atom speaker with ESPHome Media Player, which meets the compactness requirement, but the audio quality is awful and the volume is so low that fan noise from the computers in my home office renders it practically useless.

Using a decent HAT, I might build something little out of wood that won't look out of place in the living room; my wife is ~~unlikely~~ not going to approve anything like [6]. The ReSpeaker 2 with its JST for a little speaker might be ideal!


YO. This is super-nice. Is anyone aware of people using this with bash shell scripts, which are my bread and butter?

Offhand, I can't think of why this should be too hard. If it generates JSON or whatnot, I could set up a watch folder or something?


This was the reason I designed voice2json [1] :)

[1] https://voice2json.org/


Yup, I remember seeing this as well. I'm trying to determine the major differences between Rhasspy and this?


Rhasspy is a more powerful general-purpose application and GUI; voice2json is more like a library or micro-service that does exactly one thing: convert a speech waveform to JSON. They share some DNA though (same syntax for defining vocabulary).

I used voice2json to build a voice-controlled car audio player, it works amazingly well: https://github.com/lukifer/voicetunes


They share a lot of the same pieces, but voice2json is meant to work in Unix-style pipelines. Rhasspy has MQTT/HTTP/Websocket APIs instead.


You can set Intent Handling to a local command (instead of Home Assistant in my example) [1]. I never tried that, but according to the docs you get JSON on stdin and output JSON on stdout. From that you should be able to hack in any pipeline you need.

[1] https://rhasspy.readthedocs.io/en/latest/intent-handling/#co...
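To make the stdin/stdout idea above concrete, here is a minimal sketch of what such a command handler could look like. It is untested against a real Rhasspy install. The intent name `ChangeLightState`, the slot names, and the `speech` reply field are assumptions for illustration, so check the linked docs for the exact JSON layout.

```python
#!/usr/bin/env python3
import io
import json
import sys


def handle(intent):
    """Attach a spoken reply to the intent JSON and return it."""
    name = intent.get("intent", {}).get("name", "")
    slots = intent.get("slots", {})  # assumed shape: {"slot name": "value"}
    if name == "ChangeLightState":  # hypothetical intent name
        reply = f"Turning {slots.get('state', 'on')} the {slots.get('name', 'light')}"
    else:
        reply = "Sorry, I do not know that command"
    # Assumption: a "speech" object with a "text" field is spoken back via TTS.
    intent["speech"] = {"text": reply}
    return intent


def run(stdin, stdout):
    """Read one intent JSON document from stdin, write the result to stdout."""
    json.dump(handle(json.load(stdin)), stdout)


if __name__ == "__main__":
    # Demo with a hand-written sample event so the sketch runs on its own.
    sample = {"intent": {"name": "ChangeLightState"}, "slots": {"state": "off", "name": "lamp"}}
    out = io.StringIO()
    run(io.StringIO(json.dumps(sample)), out)
    print(out.getvalue())
```

When installed as the actual handler, the last block would be replaced by `run(sys.stdin, sys.stdout)`. From there, `handle` can shell out to any bash script via `subprocess`.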


I've written some Rhasspy bash scripts for managing timers, doing unit conversions (for cooking), and interfacing with mpd for music control. See https://github.com/thinkingcow/homeassistant


Rhasspy author here, thanks for posting! Just wanted to mention that I've joined Nabu Casa (creators of Home Assistant) this month, so Rhasspy will be receiving updates again and be a major part of Home Assistant's "Year of Voice" in 2023 :)


Thank you for your work! I was in a panic when Snips was bought up. After some research I landed on Rhasspy as my new local-first digital assistant, and it's been fantastic. Been using it for a few years now with satellites around the house and the 'brain' running on a VM. I even have a Siri shortcut which transcribes my speech input and then makes an HTTP request to the 'brain' instance, so that I can use Rhasspy even when I am not near a satellite. This even works over my VPN!
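For anyone wanting to try the same trick without Siri, here is a minimal sketch of sending already-transcribed text to a Rhasspy instance over HTTP. The `/api/text-to-intent` path, port 12101, and the hostname are assumptions about a typical setup, not details given in the comment above. Both function names are hypothetical.

```python
import json
import urllib.request


def build_text_to_intent_request(base_url, text):
    """Build a POST request that carries the transcribed sentence as plain text."""
    url = base_url.rstrip("/") + "/api/text-to-intent"  # assumed endpoint path
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def recognize(base_url, text, timeout=10.0):
    """Send the text and return the decoded JSON response."""
    request = build_text_to_intent_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical host; only builds the request, nothing is sent here.
    req = build_text_to_intent_request("http://rhasspy.local:12101", "turn off the lamp")
    print(req.get_method(), req.full_url)
```

Calling `recognize("http://rhasspy.local:12101", "turn off the lamp")` would perform the actual request, which over a VPN only needs the 'brain' host to be reachable.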


You're welcome! What sort of hardware did you settle on for the satellites?


The best 20-minute video of setup and demonstration that I have seen is by Everything Smart Home: https://youtu.be/BLJR_v3IFwk


This looks really cool and I definitely want to play with it, but is the logo supposed to look so sinister? Is this a Rorschach test that I'm failing? Or a shape that everyone but me recognizes?


I have looked at this off and on, but the music integration seemed rather shallow (i.e., I couldn’t find information on how to navigate and play albums or playlists on Plex, for instance). Anyone got pointers for that?


This has always been a struggle. Rhasspy can gather lists of songs, artists, etc. but it will have to guess many of their pronunciations. And it seems artist/band names often purposely thwart conventional pronunciation rules :P


Would be great if this would run on one of the commercial offerings such as Alexa hardware.


Can this run on my Fedora desktop?


There is no obvious reason why it shouldn't work. The Docker image [1] should be all you need; it runs on both my satellite and my main server. Also keep in mind you will need a way of actually doing something useful with the assistant (see [2] for some info, you might want Home Assistant if it fits your use case).

I personally like to run the satellite on an RPi, because I can easily use a microphone HAT and a cheap pair of speakers with it. I can also place it anywhere I want, not only at my desk. It also means I have no other sound sources and sinks on the same hardware, which makes the setup a bit easier. I do the main processing in a docker container on a desktop PC serving as my home server (Ryzen 5 3600X, but it handles far more than Rhasspy).

[1] https://rhasspy.readthedocs.io/en/latest/installation/ [2] https://rhasspy.readthedocs.io/en/latest/intent-handling/



