We don't transcode audio/video on the server. Each stream is processed and muxed by the client. The server is merely relaying rtp packets and tagging them with a given ssrc per peer. The client does the rest of the work.
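To make the "tagging them with a given ssrc per peer" part concrete, here is a minimal sketch of rewriting the SSRC field of an RTP packet in place. The field offsets come from the fixed RTP header in RFC 3550; the function names `rtp_get_ssrc` and `rtp_tag_ssrc` are made up for illustration and are not Discord's code. It also assumes the rewrite happens on a plaintext header, before any encryption or authentication is applied.

```c
#include <stddef.h>
#include <stdint.h>

#define RTP_MIN_HEADER 12  /* size of the fixed RTP header (RFC 3550) */

/* Read the SSRC, stored big-endian in bytes 8..11 of the header. */
uint32_t rtp_get_ssrc(const uint8_t *pkt)
{
    return ((uint32_t)pkt[8] << 24) | ((uint32_t)pkt[9] << 16) |
           ((uint32_t)pkt[10] << 8) | (uint32_t)pkt[11];
}

/* Overwrite the SSRC in place with the id assigned to the sending peer.
 * Returns 0 on success, -1 if the packet is too short or not RTP version 2.
 * The payload is never touched, so no decode/re-encode is needed. */
int rtp_tag_ssrc(uint8_t *pkt, size_t len, uint32_t ssrc)
{
    if (len < RTP_MIN_HEADER || (pkt[0] >> 6) != 2)
        return -1;
    pkt[8]  = (uint8_t)(ssrc >> 24);
    pkt[9]  = (uint8_t)(ssrc >> 16);
    pkt[10] = (uint8_t)(ssrc >> 8);
    pkt[11] = (uint8_t)ssrc;
    return 0;
}
```

The point of the sketch is that relaying only touches four header bytes per packet, which is why the server side can stay cheap.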
The bulk of the user-space time on the SFU is spent doing encryption (xsalsa20/dtls). We also avoid memory allocations in the hot paths, using fixed-size ring buffers as much as possible.
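For readers unfamiliar with the pattern, a fixed-size ring buffer of packet slots might look roughly like the sketch below. Everything here (`pkt_ring`, `ring_push`, `ring_pop`, the slot count and slot size) is an assumption for illustration, not the actual SFU data structure, and it is single-threaded with no locking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 256   /* power of two, so indices wrap with a mask */
#define SLOT_BYTES 1500  /* assumed upper bound on one packet */

struct pkt_slot {
    uint16_t len;
    uint8_t  data[SLOT_BYTES];
};

struct pkt_ring {
    uint32_t head;  /* total packets written */
    uint32_t tail;  /* total packets read */
    struct pkt_slot slots[RING_SLOTS];  /* storage is allocated once, up front */
};

/* Copy a packet into the next free slot. Returns 0, or -1 if the ring is
 * full or the packet is larger than a slot. Never calls malloc. */
int ring_push(struct pkt_ring *r, const uint8_t *data, size_t len)
{
    if (len > SLOT_BYTES || r->head - r->tail == RING_SLOTS)
        return -1;
    struct pkt_slot *s = &r->slots[r->head & (RING_SLOTS - 1)];
    memcpy(s->data, data, len);
    s->len = (uint16_t)len;
    r->head++;
    return 0;
}

/* Copy the oldest packet into out. Returns its length, or -1 if the ring
 * is empty or out is too small. */
int ring_pop(struct pkt_ring *r, uint8_t *out, size_t cap)
{
    if (r->head == r->tail)
        return -1;
    struct pkt_slot *s = &r->slots[r->tail & (RING_SLOTS - 1)];
    if (s->len > cap)
        return -1;
    memcpy(out, s->data, s->len);
    r->tail++;
    return (int)s->len;
}
```

Because all slots live inside the struct, the hot path is just a bounds check and a `memcpy`; when the ring is full the caller has to decide whether to drop the packet.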
If Discord is basically proxying the raw packets from one client to the others, isn't that wasted bandwidth (for Discord, not the clients)? I understand from the post that the goal is to mask users' IPs, so Discord shoulders both user privacy and the DDoS vector. Kudos on silence detection to save overhead.
So video w/audio broadcasting has to be compressed client side, then proxied through Discord's media servers to the end users. That's pretty smart... I just wish I could send my raw stream to a LAN host so I could offload the compression and let my LAN host handle delivery (I'm a Nitro user).
Would rather waste bandwidth than CPU cycles in this case. It would take way too much CPU time to mux audio streams together server-side and then recompress. (It means we'd have to buffer data for each sender, deal with silence, deal with retransmits and packet drops, have a jitter buffer, etc.) No way we'd be able to hit the number of clients we want per core with that overhead. Our SFUs are intentionally very dumb for this reason.
Also, muxing server side means we can't do things like per-peer volume and muting, without having to individually mux and re-encode for each user in the channel depending on who they have muted and the volumes they have set per peer (which would explode CPU complexity even further).
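A rough sketch of why this is easy on the client: once each peer's stream arrives separately and is decoded, applying per-peer volume and mute is just a weighted sum. The code below assumes already-decoded 16-bit PCM frames of equal length; `peer_frame` and `mix_peers` are hypothetical names, and a real client would likely do more (resampling, smoother limiting).

```c
#include <stddef.h>
#include <stdint.h>

struct peer_frame {
    const int16_t *pcm;  /* decoded samples for this peer */
    float gain;          /* local setting: 0.0 = muted, 1.0 = unchanged */
};

/* Mix n_peers frames into out, applying each listener-chosen gain.
 * The sum is clamped to the 16-bit range to avoid wrap-around. */
void mix_peers(const struct peer_frame *peers, size_t n_peers,
               int16_t *out, size_t n_samples)
{
    for (size_t i = 0; i < n_samples; i++) {
        float acc = 0.0f;
        for (size_t p = 0; p < n_peers; p++)
            acc += peers[p].gain * (float)peers[p].pcm[i];
        if (acc > 32767.0f)
            acc = 32767.0f;
        if (acc < -32768.0f)
            acc = -32768.0f;
        out[i] = (int16_t)acc;
    }
}
```

Every listener has a different set of gains, so doing this on the server would mean one mix and one re-encode per listener rather than one relay per packet.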
So, in this case, bandwidth is cheap, let's use (and waste) some, in an effort to simplify the SFU, and also, make it more CPU efficient. Default audio stream is 64kbps (or 8 KB/sec), per speaking user.
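As a back-of-the-envelope check on the "waste some bandwidth" trade-off, the relay cost can be estimated as below. This assumes every speaker's stream is forwarded to every other member of the channel and counts only the 64 kbps audio payload (no RTP/UDP/IP header overhead); `sfu_egress_bps` is a hypothetical helper, not a figure from the post.

```c
#include <stdint.h>

#define AUDIO_BPS 64000u  /* default audio stream: 64 kbps = 8 KB/sec */

/* Estimated SFU egress in bits per second for one channel:
 * each speaker's stream is relayed to every other member. */
uint64_t sfu_egress_bps(uint32_t members, uint32_t speakers)
{
    if (members == 0 || speakers > members)
        return 0;
    return (uint64_t)speakers * (members - 1) * AUDIO_BPS;
}
```

For example, a 10-person channel with 2 people talking works out to 2 × 9 × 64 kbps = 1,152 kbps (about 144 KB/sec) of egress, and silence detection brings that to zero whenever nobody is speaking.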
Additionally, we coalesce sends using sendmmsg, to reduce syscalls in the write path: (http://man7.org/linux/man-pages/man2/sendmmsg.2.html)
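For anyone who hasn't used it, here is a minimal Linux-only sketch of batching UDP sends with `sendmmsg`. The `out_pkt` struct, `flush_batch`, and the `MAX_BATCH` limit are assumptions for illustration; only the `sendmmsg` call and `struct mmsghdr` come from the man page linked above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* sendmmsg() and struct mmsghdr are GNU extensions */
#endif
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_BATCH 64  /* assumed cap on packets per syscall */

struct out_pkt {
    void *data;               /* packet bytes, ready to send */
    size_t len;
    struct sockaddr_in dest;  /* receiving peer */
};

/* Hand up to MAX_BATCH queued packets to the kernel in one syscall.
 * Returns the number of messages sent, or -1 on error (errno is set). */
int flush_batch(int fd, struct out_pkt *pkts, unsigned int n)
{
    struct mmsghdr msgs[MAX_BATCH];
    struct iovec iov[MAX_BATCH];

    if (n > MAX_BATCH)
        n = MAX_BATCH;
    memset(msgs, 0, sizeof msgs);
    for (unsigned int i = 0; i < n; i++) {
        iov[i].iov_base = pkts[i].data;
        iov[i].iov_len = pkts[i].len;
        msgs[i].msg_hdr.msg_name = &pkts[i].dest;
        msgs[i].msg_hdr.msg_namelen = sizeof pkts[i].dest;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs, n, 0);
}
```

With one `sendto` per packet, relaying to N peers costs N syscalls; batching collapses that to one per flush. The return value can be less than `n`, so a caller would need to retry or drop the remainder.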
I posted some about the specs here: https://news.ycombinator.com/item?id=17954163