> This would’ve worked equally well as a plain HTTP-based protocol
With plain HTTP you can quite easily "stream" both the request's and the response's body: that's a HTTP/1 feature called "chunking" (the message body is not just one byte array, it's "chunked" so that each chunk can be received in sequence). I really don't get why people think you need WS (or ffs SSE) for "streaming". I've implemented a chat using just good old HTTP/1.1 with chunking. It's actually a perfect use case, so it suits LLMs quite well.
With plain HTTP you can quite easily "stream" both the request's and the response's body: that's a HTTP/1 feature called "chunking" (the message body is not just one byte array, it's "chunked" so that each chunk can be received in sequence). I really don't get why people think you need WS (or ffs SSE) for "streaming". I've implemented a chat using just good old HTTP/1.1 with chunking. It's actually a perfect use case, so it suits LLMs quite well.