In a non-chat setting, where the LLM is doing reasoning or data extraction, it lets you get JSON directly from the model and stream it to the UI (updating the associated UI fields as new keys come in), while caching the response server-side in the exact same JSON format. It’s really simplified our stream + cache setup!
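Roughly, the shape is something like this: a minimal TypeScript sketch, assuming a hypothetical stream of raw text chunks from the model plus hypothetical `updateField` and `cacheResponse` helpers (not our exact code).

```typescript
// Minimal sketch, assuming a hypothetical async iterable of raw text chunks from
// the model and hypothetical `updateField` / `cacheResponse` helpers.

type ExtractionResult = Record<string, unknown>;

async function streamAndCache(
  chunks: AsyncIterable<string>,                       // model output, streamed as text
  updateField: (key: string, value: unknown) => void,  // pushes a value into the UI
  cacheResponse: (json: string) => Promise<void>,      // server-side cache write
): Promise<ExtractionResult> {
  let buffer = "";

  for await (const chunk of chunks) {
    buffer += chunk;
    // Best-effort parse of the partial JSON so far; update whatever fields are visible.
    const partial = tryParsePartial(buffer);
    if (partial) {
      for (const [key, value] of Object.entries(partial)) {
        updateField(key, value);
      }
    }
  }

  // The completed buffer is the same JSON the UI was built from,
  // so it can be cached verbatim on the server.
  await cacheResponse(buffer);
  return JSON.parse(buffer) as ExtractionResult;
}

// Naive partial-JSON repair: try appending likely closers until parsing succeeds.
// A real implementation would use a proper incremental JSON parser.
function tryParsePartial(text: string): ExtractionResult | null {
  for (const suffix of ["", "\"}", "}", "\"}}", "}}"]) {
    try {
      return JSON.parse(text + suffix) as ExtractionResult;
    } catch {
      // keep trying the next suffix
    }
  }
  return null;
}
```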
why though?