These look quite incredible. I work on a llama.cpp GUI wrapper, and it's quite surprising to see how well Microsoft's Phi-4 releases set the family apart as the only real competition below ~7B. It'll probably take a year for the FOSS community to implement and digest it completely (it can do multimodal! TTS! STT! Conversation!)
> it'll probably take a year for the FOSS community to implement and digest it completely
The local community seems to have converged on a few wrappers: Open WebUI (general-purpose), LM Studio (proprietary), and SillyTavern (for role-playing). Now that llama.cpp has an OpenAI-compatible server (llama-server), there are a lot more options to choose from.
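For anyone who hasn't tried it: because llama-server speaks the OpenAI chat-completions protocol, a wrapper can talk to it with plain HTTP. A minimal sketch, assuming a server started locally on the default port 8080 (the `build_chat_request` helper and the `"local"` model name are illustrative; llama-server serves whatever model it was launched with):

```python
# Minimal client for llama.cpp's llama-server via its OpenAI-compatible API.
# Assumes a server launched with something like:
#   llama-server -m model.gguf --port 8080
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # llama-server's default port


def build_chat_request(messages, temperature=0.7):
    """Build the (url, payload) pair for a chat completion call."""
    payload = {
        "model": "local",  # largely cosmetic: the server uses its loaded model
        "messages": messages,
        "temperature": temperature,
    }
    return f"{BASE_URL}/v1/chat/completions", payload


def chat(messages):
    """Send a chat request and return the assistant's reply text."""
    url, payload = build_chat_request(messages)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat([{"role": "user", "content": "Hello!"}]))
```

The upshot is that any client built against the OpenAI API (including most of the wrappers above) works against a local model by just swapping the base URL.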
I've noticed there really aren't many active FOSS wrappers these days - most have either been abandoned or aren't getting releases at the frequency we saw when the OpenAI API first launched. So it would be awesome if you could share your wrapper with us at some point.
I think OP means that the FOSS ecosystem hasn't digested many of Phi-4-multimodal's modalities, such as audio input (STT) and audio output (TTS); image input also isn't well supported in many FOSS projects.
AFAIK, Phi-4-multimodal doesn't support TTS, but I understand OP's point.
The recent Qwen release is an excellent example of a model provider collaborating with the local community (inference engine developers, model quantizers, and so on). It would be nice if this collaboration extended to wrapper developers as well, so that end-users can enjoy a great UX from day one of any model release.
I've been happier with LibreChat than Open WebUI, mostly because I wasn't a fan of the `pipelines` stuff in Open WebUI and its lack of MCP support (probably has changed now?). But then I don't love how LibreChat wants to push its (expensive) code runner service.
grep "As seen above, Phi-4-mini-reasoning with 3.8B parameters outperforms models of over twice its size."
re: reasoning plus, "Phi-4-reasoning-plus builds upon Phi-4-reasoning capabilities, further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy.", presumably also 14B