
I've been working on a self-hosted, low-latency inference service for small LLMs. It's basically exactly what I would have wanted when I started my previous startup. It's aimed at real-time applications, where even the network round trip to a fast hosted LLM like Groq is an issue.

I haven't benchmarked it yet, but I'd be happy to hear opinions. It's written in C++ (deliberately not Python) and is designed as a self-contained microservice built around llama.cpp.

https://github.com/thansen0/fastllm.cpp
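
For anyone unfamiliar with what embedding llama.cpp looks like, here's a rough sketch (not code from the repo) of the kind of load/tokenize/decode/sample loop a service like this would wrap. The model path and parameters are placeholders, and the exact function names vary a bit between llama.cpp releases:

    #include "llama.h"
    #include <cstdio>
    #include <string>
    #include <vector>

    int main() {
        llama_backend_init();

        // load a small GGUF model; path is a placeholder
        llama_model_params mparams = llama_model_default_params();
        llama_model * model = llama_model_load_from_file("small-model.gguf", mparams);
        const llama_vocab * vocab = llama_model_get_vocab(model);

        llama_context_params cparams = llama_context_default_params();
        cparams.n_ctx = 512;  // small context keeps per-request latency down
        llama_context * ctx = llama_init_from_model(model, cparams);

        // tokenize the prompt (first call sizes the buffer, second fills it)
        std::string prompt = "Hello";
        int n = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), nullptr, 0, true, true);
        std::vector<llama_token> tokens(n);
        llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), n, true, true);

        // greedy sampling: cheapest option when latency matters more than variety
        llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
        llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

        // decode the prompt, then generate a few tokens one at a time
        llama_batch batch = llama_batch_get_one(tokens.data(), tokens.size());
        llama_token tok;
        for (int i = 0; i < 32; i++) {
            llama_decode(ctx, batch);
            tok = llama_sampler_sample(smpl, ctx, -1);
            if (llama_vocab_is_eog(vocab, tok)) break;
            char buf[128];
            int len = llama_token_to_piece(vocab, tok, buf, sizeof(buf), 0, true);
            fwrite(buf, 1, len, stdout);
            batch = llama_batch_get_one(&tok, 1);
        }

        llama_sampler_free(smpl);
        llama_free(ctx);
        llama_model_free(model);
        llama_backend_free();
    }

Keeping the model resident in a long-lived process like this, instead of going over the network to a hosted API, is where the latency win comes from.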



