Kuinox 3 months ago | on: Has LLM killed traditional NLP?

Prompt caching would lower the cost, and similar tech would later lower the inference cost too. You have fewer than 25 tokens; that's between $1 and $5.

There may be some use case, but I'm not convinced by the one you gave.

minimaxir 3 months ago

So there's a bit of an issue with prompt caching implementations: for both the OpenAI API and Claude's API, you need a minimum of 1024 tokens to build the cache, for whatever reason. For simple problems, that can be hard to hit and may require padding the system prompt a bit.
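
To make the padding idea concrete, here is a minimal sketch in Python. It is not either provider's API: `estimate_tokens` is a hypothetical stand-in that assumes roughly 4 characters per token, and `pad_system_prompt` and the filler text are illustrative names only. A real setup would count tokens with the provider's own tokenizer and pad with something useful, such as extra few-shot examples.

```python
# Minimum prompt length for caching, as cited in the comment above.
MIN_CACHEABLE_TOKENS = 1024


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption: about 4 characters per token).

    This is NOT a real tokenizer; swap in the provider's tokenizer for accuracy.
    """
    return (len(text) + 3) // 4


def pad_system_prompt(prompt: str, filler: str,
                      min_tokens: int = MIN_CACHEABLE_TOKENS) -> str:
    """Append `filler` lines until the estimated length reaches `min_tokens`."""
    if not filler:
        raise ValueError("filler must be non-empty")
    padded = prompt
    while estimate_tokens(padded) < min_tokens:
        padded += "\n" + filler
    return padded


if __name__ == "__main__":
    system_prompt = "Classify the sentiment of the text as positive or negative."
    filler = "Example: 'I loved this movie' is positive."
    padded = pad_system_prompt(system_prompt, filler)
    print(estimate_tokens(system_prompt), estimate_tokens(padded))
```

A prompt that already clears the threshold comes back unchanged, so the helper is safe to call on every request. Because the estimate is only a heuristic, leaving some headroom above 1024 would be prudent.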
