Probably price by usage, similar to OpenAI, since I assume your costs are correlated with usage (compute etc.).
You could do a hobby plan that is free up to a certain number of requests/tokens per day or hour, so developers can start building without any friction (I think this is important and sort of expected, as most beloved dev tools do it). You can minimize your costs by offering this on shared resources: inference is a little slow and the API can go down at times of high usage, but it's free, so users won't mind.
And then have pro plans for higher usage / 99.9999% SLA / fast inference etc. (maybe a minimum subscription with a pay-as-you-go option per 100k tokens if you go over...similar to Vercel with bandwidth and serverless functions).
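The subscription-plus-overage model above can be sketched as a small billing calculation. All numbers here (base fee, included quota, overage rate) are hypothetical placeholders, not a suggested price point:

```python
# Hypothetical Vercel-style billing: a flat monthly subscription
# includes a token quota; usage beyond it is billed per 100k tokens.
BASE_FEE = 20.00            # monthly subscription in USD (made up)
INCLUDED_TOKENS = 5_000_000  # tokens included in the base fee (made up)
OVERAGE_PER_100K = 0.40      # USD per 100k tokens over quota (made up)

def monthly_bill(tokens_used: int) -> float:
    """Base fee plus overage, rounded up to whole 100k-token blocks."""
    overage = max(0, tokens_used - INCLUDED_TOKENS)
    blocks = -(-overage // 100_000)  # ceiling division
    return BASE_FEE + blocks * OVERAGE_PER_100K

print(monthly_bill(3_000_000))  # within quota: just the base fee
print(monthly_bill(5_250_000))  # 250k over: billed as 3 overage blocks
```

Rounding overage up to whole blocks keeps invoices simple, at the cost of slightly overcharging partial blocks; per-token metering would be the finer-grained alternative.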
Would love to use an API; per-token pricing is a good approach (with usage limits like OpenAI). If you need testers, I have a use case (long-form non-fiction content). LMK at ml[at]summarity[dot]com