If your application is price-sensitive, check out DeepInfra.com: they have a variety of models in the pennies-per-million-tokens range. Not quite as fast as Mercury, Groq or SambaNova, though.
(I have no affiliation with this company aside from being a happy customer for the last few years.)
DeepInfra is amazing in terms of price. Really: they have the Qwen3 embedding model for $0.002 per million tokens. That's an order of magnitude cheaper than most alternatives, despite better benchmark scores. But P99 latency is high and the variance is huge. That's a problem for latency-sensitive workloads; if they can fix it, using them will be a no-brainer. DeepInfra does tend to have the lowest prices of any API provider.
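To illustrate why a tail metric like P99 matters more than the average for latency-sensitive workloads, here is a minimal sketch using made-up latency samples (hypothetical numbers, not measurements of DeepInfra or any other provider). It uses the nearest-rank definition of a percentile; other percentile definitions give slightly different values on small samples.

```python
import math
import statistics

def p99(latencies_ms):
    """Nearest-rank 99th percentile of a list of latencies (ms)."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical samples: 95 fast calls and 5 slow outliers.
samples = [40.0] * 95 + [900.0] * 5

print(f"median: {statistics.median(samples):.0f} ms")
print(f"mean:   {statistics.mean(samples):.0f} ms")
print(f"p99:    {p99(samples):.0f} ms")
print(f"stdev:  {statistics.stdev(samples):.0f} ms")
```

The median still looks fast, but one request in a hundred takes over 20x longer, which is exactly what hurts an interactive or latency-bound pipeline.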