The state of "super fast inference" is frustrating

4 points by 4k 2 days ago

I am talking about the three providers I know of that claim super fast inference: Groq, Cerebras, and SambaNova. Each of them claims extremely fast inference, in the multiple hundreds of tokens per second, on reasonably large models. Each also has a chat demo on its website that seems to confirm the claimed numbers.

However, for many months now, each of these providers has had literally the same API pricing page, where only the free tier with low rate limits is available. Everything else is "Coming Soon". No updates, no dates, no estimates, nothing.

Come to think of it, there is not a single good inference provider in the entire open-source-model space that offers a paid, unthrottled API sustaining over 50 tokens per second consistently. There's money to be made here, and surprisingly nobody is doing it aggressively.
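For what it's worth, the headline numbers are easy to sanity-check yourself: time a streaming response and divide the token count by elapsed wall-clock time. A minimal sketch of that arithmetic (the helper name and the 500-token example are mine, not from any provider's docs):

```python
def tokens_per_second(n_tokens: int, start: float, end: float) -> float:
    """Throughput from a token count and wall-clock start/end timestamps."""
    elapsed = end - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")

# Example: 500 tokens streamed over 10 seconds of wall-clock time
print(tokens_per_second(500, 0.0, 10.0))  # 50.0 tok/s
```

In a real measurement you would take timestamps with `time.monotonic()` around a streaming API call, and ideally start the clock at the first received token rather than at the request, so queueing and time-to-first-token latency don't get mixed into the sustained-throughput figure.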