Serving 5 Trillion AI Tokens a Week: Inside DeepInfra with Nikola Borisov
DeepInfra (https://deepinfra.com/) is serving over 5 trillion tokens a week and just closed a major raise backed by NVIDIA. In this episode of Pale Blue Nexus, host Yohann Calpu sits down with co-founder Nikola Borisov to unpack how DeepInfra is challenging the hyperscalers on price, why he compares inference optimization to Formula 1, and what the future of open source AI looks like.
Nikola shares the technical playbook behind DeepInfra's aggressive pricing (including the famous Mixtral moment), how quantization and KV caching drive efficiency, the company's deepening partnership with NVIDIA on the Dynamo project, and why he believes the real demand for AI infrastructure is just getting started.
We also get into the harder questions: prompt injection risks, the NemoClaw security model, hardware depreciation cycles, and what's actually overrated in today's AI infrastructure boom.
Nikola bet early on open source inference. What's your bet? The Aloomii Playbook is the operator's manual for putting AI into relationship-driven businesses: not theory, not hype, just the system we use to run our own agent fleet (OpenClaw) and our clients'.
Three editions for where you actually sit:
Founder. Solopreneur. Operator Leader.
→ Get the Playbook on Gumroad ($179): https://www.aloomii.com/playbook/ https://aloomii.gumroad.com/
The Last 20% is the LinkedIn newsletter for operators who'd rather build the system than read about it: field notes, frameworks, and behind-the-scenes looks at running Aloomii's agent fleet (OpenClaw) and our clients' AI transformations. No theory. No hype. Just what's actually working.
→ Subscribe free on LinkedIn / the-last-20-7445126674708451328