Venice Amidst the Inference Shift
Venice's success over the past few months is easy to explain.
The platform sits at the intersection of privacy and AI, two of the largest narratives in crypto right now. As we know, though, narratives in crypto follow those in the "real world," and there's another tailwind worth flagging: AI's broader shift from training to inference.
One of the biggest events this week was Cerebras' IPO, which landed Thursday 20x oversubscribed and at a price nearly double Wednesday's final markup. What drove demand was the chance to own a piece of a company whose breakthrough chip design enables incredibly fast inference, an essential commodity as AI companies deal with overwhelming demand and agentic expansion.
Take the last few months at Anthropic. Reports of Claude seeming lobotomized have been all over the internet recently. Anthropic itself shared that usage grew 80x more than expected, then partnered with SpaceXAI to use the entire compute capacity of the Colossus 1 data center: a system equipped with 220,000+ Nvidia GPUs and over 300 megawatts of power, enough electricity to run a midsize city.
Elon spent a week with Anthropic's team running his "evil detector." They passed. Now he's their landlord.
Anthropic just locked up ALL of the compute at SpaceX's Colossus 1:
→ 300+ megawatts of capacity
→ 220,000 NVIDIA GPUs
→ Online within the month
Anthropic's full… https://t.co/s5L6MptwIN pic.twitter.com/5NNgIWZTPl
— Josh Kale (@JoshKale) May 6, 2026
Now, all of a sudden, Claude's usage limits jump 50%, and come June users will be able to claim free API credits. All of this points to Anthropic, and AI companies at large, being squeezed on their ability to serve inference, perhaps more than on training: models keep breaking new ground on benchmarks, and SpaceXAI was able to consolidate all of its training onto Colossus 2.
Claude Code weekly limits are increasing 50%, now through July 13.
Live now for all Pro, Max, Team, and seat-based Enterprise users. pic.twitter.com/5nU0XX4RZY
— ClaudeDevs (@ClaudeDevs) May 13, 2026
You could think of it like this: training is a one-time cost. Inference is recurring and scales with usage. As Stratechery's Ben Thompson put it this week, once machines run tasks on dictates from other machines, inference demand stops scaling with user count and starts scaling with compute itself. Dynamics like these are why JP Morgan sizes the inference market at 10 to 50 times the size of training.
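That dynamic is easy to make concrete with a toy cost model. All of the numbers below are made-up assumptions for illustration, not figures from any of the companies mentioned: training is modeled as a one-time outlay, while inference spend compounds with query volume, so past a break-even point inference dominates total cost.

```python
# Toy model: one-time training cost vs. recurring inference cost.
# Every number here is an illustrative assumption.

def cumulative_cost(training_cost, cost_per_query, queries_per_day, days):
    """Total spend after `days`: fixed training plus recurring inference."""
    return training_cost + cost_per_query * queries_per_day * days

TRAINING = 100_000_000        # hypothetical $100M training run
PER_QUERY = 0.002             # hypothetical $0.002 per inference query
QPD = 500_000_000             # hypothetical 500M queries/day -> $1M/day

# Days until cumulative inference spend matches the one-time training cost.
breakeven_days = TRAINING / (PER_QUERY * QPD)
print(breakeven_days)  # 100.0 -- after ~3 months, inference dominates
```

The point of the sketch is the shape of the curves, not the numbers: the training term is flat, the inference term grows without bound, and agentic usage raises `QPD` without adding users.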
Here's where Venice's ecosystem fits in. Venice runs on two closely linked tokens. VVV is the primary Venice token, used for staking, yield, and access to Venice Pro, Venice's inference product. DIEM is the secondary token, minted by locking staked VVV (sVVV) and burned to unlock it. Each staked DIEM gives the holder $1/day of Venice API credit, turning inference into a tradable onchain resource. Projects are already stockpiling DIEM to service inference for their own platforms, agents, and users.

The two tokens are structurally tied but trade independently: VVV is the upstream capital asset, DIEM the downstream compute asset. If inference demand grows, DIEM offers direct exposure to API capacity, and VVV benefits as the source asset required to create it, as well as a way to access Venice's platform overall.
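The VVV → sVVV → DIEM flow can be sketched as simple account-level bookkeeping. This is a minimal illustration of the mechanism described above; the 1:1 mint rate, the method names, and the accounting details are assumptions, not Venice's actual contract terms. The only figure taken from the text is the $1/day of API credit per staked DIEM.

```python
# Sketch of the VVV staking / DIEM mint-and-burn loop.
# Rates and names are illustrative assumptions, not Venice's contracts.
from dataclasses import dataclass

CREDIT_PER_DIEM_PER_DAY = 1.0  # $1/day of Venice API credit per staked DIEM

@dataclass
class Wallet:
    vvv: float = 0.0          # liquid VVV
    svvv: float = 0.0         # staked VVV
    diem_staked: float = 0.0  # DIEM minted against sVVV

    def stake_vvv(self, amount):
        assert amount <= self.vvv
        self.vvv -= amount
        self.svvv += amount

    def mint_diem(self, amount, rate=1.0):
        """Lock sVVV to mint DIEM (1:1 rate assumed for illustration)."""
        assert amount * rate <= self.svvv
        self.svvv -= amount * rate
        self.diem_staked += amount

    def burn_diem(self, amount, rate=1.0):
        """Burn DIEM to unlock the underlying sVVV."""
        assert amount <= self.diem_staked
        self.diem_staked -= amount
        self.svvv += amount * rate

    def daily_api_credit(self):
        return self.diem_staked * CREDIT_PER_DIEM_PER_DAY

w = Wallet(vvv=1000)
w.stake_vvv(500)
w.mint_diem(200)
print(w.daily_api_credit())  # 200.0 -> $200/day of inference capacity
```

The upstream/downstream split falls out of the structure: VVV is the collateral you lock, DIEM is the claim on compute you mint against it, and burning reverses the trade.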
A series of other tokens caught bids this week as beta to Venice, some with a more legitimate claim to that status than others. One of the better ones is POD, the token of Dolphin, the team behind the default Venice Uncensored model. The value prop of an uncensored, private model is easy to understand.
Venice Uncensored 1.2 is now live.
Developed with @dphnAI, this model delivers the most uncensored version of Mistral 24B.
Upgraded with vision support, a 4x larger context window, and stronger tool-use capabilities.
Trained on Bittensor Subnet 4 @TargonCompute. pic.twitter.com/UxQOiCIemB
— Venice (@AskVenice) April 16, 2026
Beyond serving as a vehicle to speculate on that popularity, POD stands on its own as an inference asset: Dolphin runs a distributed inference network on idle consumer GPUs, accepts POD as a payment option for inference, routes 100% of network revenue into buying back POD on the open market, and gives stakers (xPOD) daily inference allocations across every model the network runs. Once again, we have a token tied to perpetual inference access, a use case I expect to grow.
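The buyback leg of that flywheel can be sketched with a toy market model. The pool sizes, the revenue figure, and the constant-product pricing below are all my assumptions for illustration; the only mechanic taken from the text is that network revenue is spent buying POD on the open market.

```python
# Toy model of the POD buyback loop: revenue is swapped into a
# constant-product pool for POD. All numbers are hypothetical.

def buyback(pool_pod, pool_usd, revenue_usd):
    """Spend revenue into an x*y=k pool; return POD bought and new pool state."""
    k = pool_pod * pool_usd
    new_usd = pool_usd + revenue_usd
    new_pod = k / new_usd
    bought = pool_pod - new_pod
    return bought, new_pod, new_usd

pod, usd = 1_000_000.0, 500_000.0   # hypothetical open-market pool
bought, pod, usd = buyback(pod, usd, 10_000.0)
print(round(bought, 2))  # POD pulled off the market this revenue cycle
```

Whatever the actual venue, the structural effect is the same: revenue becomes recurring buy pressure, and stakers' daily inference allocations give the bought-back supply a reason to stay locked.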
— Dolphin (@dphnAI) May 11, 2026
Overall, inference is the broader tailwind, and Venice's ecosystem looks well-positioned to benefit, though I expect a tranche of others to emerge. Lucas Tachyen over at Galaxy will be writing more on decentralized inference, so keep an eye out there; this is a trend worth tracking.
Inference is the market's new favorite word https://t.co/WNYbl6bbrr
— David Christopher (@davewardonline) May 15, 2026