Nscale’s Serverless Inference Makes Cost-Efficient AI Accessible to All
Nscale, the hyperscaler engineered for AI, is debuting its Serverless Inference platform, enabling enterprises to instantly access generative AI (GenAI) models without managing the underlying infrastructure. Nscale Serverless Inference makes AI accessible to organizations of all sizes, delivering cost-efficient, scalable deployment through a pay-as-you-go model.
With Serverless Inference, users can immediately deploy an array of GenAI models—including Meta's Llama, Alibaba's Qwen, and DeepSeek—without the operational burden of provisioning infrastructure. Users pay only for what they consume, avoiding the idle-capacity costs that, according to Nscale, inhibit enterprises from deploying and experimenting with GenAI models.
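As a minimal sketch of what pay-as-you-go serverless model access typically looks like, the snippet below calls a hypothetical OpenAI-compatible chat-completions endpoint. The base URL, model identifier, and environment variable name are illustrative assumptions, not confirmed details of Nscale's API; consult Nscale's documentation for the actual values.

```python
import os
import requests

# Hypothetical endpoint and model id -- illustrative only; substitute the
# actual base URL and model names from Nscale's documentation.
BASE_URL = "https://inference.example-nscale-endpoint.com/v1"
API_KEY = os.environ["NSCALE_API_KEY"]  # assumed env var holding the key

def chat(prompt: str) -> str:
    """Send one chat message to an OpenAI-compatible completions endpoint."""
    response = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model id
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize serverless inference in one sentence."))
```

Billing in this model is metered per token consumed, so the call above incurs cost only while it runs; no instance sits idle between requests.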
When paired with the broader Nscale platform, Serverless Inference users also gain a variety of other capabilities, including Slurm and Kubernetes orchestration, observability, and multi-tenant security.
“Launching our Serverless Inference platform marks Nscale’s expansion into public, on-demand AI services, making AI model deployment simple and cost-effective,” said Daniel Bathurst, chief product officer at Nscale. “While our private cloud remains ideal for large-scale enterprise workloads, this new serverless option enables more developers to experiment with and scale inference workloads. With upcoming features set to include dedicated endpoints, fine-tuning capabilities and the ability to support custom model hosting, we're proud to offer sovereign, European AI infrastructure to meet rapidly growing inference demand.”
To learn more about Nscale, please visit https://www.nscale.com/.