IBM extends serverless computing to GPU workloads for enterprise AI and simulation

Running simulation and high-performance workloads efficiently is a constant challenge, requiring input from stakeholders including infrastructure teams, cybersecurity professionals, and, of course, ever-watchful finance officers.

These high-compute tasks often involve thousands of concurrent processes and are costly to run on traditional infrastructure. IBM’s latest update to its Cloud Code Engine – the launch of Serverless Fleets with GPU support – may reduce that complexity. Serverless Fleets combines high-performance computing with a managed, pay-as-you-go serverless model: the user addresses a single endpoint, and the necessary deployment at scale takes place automatically.

High-performance computing without infrastructure friction

Enterprises running large-scale AI training, risk simulations, or generative workloads commonly face two problems: limited GPU access and rising infrastructure and cloud costs. Serverless Fleets provides an alternative. Instead of maintaining dedicated GPU clusters, organisations can submit large batches of compute jobs through a single endpoint.

IBM’s system provisions GPU-backed virtual machines, executes the workload, and releases the resources once the work is complete. This approach improves utilisation and cost visibility, IBM claims, with customers charged only for active runtime.

In practice, this could help financial institutions (for example) with faster risk modelling, or let media companies render their workloads without investing in GPU farms or entering long leases. For many, it means faster innovation and reduced operational overhead.

Implementation realities

IBM suggests that Serverless Fleets can manage workloads at scale “with essentially zero SRE staff.” While the claim is ambitious, the model does simplify orchestration. Code Engine can determine the number of worker instances needed and scale them to match demand. This reduces the tuning typically required to balance parallel GPU tasks.
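
To give a feel for the sizing decision that is being handed to the platform, here is a minimal sketch of how a worker count might be derived from a batch of equal-length tasks and a deadline. It is not IBM's scheduling logic, which the article does not describe; the function name, parameters, and the simple "tasks run back to back on each worker" model are all assumptions made for illustration.

```python
import math


def workers_needed(num_tasks: int, task_minutes: int,
                   deadline_minutes: int, max_workers: int) -> int:
    """Hypothetical sizing rule: fewest workers that finish every task in time.

    Assumes all tasks take the same time and each worker runs one task
    at a time, back to back. Real schedulers weigh many more factors.
    """
    if task_minutes > deadline_minutes:
        raise ValueError("a single task cannot finish before the deadline")
    # Number of tasks one worker can complete within the deadline window.
    tasks_per_worker = deadline_minutes // task_minutes
    workers = math.ceil(num_tasks / tasks_per_worker)
    # Respect a quota; if the cap applies, the deadline will be missed.
    return min(workers, max_workers)


if __name__ == "__main__":
    # 1,000 six-minute tasks, one-hour deadline, quota of 500 workers.
    print(workers_needed(1000, 6, 60, 500))
```

Even in this toy form, the answer depends on task length, deadline, and quota at once, which is the kind of tuning a managed service aims to take off an operations team.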

Adopting the platform, however, would require careful oversight and a keen eye on costs, both ubiquitous challenges in serverless environments. Enterprises will need clear visibility into their common workload patterns, and must be aware of any compliance issues when effectively outsourcing GPU-heavy jobs to a managed cloud.

Market and ecosystem context

IBM joins other hyperscalers in adapting serverless platforms for high-performance computing. AWS supports GPU-backed containers through Fargate with ECS or EKS, and Microsoft Azure offers GPU-enabled containers in its Serverless Container Apps. IBM’s Cloud Code Engine is different, the company says, in that it supports web apps, event-driven functions, and GPU-intensive batch jobs, all managed from one environment.

Executive takeaway

For CIOs and Cloud Directors, IBM’s Serverless Fleets represents a step toward the promised elasticity of the cloud and its ability to handle high-performance computing. The model could at least reduce entry barriers for GPU-heavy workloads, especially for teams without readily available DevOps resources. However, before adopting, leaders might consider some or all of the following:

  • What are the comparative costs of on-demand GPUs vs. reserved capacity models?
  • Are governance and data security deciding issues?
  • Are there cost-monitoring methods in place that can keep tabs on managed workloads?
  • Can example workloads be piloted to test scalability and predictability?
  • How does IBM’s offering compare on capability and price with similar solutions from other hyperscalers?
  • Are workloads suitable for running in-house, and what might be the longer-term OPEX of that choice?

Serverless GPU computing is still evolving, but IBM’s approach offers another option for enterprises to explore large-scale AI and simulation workloads without the overhead of infrastructure considerations.

(Image source: “Buddha said he wanted to have a word with me” by Trey Ratcliff is licensed under CC BY-NC-SA 2.0.)

CloudTech News is powered by TechForge Media.