We occasionally get questions about “Serverless” when helping plan moves from traditional public clouds like AWS and GCP.

A bit of background before I try to answer. Please note that this technology is rapidly changing, and it takes some guesswork to determine the actual implementations in place at the traditional public clouds. This article reflects my understanding at a specific point in time (June 2023).

 

Background:

  • OpenMetal offers OpenStack and Ceph at a root level, where customers have their own Cloud that is equivalent to core AWS and GCP offerings like Compute, Block Storage, Private Networking, Load Balancing, and more. See the Features page.
  • Serverless is a marketing term – and in most implementations, underneath the serverless functions are Containers running in Virtual Machines running on Physical Servers. So Serverless actually has 3 layers of “servers” underneath.
  • Traditional VMs, typically KVM with QEMU like OpenMetal’s Compute, are just a VM on a physical server. 2 layers of “servers.”

Serverless Infrastructure, aka the servers behind Serverless

Serverless is a marketing term describing a solution – and a problem – fairly unique to the traditional public clouds.

From the buyer’s perspective, “Serverless” means you only have to worry about the Runtime specific to your coding language.

For example, if you are running an app built on Node.js 18 and splitting it up into Functions on AWS Lambda, then you must use the Lambda Runtime specific to Node.js 18.

The actual Linux and Node stack underneath is maintained by AWS. Once your chosen version, Node.js 18 in this case, goes EOL, you must rewrite your code (hopefully only a little) for the newer version and place it on the new Runtime, Node.js 20 in this example.
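
To make that concrete, here is what a minimal Function looks like on the Node.js 18 Lambda Runtime. This is a sketch assuming an API Gateway HTTP trigger; the greeting logic is purely illustrative.

```typescript
// Minimal handler for the Node.js 18.x Lambda Runtime. Types come from
// the @types/aws-lambda package; the event shape assumes an API Gateway
// HTTP trigger, and the greeting logic is purely illustrative.
import type { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";

export const handler = async (
  event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  // Your "bit of code" runs here; AWS owns the OS and Node stack around it.
  const name = event.queryStringParameters?.name ?? "world";
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, ${name}` }),
  };
};
```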

You are charged by the millisecond while your code is running.
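
As a rough worked example of that billing model (the per-GB-second rate below is a placeholder, not actual AWS pricing):

```typescript
// Illustrative duration-times-memory billing arithmetic. The rate below
// is a placeholder, not current AWS pricing.
const RATE_PER_GB_SECOND = 0.0000167; // hypothetical USD rate

function invocationCost(durationMs: number, memoryMb: number): number {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000);
  return gbSeconds * RATE_PER_GB_SECOND;
}

// 1 million invocations of a 128 MB function running 50 ms each:
console.log(invocationCost(50, 128) * 1_000_000); // ≈ $0.10
```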

Pretty cool. But…

To run your bit of code safely and predictably, here is roughly what happens behind the scenes, using AWS as an example (sketched in code after the list):

  • A special type of KVM Virtual Machine, built on a Virtual Machine Monitor (VMM) technology called Firecracker, is booted up.
  • The Node.js 18 Container is likely captured in a VM snapshot, so everything is already running once the VM/Container combo is restored into RAM. There may also be some separate Container-level setup work being done.
  • The network accepts the inbound connection, runs your bit of code, and returns the result/saves some data to a DB/etc.
  • The Container and VM combo hang out for a while to see if more calls are coming in.
  • If no other requests arrive, it shuts down based on AWS’s purge/prune business logic.
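
Compressing that lifecycle into code, here is a hedged sketch. Every name, type, and timeout below is invented for illustration; the real orchestration logic is private to each provider.

```typescript
// Toy model of the serverless request lifecycle described above.
// All names and timeouts are invented; real providers keep this private.
interface MicroVm {
  invoke(req: unknown): Promise<unknown>;
}

const warmPool = new Map<string, MicroVm>(); // functionId -> running VM
const idleTimers = new Map<string, ReturnType<typeof setTimeout>>();

// Stand-in for restoring a snapshotted microVM with the runtime preloaded.
async function bootMicroVmFromSnapshot(functionId: string): Promise<MicroVm> {
  await new Promise((r) => setTimeout(r, 125)); // simulate a ~100-150 ms boot
  return { invoke: async (req) => ({ ok: true, echo: req }) };
}

async function handleInvocation(functionId: string, req: unknown): Promise<unknown> {
  let vm = warmPool.get(functionId);
  if (!vm) {
    vm = await bootMicroVmFromSnapshot(functionId); // the "Cold Start"
    warmPool.set(functionId, vm);
  }
  const result = await vm.invoke(req); // run the user's bit of code

  // Hang out for more calls, then purge per an (invented) idle policy.
  clearTimeout(idleTimers.get(functionId));
  idleTimers.set(
    functionId,
    setTimeout(() => warmPool.delete(functionId), 10 * 60 * 1000)
  );
  return result;
}
```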

As you might imagine, that is not very efficient and comes with a new set of challenges both for you, the user, and for the provider.  Provider challenges:

  • How do you minimize the so-called “Cold Start” portion of the call, where you can’t bill the user? If you are billing by the millisecond, how long can you wait for a traditional VM to boot and then put a Container into it? Firecracker is part of the solution. It is a lightweight Virtual Machine Monitor designed to strip out anything not needed by a workload like this. It was originally forked from crosvm, a Rust project put out by Google for Chromium OS. Later, Firecracker was itself forked by Intel into Cloud Hypervisor, which is now part of the Linux Foundation. At this time, my understanding is that both can boot in somewhere between 100 and 150 milliseconds. That is fast for sure – but code execution is often measured in nanoseconds, so a built-in delay of 100 milliseconds isn’t trivial to deal with. A sketch of driving Firecracker’s small API follows below.
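
To give a feel for how small Firecracker’s surface is, here is a sketch of booting a microVM through its REST API over a Unix socket. It assumes a firecracker process was already started with --api-sock /tmp/firecracker.sock, and the kernel and rootfs paths are placeholders.

```typescript
// Sketch: booting a Firecracker microVM via its REST API over a Unix
// socket. Assumes `firecracker --api-sock /tmp/firecracker.sock` is
// already running; the kernel and rootfs paths are placeholders.
import http from "node:http";

function fcPut(path: string, body: object): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = http.request(
      {
        socketPath: "/tmp/firecracker.sock",
        path,
        method: "PUT",
        headers: { "Content-Type": "application/json" },
      },
      (res) => {
        res.resume(); // drain the response body
        if (res.statusCode && res.statusCode < 300) resolve();
        else reject(new Error(`${path}: HTTP ${res.statusCode}`));
      }
    );
    req.on("error", reject);
    req.end(JSON.stringify(body));
  });
}

async function bootMicroVm(): Promise<void> {
  await fcPut("/machine-config", { vcpu_count: 1, mem_size_mib: 128 });
  await fcPut("/boot-source", {
    kernel_image_path: "/path/to/vmlinux", // placeholder
    boot_args: "console=ttyS0 reboot=k panic=1",
  });
  await fcPut("/drives/rootfs", {
    drive_id: "rootfs",
    path_on_host: "/path/to/rootfs.ext4", // placeholder
    is_root_device: true,
    is_read_only: false,
  });
  await fcPut("/actions", { action_type: "InstanceStart" });
}

bootMicroVm().catch(console.error);
```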

User challenges:

  • If your usage of the Serverless Functions is busy enough, then it seems your VM/Containers will always be running and accepting calls. Occasionally they will be updated with a patched VM/Container. Likely that will be invisible to you, as a load balancer will make sure a new patched Runtime is booted up before the older ones are shut down. This is the same as any well-built cloud native Docker/Kubernetes deployment, though.
  • If your usage is not busy, you will need to deal with the Cold Start problem; one common mitigation is sketched below.
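
One common user-side mitigation (a general pattern, not part of the providers’ products) is a scheduled “keep-warm” ping so at least one VM/Container combo stays resident:

```typescript
// A scheduled "keep-warm" ping so at least one runtime instance stays
// resident. The `warmup` marker is our own convention (sent by, e.g., a
// scheduled EventBridge rule), not an AWS field.
export const handler = async (event: { warmup?: boolean }) => {
  if (event.warmup) {
    return { statusCode: 200, body: "warm" }; // exit fast, do no real work
  }
  // ...normal request handling goes here...
  return { statusCode: 200, body: "hello" };
};
```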

 

At OpenMetal, we are a cloud provider that supplies private cloud at the speed of public cloud, with dynamic scaling. See our Fundamental Advantage post if you are curious why we take a private-first approach. We create clouds that customers may use for many purposes: as their own hosted private cloud, to be a public cloud provider themselves, to run a SaaS company, and, important here, potentially to run workloads that need similarly rapid-booting VMs.

Now, we are not sure there is much demand for this inside of OpenStack. We already provide high-powered bare metal servers that customers can use today to run Cloud Hypervisor or Firecracker directly. Our debate is whether we should explore adding this alternative hypervisor to OpenStack itself. Some basic arguments:

  • Private cloud is so much more efficient per dollar of available resources that running VMs at low utilization is fine, making this fairly low on the priority list. Why bother with rapid stop-and-start VM technology?
  • This technology is pretty hard, so if we figured it out for private cloud here, would business models pop up around it?
  • More arguments coming soon; I just ran out of time.

Any insight is appreciated – reach out to me on LinkedIn or just add a comment below. I will keep working on this post as we take in feedback and any offers of partnering with other OpenStack companies. Thanks!
