OpenMetal Cloud FAQ
The following FAQs cover many categories of OpenMetal Cloud. These categories include the platform and hardware technology, the network setup, the Private Cloud Core, the service level agreement, and how to run your very first private cloud! If you have any additional questions, your account manager will assist you with your needs.
OpenStack
Ceph
We supply only data center grade SATA SSD and NVMe drives. The mean time between failures (MTBF) of a typical hard drive is 300,000 hours. Most recommendations for selecting 3 replicas, and the history behind that choice, come from hard drive use cases that take this failure rate into account. The MTBF of both our SATA SSDs and our NVMe drives is 2 million hours. Failures will certainly still occur, but they are roughly 6 times less likely than with an HDD.
When Ceph is hyper-converged onto 3 servers with a replica level of 3 and you lose one of the 3 members, Ceph cannot recover itself out of a degraded state until the lost member is restored or replaced. The data is not at risk, since two copies remain, but the pool is now effectively at a replica level of 2. When Ceph is hyper-converged onto 3 servers with a replica level of 2 and you lose one of the 3 members, Ceph can be set to self-heal: any data that has fallen to 1 replica is automatically copied back up to a replica level of 2. Your data loss risk exists only during the time when just 1 replica is present.
Disaster recovery processes for data have progressed significantly. The details depend on your specific situation, but if restoring data from backups to production is straightforward and fast, then in the extremely rare case that both of the 2 replicas fail during the degraded period, you can recover from backups.
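If you want to watch a degraded cluster heal after a failure, the standard Ceph CLI reports placement group states. Below is a minimal sketch, not an OpenMetal tool, that assumes the ceph client and an admin keyring are available on a node in the cluster:

```python
# Minimal sketch: poll Ceph until no placement groups report a "degraded" state.
# Assumes the "ceph" CLI is installed and can reach the cluster with admin access.
import json
import subprocess
import time

def degraded_pg_count() -> int:
    """Return the number of placement groups currently in a degraded state."""
    raw = subprocess.check_output(["ceph", "status", "--format", "json"])
    status = json.loads(raw)
    # pgmap aggregates PG states, e.g. "active+clean" or "active+undersized+degraded"
    states = status.get("pgmap", {}).get("pgs_by_state", [])
    return sum(s["count"] for s in states if "degraded" in s["state_name"])

while degraded_pg_count() > 0:
    print("Cluster is still degraded; waiting for recovery...")
    time.sleep(60)
print("All placement groups report a full replica count.")
```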
Usable Ceph disk space savings are significant (estimated, not exact):
- HC Small, Replica 3 – 960GB * 3 servers / 3 replicas = 960GB usable
- HC Small, Replica 2 – 960GB * 3 servers / 2 replicas = 1440GB usable
- HC Standard, Replica 3 – 3.2TB * 3 servers / 3 replicas = 3.2TB usable
- HC Standard, Replica 2 – 3.2TB * 3 servers / 2 replicas = 4.8TB usable
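The arithmetic behind those estimates is simple: usable space is roughly the per-server raw capacity times the number of servers, divided by the replica count, before Ceph overhead and full-ratio headroom. A quick sketch of that math:

```python
# Rough usable-capacity estimate: raw capacity per server * servers / replicas.
# These are estimates only; real clusters reserve headroom and should never run near full.
def usable_capacity_gb(raw_per_server_gb: float, servers: int, replicas: int) -> float:
    return raw_per_server_gb * servers / replicas

print(usable_capacity_gb(960, 3, 3))    # HC Small, Replica 3    ->  960.0 GB
print(usable_capacity_gb(960, 3, 2))    # HC Small, Replica 2    -> 1440.0 GB
print(usable_capacity_gb(3200, 3, 3))   # HC Standard, Replica 3 -> 3200.0 GB (3.2TB)
print(usable_capacity_gb(3200, 3, 2))   # HC Standard, Replica 2 -> 4800.0 GB (4.8TB)
```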
Bare Metal
OpenMetal Cloud Platform and Hardware
For extremely high IOPS, we recommend using the NVMe or SATA SSD drives directly from your application. This means you will need to provide data integrity and high availability through your software. Many applications, such as high-performance databases, can function extremely well this way. The NVMe drives on the HC Standards and the Compute Standards, in particular, have extreme IOPS. It bears repeating though – you must handle data integrity and HA yourself.
For very high IOPS with built-in data protection, Ceph with a replication of 2 on NVMe drives is popular. A replica level of 3 will slightly reduce the IOPS but is a recommended choice.
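If you are weighing raw drive access against a Ceph-backed volume, a quick benchmark can ground the decision. The sketch below is illustrative only and assumes fio is installed and that /dev/nvme0n1 is the drive under test (both are assumptions, not OpenMetal defaults):

```python
# Illustrative only: measure 4k random-read IOPS with fio and parse its JSON output.
# Random reads do not modify data, but double-check the device path before running.
import json
import subprocess

def random_read_iops(device: str, runtime_s: int = 30) -> float:
    """Run a short 4k random-read fio job against the device and return measured IOPS."""
    out = subprocess.check_output([
        "fio", "--name=randread", f"--filename={device}", "--rw=randread",
        "--bs=4k", "--iodepth=64", "--numjobs=4", "--direct=1",
        f"--runtime={runtime_s}", "--time_based", "--ioengine=libaio",
        "--group_reporting", "--output-format=json",
    ])
    return json.loads(out)["jobs"][0]["read"]["iops"]

print(random_read_iops("/dev/nvme0n1"))
```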
For very large OpenMetal Clouds, such as several hundred server nodes, the control plane can use enough of the PCC's resources that best practices advise against also using the PCC for compute and storage. We also recommend that very large deployments choose the HC Standard X5 to spread the usage across 5 servers instead of 3 for the best performance. HC Standards are very powerful machines, though, selected to cover many different situations while supplying control plane services alongside compute and storage.
The use of 3 replicas has typically been the standard for storage systems like Ceph. It means that 3 copies exist at all times in normal operation to prevent data loss in the event of a failure. In Ceph's terms, if identical data is stored on 3 OSDs and one of the OSDs fails, the two remaining replicas can still tolerate another failure without loss of data. Depending on the Ceph settings and the storage available, when Ceph detects the failed OSD it will wait in the "degraded" state for a certain time, then begin a copy process to recover back to 3 replicas. During this wait and/or copy process, the cluster is not in danger of data loss if another OSD fails.
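How long Ceph waits before starting that copy process is configurable. One relevant option is mon_osd_down_out_interval (600 seconds by default), which controls when a down OSD is marked out and recovery begins. A hedged sketch of adjusting it on a recent Ceph release:

```python
# Sketch only: lengthen the window an operator has to bring a failed host back
# before Ceph marks its OSDs out and starts re-replicating data elsewhere.
import subprocess

def set_down_out_interval(seconds: int) -> None:
    """Set how long the monitors wait before declaring a down OSD out."""
    subprocess.run(
        ["ceph", "config", "set", "mon", "mon_osd_down_out_interval", str(seconds)],
        check=True,
    )

set_down_out_interval(1800)  # e.g. allow 30 minutes before recovery traffic starts
```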
There are two downsides to consider. The first downside to 3 replicas is slower maximum performance, as the storage system must write the data 3 times. Your applications may operate well below the maximum performance, though, so maximum performance may not be a factor.
The second downside is cost: with 3 replicas, storing 1GB of user data consumes 3GB of raw storage space.
With data center grade SATA SSD and NVMe drives, the mean time between failures (MTBF) is better than with traditional spinning drives. Spinning drive reliability is what drove the original 3-replica standard. Large, trustworthy data sets describe a 4X to 6X MTBF advantage for SSDs over HDDs. This advantage has led many cloud administrators to move to 2 replicas for Ceph when running on data center grade SSDs. Both our HC Smalls and HC Standards use data center grade SSDs.
Considerations for 2 replicas:
First, with two replicas, during the failure of one OSD there is a window in which the loss of a second OSD will result in data loss. This window spans the timeout that allows the first OSD to potentially rejoin the cluster plus the time needed to create a new replica on a different running OSD. The risk is real, but weigh it against the very low chance of it occurring and against how easy or difficult it would be for you to recover the data from a backup.
- Storage space is more economical, as 1GB of user data only consumes 2GB of raw storage.
- Maximum IOPS may increase, as Ceph only needs to write 2 copies before acknowledging the write.
- Latency may decrease, as Ceph only needs to write 2 copies before acknowledging the write.
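If, after weighing the considerations above, you decide on 2 replicas, the change is made per pool with the standard Ceph CLI. The pool name below ("volumes") is only an example:

```python
# Example sketch: set a replicated pool to 2 copies, and allow I/O to continue
# on a single copy while the second replica is being rebuilt.
import subprocess

def set_replica_count(pool: str, size: int, min_size: int) -> None:
    """Set the target replica count and the minimum replicas required to serve I/O."""
    subprocess.run(["ceph", "osd", "pool", "set", pool, "size", str(size)], check=True)
    subprocess.run(["ceph", "osd", "pool", "set", pool, "min_size", str(min_size)], check=True)

set_replica_count("volumes", size=2, min_size=1)
```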
All servers will have at least one usable drive for data storage, including servers labeled as compute. You have the option to use this drive for LVM based storage, ephemeral storage, or as part of Ceph. Each drive typically performs only one duty, and that is our default recommendation*.
For Ceph, if the drive types differ – i.e., SATA SSD vs NVMe SSD vs spinners – you should not join them together within one pool. Ceph can support multiple pools with different performance characteristics, but you should not mix drive types within a pool. To create a pool that can support a replication of 2, you will need at least 2 servers. For a replication of 3, you will need 3 servers. For erasure coding, you typically need 4 or more separate servers.
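One common way to keep drive types separate is a CRUSH rule per device class, with each pool bound to one rule. The rule and pool names below are illustrative, not OpenMetal defaults:

```python
# Illustrative sketch: a replicated CRUSH rule restricted to NVMe OSDs
# (failure domain = host), and a 3-replica pool that only uses that rule.
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

run("ceph", "osd", "crush", "rule", "create-replicated", "nvme-only", "default", "host", "nvme")
run("ceph", "osd", "pool", "create", "fast-volumes", "128", "128", "replicated", "nvme-only")
run("ceph", "osd", "pool", "set", "fast-volumes", "size", "3")
```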
If you are creating a large storage pool with spinners, we have advice specific to using the NVMe drives as an accelerator for the storage process and as part of the object gateway service. Please check with your account manager for more information.
*Of note, though this is not a common scenario yet, the IO of our high-performance NVMe drives is often much, much higher than typical applications require, so splitting a drive to serve both as part of Ceph and as a local high-performance LVM volume is possible with good results.
With that being said, there are several ways to grow your compute and storage past what is within your PCC.
You can add additional matching or non-matching compute nodes. Keep in mind that during a failure scenario, you will need to rebalance the VMs from that node to nodes of a different VM capacity. Though not required, it is typical practice to keep a cloud as homogeneous as possible for management ease.
You can add additional matching converged servers to your PCC. Typically you will join the SSD to your Ceph cluster as a new OSD, but the drive on the new node can also be used as ephemeral storage or as traditional drive storage via LVM. If joined to Ceph, you will see Ceph automatically balance existing data onto the new capacity. For compute, once merged with the existing PCC, the new resources become available inside OpenStack.
You can create a new converged cluster. This allows you to select servers that are different from your PCC servers. You will need at least 2 servers for the new Ceph pool; 3 or more is most typical. Of note, one Ceph cluster can manage many different pools, but you can also run multiple Ceph clusters if you see that as necessary.
You can create a new storage cloud. This is typically done for large-scale implementations when the economy of scale favors separating compute and storage. It is also done when object storage is a focus of the storage cloud. Our blended storage and large storage servers have up to 12 large-capacity spinners and are available for this use and others.
Support and Service
In general, we manage the networks above your OpenMetal Clouds and we supply the hardware and parts replacements as needed for hardware in your OpenMetal Clouds.
OpenMetal Clouds themselves are managed by your team. If your team has not managed OpenStack and Ceph private clouds before, we have several options to be sure you can succeed.
- Complimentary onboarding training matched to your deployment size and our joint agreements
- Self-paced free onboarding guides
- Free test clouds, some limits apply
- Paid additional training, coaching, and live assistance
- Complimentary emergency service – please note this can be limited in the case of overuse. That being said, we are nice people and are driven to see you succeed with an open source alternative to the mega clouds.
In addition, we may maintain a free “cloud in a VM” image you can use for testing and training purposes within your cloud.
You can override the safety check from within an API or within OpenMetal Central. This is not recommended and can lead to many issues.
We have done this before for customers in many different situations. Of course, results will differ based on the following factors:
- Are you over the "Tipping Point" in cloud spend? There are a few major factors to consider:
  - Is your team highly technical? If so, running an OpenMetal Cloud is not much different, skill set wise, from what is needed to maintain a healthy fleet of VMs, storage, and networking. It can actually improve your operational expenses, as our team is much more available than any major cloud provider's.
  - Public Cloud is a good solution when cloud spend is around $10k/month or less. Hosted Private Cloud is a close cousin to public cloud – spin up on demand, transparent pricing – but scale is important to get the most value from the cloud.
Also, if for any reason it is not a fit, you have a self service 30 day money back guarantee or the 30 days free PoC time. No obligations or lock-in.
- An Account Manager, Account Engineer, and an Executive Sponsor will be assigned from our side
- You will be invited to our Slack for Engineer to Engineer support
- Your Account Manager will collect your goals and we will align our efforts to your success
- You can meet with your support team via Google Meet (or the video system of your choice) up to weekly to help keep the process on schedule
- Migration planning if existing workloads are being brought over
- Discussion on agreements and potential discounts via ramps if moving workloads over time
We offer two levels of support to allow companies to choose what is best for them. A third, custom level is also available.
All clouds come with the first level of support included within the base prices:
- All hardware, including servers, switches/routers, power systems, cooling systems, and racks are handled by OpenMetal. This includes drive failures, power supply failures, chassis failures, etc. Though very rarely needed, assistance with recovery of any affected cloud services due to hardware failure is also included.
- Procurement, sales taxes, fit for use as a cloud cluster member, etc. are all handled by OpenMetal.
- Provisioning of the initial, known good cloud software made up of Ceph and OpenStack.
- Providing new, known good versions of our cloud software for optional upgrades. OpenMetal may, at its discretion, assist with upgrades free of charge, but upgrades are the responsibility of the customer.
- Support to customer’s operational/systems team for cloud health issues.
The second level of support is termed Assisted Management and has a base plus hardware unit fee. In addition to the first level of support, the following is offered with Assisted Management:
- Named Account Engineer.
- Engineer to Engineer support for cloud.
- Engineer to Engineer advice on key issues or initiatives your team has for workloads on your OpenMetal Cloud.
- Upgrades are handled jointly with OpenMetal assisting with cloud software upgrades.
- Assisting with “cloud health” issues, including that we jointly monitor key health indicators and our 24/7 team will react to issues prior to or with your team.
- Monthly proactive health check and recommendations.
In addition, OpenMetal publishes extensive documentation for our clouds and many System Administration teams prefer to have that level of control.
Skilled Linux System Administrators can learn to maintain an OpenMetal Cloud in about 40 hours using our provided Cloud Administrator Guides, and we will give you free time on non-production test clouds for this purpose. Most teams view this new technology as a learning opportunity, and as the workload shifts to your hosted private cloud, time on the old systems is reduced accordingly.
We also offer our Assisted Management level of service. This is popular for first time customers and is quite reasonable. It covers most situations, including that we jointly monitor “cloud health” and our 24/7 team will react to issues prior to or with your team.
Your OpenMetal Hosted Private Cloud is “Day 2 ready” and is relatively easy to maintain but does require a solid set of Linux System Administration basics to handle any customizations or “cloud health” work.
For companies without a Linux Admin Ops team, we recommend our Assisted Management level of service. It covers most situations, including that we jointly monitor “cloud health” and our 24/7 team will react to issues prior to or with your team.
You may also consider having your team grow into running the underlying cloud over time. A properly-architected OpenStack Cloud can have very low time commitments to maintain in a healthy state. A junior Linux System Administrator can learn to maintain an OpenMetal Cloud in about 120 hours using our provided Cloud Administrator Guides and we will give you free time on non-production test clouds for this purpose. OpenMetal is unique in offering free time for customers on completely separate on-demand OpenStack Clouds. This new learning opportunity could dramatically change both your staff and your company’s view on private cloud.
Speed and Connectivity
Disaster Recovery
Second, it is likely that you will be “SWIPing” IP addresses to us to broadcast from our routers.
It is wise to understand the processes above ahead of time and potentially perform a yearly dry run.
Have additional questions?
Explore Components of OpenMetal Cloud
Private cloud as a service is delivered on demand and at scale.
Ceph Object Storage
Enterprise data storage with scaling and fault management capabilities.