OpenMetal Cloud FAQ
The following FAQs cover many categories of OpenMetal Cloud. These categories include the platform and hardware technology, the network setup, the Private Cloud Core, the service level agreement, and how to run your very first private cloud! If you have any additional questions, your account manager will assist you with your needs.
OpenStack
Ceph
We supply only data center grade SATA SSD and NVMe drives. The mean time between failures (MTBF) of a typical hard drive is 300,000 hours. Most recommendations for selecting 3 replicas, and the history behind that choice, come from hard drive use cases that take this failure rate into account. The MTBF of both our SATA SSDs and our NVMe drives is 2 million hours. Failures will certainly still occur, but they are roughly 6 times less likely than with an HDD.
When Ceph is hyper-converged onto 3 servers with a replica level of 3 and you lose one of the 3 members, Ceph cannot recover itself out of a degraded state until the lost member is restored or replaced. The data is not at risk, since two copies remain, but the pool is now effectively at a replica level of 2. When Ceph is hyper-converged onto 3 servers with a replica level of 2 and you lose one of the 3 members, Ceph can be set to self-heal: any data that has fallen to 1 replica is automatically copied back up to a replica level of 2. Your data loss risk exists only during the time when just 1 replica is present.
Disaster recovery processes for data have progressed significantly. The details depend on your specific situation, but if restoring data from backups to production is straightforward and fast, then in the extremely rare case that both of the 2 replicas fail during the degraded period, you can recover from backups.
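If you want to watch a degraded cluster heal after a failure, the standard Ceph CLI reports placement group states. Below is a minimal sketch, not an OpenMetal tool, that assumes the ceph client and an admin keyring are available on a node in the cluster:

```python
# Minimal sketch: poll Ceph until no placement groups report a "degraded" state.
# Assumes the "ceph" CLI is installed and can reach the cluster with admin access.
import json
import subprocess
import time

def degraded_pg_count() -> int:
    """Return the number of placement groups currently in a degraded state."""
    raw = subprocess.check_output(["ceph", "status", "--format", "json"])
    status = json.loads(raw)
    # pgmap aggregates PG states, e.g. "active+clean" or "active+undersized+degraded"
    states = status.get("pgmap", {}).get("pgs_by_state", [])
    return sum(s["count"] for s in states if "degraded" in s["state_name"])

while degraded_pg_count() > 0:
    print("Cluster is still degraded; waiting for recovery...")
    time.sleep(60)
print("All placement groups report a full replica count.")
```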
Usable Ceph disk space savings are significant (estimated, not exact):
- HC Small, Replica 3 – 960GB * 3 servers / 3 replicas = 960GB usable
- HC Small, Replica 2 – 960GB * 3 servers / 2 replicas = 1440GB usable
- HC Standard, Replica 3 – 3.2TB * 3 servers / 3 replicas = 3.2TB usable
- HC Standard, Replica 2 – 3.2TB * 3 servers / 2 replicas = 4.8TB usable
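The arithmetic behind those estimates is simple: usable space is roughly the per-server raw capacity times the number of servers, divided by the replica count, before Ceph overhead and full-ratio headroom. A quick sketch of that math:

```python
# Rough usable-capacity estimate: raw capacity per server * servers / replicas.
# These are estimates only; real clusters reserve headroom and should never run near full.
def usable_capacity_gb(raw_per_server_gb: float, servers: int, replicas: int) -> float:
    return raw_per_server_gb * servers / replicas

print(usable_capacity_gb(960, 3, 3))    # HC Small, Replica 3    ->  960.0 GB
print(usable_capacity_gb(960, 3, 2))    # HC Small, Replica 2    -> 1440.0 GB
print(usable_capacity_gb(3200, 3, 3))   # HC Standard, Replica 3 -> 3200.0 GB (3.2TB)
print(usable_capacity_gb(3200, 3, 2))   # HC Standard, Replica 2 -> 4800.0 GB (4.8TB)
```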
Bare Metal
OpenMetal Cloud Platform and Hardware
For extremely high IOPS, we recommend using the NVMe or SATA SSD drives directly from your application. This means you will need to provide data integrity and high availability through your software. Many applications, such as high-performance databases, can function extremely well this way. The NVMe drives on the HC Standards and the Compute Standards, in particular, have extreme IOPS. It bears repeating though – you must handle data integrity and HA yourself.
For very high IOPS with built-in data protection, Ceph with a replication of 2 on NVMe drives is popular. A replica level of 3 will slightly reduce the IOPS but is a recommended choice.
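If you are weighing raw drive access against a Ceph-backed volume, a quick benchmark can ground the decision. The sketch below is illustrative only and assumes fio is installed and that /dev/nvme0n1 is the drive under test (both are assumptions, not OpenMetal defaults):

```python
# Illustrative only: measure 4k random-read IOPS with fio and parse its JSON output.
# Random reads do not modify data, but double-check the device path before running.
import json
import subprocess

def random_read_iops(device: str, runtime_s: int = 30) -> float:
    """Run a short 4k random-read fio job against the device and return measured IOPS."""
    out = subprocess.check_output([
        "fio", "--name=randread", f"--filename={device}", "--rw=randread",
        "--bs=4k", "--iodepth=64", "--numjobs=4", "--direct=1",
        f"--runtime={runtime_s}", "--time_based", "--ioengine=libaio",
        "--group_reporting", "--output-format=json",
    ])
    return json.loads(out)["jobs"][0]["read"]["iops"]

print(random_read_iops("/dev/nvme0n1"))
```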
For very large OpenMetal Clouds, such as several hundred server nodes, the control plane can use enough of the PCC's resources that best practices advise against also using the PCC for compute and storage. We also recommend that very large deployments choose the HC Standard X5 to spread the usage across 5 servers instead of 3 for the best performance. HC Standards are very powerful machines, though, selected to cover many different situations while supplying control plane services alongside compute and storage.
The use of 3 replicas has typically been the standard for storage systems like Ceph. It means that 3 copies exist at all times in normal operation to prevent data loss in the event of a failure. In Ceph's terms, if identical data is stored on 3 OSDs and one of the OSDs fails, the two remaining replicas can still tolerate another failure without loss of data. Depending on the Ceph settings and the storage available, when Ceph detects the failed OSD it will wait in the "degraded" state for a certain time, then begin a copy process to recover back to 3 replicas. During this wait and/or copy process, the cluster is not in danger of data loss if another OSD fails.
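How long Ceph waits before starting that copy process is configurable. One relevant option is mon_osd_down_out_interval (600 seconds by default), which controls when a down OSD is marked out and recovery begins. A hedged sketch of adjusting it on a recent Ceph release:

```python
# Sketch only: lengthen the window an operator has to bring a failed host back
# before Ceph marks its OSDs out and starts re-replicating data elsewhere.
import subprocess

def set_down_out_interval(seconds: int) -> None:
    """Set how long the monitors wait before declaring a down OSD out."""
    subprocess.run(
        ["ceph", "config", "set", "mon", "mon_osd_down_out_interval", str(seconds)],
        check=True,
    )

set_down_out_interval(1800)  # e.g. allow 30 minutes before recovery traffic starts
```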
There are two downsides to consider. The first downside to 3 replicas is slower maximum performance, as the storage system must write the data 3 times. Your applications may operate well below the maximum performance, though, so maximum performance may not be a factor.
The second downside is cost: with 3 replicas, storing 1GB of user data consumes 3GB of raw storage space.
With data center grade SATA SSD and NVMe drives, the mean time between failures (MTBF) is better than with traditional spinning drives. Spinning drive reliability is what drove the original 3-replica standard. Large, trustworthy data sets describe a 4X to 6X MTBF advantage for SSDs over HDDs. This advantage has led many cloud administrators to move to 2 replicas for Ceph when running on data center grade SSDs. Both our HC Smalls and HC Standards use data center grade SSDs.
Considerations for 2 replicas:
First, with two replicas, during the failure of one OSD there is a window in which the loss of a second OSD will result in data loss. This window spans the timeout that allows the first OSD to potentially rejoin the cluster plus the time needed to create a new replica on a different running OSD. The risk is real, but weigh it against the very low chance of it occurring and against how easy or difficult it would be for you to recover the data from a backup.
- Storage space is more economical, as 1GB of user data only consumes 2GB of raw storage.
- Maximum IOPS may increase, as Ceph only needs to write 2 copies before acknowledging the write.
- Latency may decrease, as Ceph only needs to write 2 copies before acknowledging the write.
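If, after weighing the considerations above, you decide on 2 replicas, the change is made per pool with the standard Ceph CLI. The pool name below ("volumes") is only an example:

```python
# Example sketch: set a replicated pool to 2 copies, and allow I/O to continue
# on a single copy while the second replica is being rebuilt.
import subprocess

def set_replica_count(pool: str, size: int, min_size: int) -> None:
    """Set the target replica count and the minimum replicas required to serve I/O."""
    subprocess.run(["ceph", "osd", "pool", "set", pool, "size", str(size)], check=True)
    subprocess.run(["ceph", "osd", "pool", "set", pool, "min_size", str(min_size)], check=True)

set_replica_count("volumes", size=2, min_size=1)
```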
All servers will have at least one usable drive for data storage, including servers labeled as compute. You have the option to use this drive for LVM based storage, ephemeral storage, or as part of Ceph. Each drive typically performs only one duty, and that is our default recommendation*.
For Ceph, if the drive types differ – i.e., SATA SSD vs NVMe SSD vs spinners – you should not join them together within one pool. Ceph can support multiple pools with different performance characteristics, but you should not mix drive types within a pool. To create a pool that can support a replication of 2, you will need at least 2 servers. For a replication of 3, you will need 3 servers. For erasure coding, you typically need 4 or more separate servers.
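One common way to keep drive types separate is a CRUSH rule per device class, with each pool bound to one rule. The rule and pool names below are illustrative, not OpenMetal defaults:

```python
# Illustrative sketch: a replicated CRUSH rule restricted to NVMe OSDs
# (failure domain = host), and a 3-replica pool that only uses that rule.
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

run("ceph", "osd", "crush", "rule", "create-replicated", "nvme-only", "default", "host", "nvme")
run("ceph", "osd", "pool", "create", "fast-volumes", "128", "128", "replicated", "nvme-only")
run("ceph", "osd", "pool", "set", "fast-volumes", "size", "3")
```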
If you are creating a large storage pool with spinners, we have advice specific to using the NVMe drives as an accelerator for the storage process and as part of the object gateway service. Please check with your account manager for more information.
*Of note, though this is not a common scenario yet, the IO of our high-performance NVMe drives is often much, much higher than typical applications require, so splitting a drive to serve both as part of Ceph and as a local high-performance LVM volume is possible with good results.
With that being said, there are several ways to grow your compute and storage past what is within your PCC.
You can add additional matching or non-matching compute nodes. Keep in mind that during a failure scenario, you will need to rebalance the VMs from that node to nodes of a different VM capacity. Though not required, it is typical practice to keep a cloud as homogeneous as possible for management ease.
You can add additional matching converged servers to your PCC. Typically you will join the SSD to your Ceph cluster as a new OSD, but the drive on the new node can also be used as ephemeral storage or as traditional drive storage via LVM. If joined to Ceph, you will see Ceph automatically balance existing data onto the new capacity. For compute, once merged with the existing PCC, the new resources become available inside OpenStack.
You can create a new converged cluster. This allows you to select servers that are different from your PCC servers. You will need at least 2 servers for the new Ceph pool; 3 or more is most typical. Of note, one Ceph cluster can manage many different pools, but you can also run multiple Ceph clusters if you see that as necessary.
You can create a new storage cloud. This is typically done for large-scale implementations when the economy of scale favors separating compute and storage. It is also done when object storage is a focus of the storage cloud. Our blended storage and large storage servers have up to 12 large-capacity spinners and are available for this use and others.
Support and Service
In general, we manage the networks above your OpenMetal Clouds and we supply the hardware and parts replacements as needed for hardware in your OpenMetal Clouds.
OpenMetal Clouds themselves are managed by your team. If your team has not managed OpenStack and Ceph private clouds before, we have several options to be sure you can succeed.
- Complimentary onboarding training matched to your deployment size and our joint agreements
- Self-paced free onboarding guides
- Free test clouds, some limits apply
- Paid additional training, coaching, and live assistance
- Complimentary emergency service – please note this can be limited in the case of overuse. That being said, we are nice people and are driven to see you succeed with an open source alternative to the mega clouds.
In addition, we may maintain a free “cloud in a VM” image you can use for testing and training purposes within your cloud.
You can override the safety check from within an API or within OpenMetal Central. This is not recommended and can lead to many issues.
We have done this before for customers in many different situations. Of course, results will differ based on the following factors:
- Are you over the "Tipping Point" in cloud spend? There are a few major factors to consider:
  - Is your team highly technical? If so, running an OpenMetal Cloud is not much different, skill set wise, from what is needed to maintain a healthy fleet of VMs, storage, and networking. It can actually improve your operational expenses, as our team is much more available than any major cloud provider's.
  - Public Cloud is a good solution when cloud spend is around $10k/month or less. Hosted Private Cloud is a close cousin to public cloud – spin up on demand, transparent pricing – but scale is important to get the most value from the cloud.
Also, if for any reason it is not a fit, you have a self service 30 day money back guarantee or the 30 days free PoC time. No obligations or lock-in.
- An Account Manager, Account Engineer, and an Executive Sponsor will be assigned from our side
- You will be invited to our Slack for Engineer to Engineer support
- Your Account Manager will collect your goals and we will align our efforts to your success
- You can meet with your support team via Google Meet (or the video system of your choice) up to weekly to help keep the process on schedule
- Migration planning if existing workloads are being brought over
- Discussion on agreements and potential discounts via ramps if moving workloads over time
We offer two levels of support to allow companies to choose what is best for them. A third, custom level is also available.
All clouds come with the first level of support included within the base prices:
- All hardware, including servers, switches/routers, power systems, cooling systems, and racks are handled by OpenMetal. This includes drive failures, power supply failures, chassis failures, etc. Though very rarely needed, assistance with recovery of any affected cloud services due to hardware failure is also included.
- Procurement, sales taxes, fit for use as a cloud cluster member, etc. are all handled by OpenMetal.
- Provisioning of the initial, known good cloud software made up of Ceph and OpenStack.
- Providing new, known good versions of our cloud software for optional upgrades. OpenMetal may, at its discretion, assist with upgrades free of charge, but upgrades are the responsibility of the customer.
- Support to customer’s operational/systems team for cloud health issues.
The second level of support is termed Assisted Management and has a base plus hardware unit fee. In addition to the first level of support, the following is offered with Assisted Management:
- Named Account Engineer.
- Engineer to Engineer support for cloud.
- Engineer to Engineer advice on key issues or initiatives your team has for workloads on your OpenMetal Cloud.
- Upgrades are handled jointly with OpenMetal assisting with cloud software upgrades.
- Assisting with “cloud health” issues, including that we jointly monitor key health indicators and our 24/7 team will react to issues prior to or with your team.
- Monthly proactive health check and recommendations.
In addition, OpenMetal publishes extensive documentation for our clouds and many System Administration teams prefer to have that level of control.
Skilled Linux System Administrators can learn to maintain an OpenMetal Cloud in about 40 hours using our provided Cloud Administrator Guides, and we will give you free time on non-production test clouds for this purpose. Most teams view this new technology as a learning opportunity, and as the workload shifts to your hosted private cloud, time on the old systems is reduced accordingly.
We also offer our Assisted Management level of service. This is popular for first time customers and is quite reasonable. It covers most situations, including that we jointly monitor “cloud health” and our 24/7 team will react to issues prior to or with your team.
Your OpenMetal Hosted Private Cloud is “Day 2 ready” and is relatively easy to maintain but does require a solid set of Linux System Administration basics to handle any customizations or “cloud health” work.
For companies without a Linux Admin Ops team, we recommend our Assisted Management level of service. It covers most situations, including that we jointly monitor “cloud health” and our 24/7 team will react to issues prior to or with your team.
You may also consider having your team grow into running the underlying cloud over time. A properly-architected OpenStack Cloud can have very low time commitments to maintain in a healthy state. A junior Linux System Administrator can learn to maintain an OpenMetal Cloud in about 120 hours using our provided Cloud Administrator Guides and we will give you free time on non-production test clouds for this purpose. OpenMetal is unique in offering free time for customers on completely separate on-demand OpenStack Clouds. This new learning opportunity could dramatically change both your staff and your company’s view on private cloud.
Speed and Connectivity
Disaster Recovery
Second, it is likely that you will be “SWIPing” IP addresses to us to broadcast from our routers.
It is wise to understand the processes above ahead of time and potentially perform a yearly dry run.
Have additional questions?
Explore Components of OpenMetal Cloud
Private cloud as a service is delivered on demand and at scale.
Ceph Object Storage
Enterprise data storage with scaling and fault management capabilities.