An OpenMetal GPU cluster is a set of dedicated, single-tenant GPU servers on a private high-speed network, built to order for distributed AI training and large-scale inference. Clusters can be all-RP6000 (cost-efficient training and inference), all-H200 (largest-memory, bandwidth-bound work), or mixed to match different workloads to the right GPU. Every node is bare metal with full root access, fixed monthly pricing, and included egress — no per-GPU-hour metering across the fleet.
Key Takeaways
- Single-tenant, dedicated — every GPU in the cluster is yours, with full PCIe bandwidth and no hypervisor overhead or shared-tenancy contention.
- Same or mixed GPUs — build an all-RP6000 inference fleet, an all-H200 training cluster, or a mixed cluster that routes big-model and bandwidth-bound jobs to H200 nodes and cost-efficient inference/training to RP6000 nodes.
- Private 40 Gbps mesh — nodes connect over LACP-bonded private networking for distributed training, parameter exchange, and pulling datasets from OpenMetal storage nodes; east-west traffic is not metered.
- Fixed monthly pricing across the fleet — no per-GPU-hour “idle silicon tax.” Sustained, high-utilization training runs the whole cluster hot at no incremental cost.
- Built to order — cluster size, GPU mix, RAM (up to 2TB/node), and storage are configured to the workload.
Ready to Build a GPU Cluster?
Tell us your cluster size, GPU mix, and workload, and we’ll design a dedicated GPU cluster — all-RP6000, all-H200, or mixed.
Cluster Composition
| Option | GPUs | Best for |
|---|---|---|
| All-RP6000 cluster | NVIDIA RTX Pro 6000 (96GB GDDR7, FP4 support) | Cost-efficient distributed training, fine-tuning, and high-throughput inference serving fleets |
| All-H200 cluster | NVIDIA H200 NVL (141GB HBM3e) | Largest models and bandwidth-bound large-scale training/inference |
| Mixed cluster | RP6000 + H200 | Route big-model/bandwidth work to H200 nodes, cost-efficient inference/training to RP6000 nodes |
Figure: OpenMetal GPU cluster component architecture
Each node is a single-tenant bare metal server with dual Intel Xeon 6530P (64C/128T), 1TB DDR5-6400 (upgradeable to 2TB), and a 6.4TB Micron 7500 MAX NVMe data drive. See the RP6000 and H200 spec pages.
Cluster Networking
Cluster nodes connect over a private 40 Gbps mesh (4x 10 Gbps LACP-bonded per node), carrying distributed-training gradients, parameter exchange, and dataset traffic from OpenMetal storage nodes. East-west traffic between your nodes is not metered.
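As a rough way to reason about what the 40 Gbps mesh means for distributed training, the sketch below estimates the time for one ring all-reduce of a gradient payload. The ring all-reduce traffic formula is standard; the `efficiency` factor is an illustrative assumption, not an OpenMetal measurement. LACP bonds typically balance traffic per flow, so a single stream may see closer to one 10 Gbps member link than the full 40 Gbps aggregate.

```python
def ring_allreduce_seconds(payload_gb: float, nodes: int,
                           link_gbps: float = 40.0,
                           efficiency: float = 0.8) -> float:
    """Estimate seconds for one ring all-reduce of `payload_gb` gigabytes.

    `efficiency` is an assumed fraction of line rate that is actually
    usable; tune it from your own measurements.
    """
    if nodes < 2:
        return 0.0  # a single node has nothing to exchange
    # In a ring all-reduce each node sends 2 * (N - 1) / N of the payload.
    traffic_gb = 2 * (nodes - 1) / nodes * payload_gb
    usable_gbytes_per_s = link_gbps * efficiency / 8  # gigabits -> gigabytes
    return traffic_gb / usable_gbytes_per_s


# Example: 14 GB of gradients (about 7B parameters in BF16) across 4 nodes.
print(ring_allreduce_seconds(14, 4))
```

An estimate like this helps decide how much gradient accumulation or communication/compute overlap a job needs before the network becomes the bottleneck.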
How cluster memory works: GPU-memory pooling is available between GPUs within the same server (1–2 cards). Across nodes, GPUs communicate over the private 40 Gbps mesh rather than a shared GPU-memory fabric. Distributed jobs therefore use data and pipeline parallelism across nodes, and workloads that need tight GPU-to-GPU memory coupling should be sized to fit the per-server GPU configuration.
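The sizing rule above can be sketched as a quick check of whether a model's weights fit within one server's pooled GPU memory. Per-card capacities come from the Cluster Composition table; the 20% `overhead` for activations and KV cache is a placeholder assumption, and real headroom depends on batch size, context length, and framework.

```python
# Per-card memory in GB, taken from the Cluster Composition table.
GPU_MEMORY_GB = {"RP6000": 96, "H200": 141}


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def fits_in_one_server(params_billion: float, bytes_per_param: float,
                       gpu: str, cards: int = 2,
                       overhead: float = 0.2) -> bool:
    """True if weights plus an assumed overhead fit in one server's GPUs."""
    needed = weights_gb(params_billion, bytes_per_param) * (1 + overhead)
    return needed <= GPU_MEMORY_GB[gpu] * cards


# A 70B-parameter model in BF16 (2 bytes per parameter):
print(fits_in_one_server(70, 2, "RP6000", cards=2))
print(fits_in_one_server(70, 2, "H200", cards=1))
```

Models that fail this check for every per-server configuration need to be split across nodes with pipeline parallelism, or quantized to a smaller bytes-per-parameter format.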
GPU Sharing and Partitioning
For multi-tenant or multi-workload sharing within a node, NVIDIA offers MIG (Multi-Instance GPU) partitioning and time-slicing — each with different isolation and utilization trade-offs. OpenMetal covers the distinction in detail in MIG vs time-slicing GPU sharing.
Recommended Cluster Workloads
Distributed Training
Multi-node training with PyTorch FSDP, DeepSpeed, or Megatron across an all-H200 or mixed cluster, with the 40 Gbps private mesh carrying gradient and parameter traffic. Fixed-cost pricing makes long, high-utilization training runs predictable.
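For illustration, a multi-node PyTorch job is commonly started with `torchrun` on every node. The helper below only assembles the command line and environment. The head-node address `10.0.0.10`, the interface name `bond0`, and the script name `train_fsdp.py` are hypothetical placeholders, not OpenMetal defaults.

```python
def torchrun_command(nodes: int, gpus_per_node: int, head_addr: str,
                     script: str, port: int = 29500,
                     job_id: str = "demo") -> list[str]:
    """Build the torchrun argv to run on every node of the job."""
    return [
        "torchrun",
        f"--nnodes={nodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--rdzv_id={job_id}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={head_addr}:{port}",
        script,
    ]


def nccl_env(private_ifname: str) -> dict[str, str]:
    """Pin NCCL socket traffic to the private interface (name is assumed)."""
    return {"NCCL_SOCKET_IFNAME": private_ifname}


# Hypothetical 4-node cluster with 2 GPUs per node.
cmd = torchrun_command(4, 2, "10.0.0.10", "train_fsdp.py")
print(" ".join(cmd))
print(nccl_env("bond0"))
```

Setting `NCCL_SOCKET_IFNAME` is one way to keep collective traffic on the private network rather than a public interface; check the actual interface name on your nodes before relying on it.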
Large-Scale Inference Serving
Horizontally scaled inference fleets (NVIDIA NIM, vLLM, TensorRT-LLM) across RP6000 nodes for cost-efficient, high-throughput serving with consistent per-node performance.
Mixed Training + Serving Platforms
One cluster that trains on H200 nodes and serves on RP6000 nodes, keeping both on the same private network and billing model.
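One way to picture routing in a mixed cluster is a small placement rule that sends each job to the cheapest pool able to hold it. This is a minimal sketch rather than an OpenMetal scheduler: the pool names and the `bandwidth_bound` flag are hypothetical, and the only inputs taken from this page are the per-card memory sizes.

```python
# Per-card memory in GB for each pool, from the Cluster Composition table.
POOL_MEMORY_GB = {"rp6000": 96, "h200": 141}


def choose_pool(memory_gb_per_card: float, bandwidth_bound: bool = False) -> str:
    """Pick a node pool for a job based on its per-card memory need.

    Jobs flagged as bandwidth-bound, or too large for an RP6000 card,
    go to the H200 pool; everything else goes to the RP6000 pool.
    """
    if memory_gb_per_card > POOL_MEMORY_GB["h200"]:
        raise ValueError("job exceeds per-card memory of every pool")
    if bandwidth_bound or memory_gb_per_card > POOL_MEMORY_GB["rp6000"]:
        return "h200"
    return "rp6000"


print(choose_pool(40))                        # small serving replica
print(choose_pool(120))                       # too large for one RP6000 card
print(choose_pool(50, bandwidth_bound=True))  # memory-bandwidth-heavy job
```

In practice the same rule could be expressed as node labels and selectors in whatever scheduler the cluster runs.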
HPC and Scientific Computing
Multi-node MPI workloads (computational chemistry, CFD, genomics) using the dual-Xeon hosts and private mesh.
Security and Compliance
Every cluster node is single-tenant bare metal, which provides physical isolation with no shared hypervisor in front of the accelerators. Intel SGX is available on the host CPUs. Intel TDX and GPU passthrough cannot be combined in a single trust boundary. Clusters are deployed in the HIPAA-compliant Ashburn (NTT DATA VA1) facility; OpenMetal offers Business Associate Agreements (BAAs) at the organizational level.
Deployment Options
- Dedicated GPU cluster — multiple GPU nodes on a private 40 Gbps mesh, built to order.
- Attached to existing infrastructure — extend an existing OpenMetal Hosted Private Cloud or bare metal footprint with GPU nodes. See adding GPU servers to an existing OpenMetal deployment.
Where to deploy
GPU clusters are available now in Ashburn, Virginia (US-East), with advance reservations for Los Angeles, Amsterdam, and Singapore. Proof of Concept clusters are available for testing; ramp pricing is available for migrations.
Get a GPU Cluster Quote
Ready to deploy? Tell us about your distributed training or large-scale inference workload and we’ll design a dedicated GPU cluster.
- All-RP6000: cost-efficient training and high-throughput inference fleets
- All-H200: largest-memory, bandwidth-bound training and inference
- Mixed: route workloads to the right GPU on one private mesh
All deployments include fixed monthly pricing, included egress, a 99.96%+ network SLA, and DDoS protection.
Related OpenMetal Answers
- What is an OpenMetal GPU cluster?
- Can I mix RP6000 and H200 GPUs in one cluster?
- How do OpenMetal GPU cluster nodes connect to each other?
- Can OpenMetal GPU clusters pool GPU memory across nodes?
- What is the difference between MIG and time-slicing for GPU sharing?
Product specifications, pricing, and availability may change due to market conditions and other factors. For the most current information, please contact the OpenMetal team directly.