GPU Servers and Clusters Pricing

Private GPU Servers and Clusters

Dedicated NVIDIA GPU acceleration on the OpenMetal v5 platform. Built for AI training, inference, and HPC, with transparent monthly pricing and no metered hours.

Fully dedicated, single-tenant GPUs with no shared hypervisor
Same Intel Xeon 6 and DDR5-6400 foundation as the rest of the v5 catalog
Delivered as single-tenant bare metal for full control
Deploy as standalone servers or interconnected multi-node clusters
Predictable monthly billing with fair, transparent egress

Choose your GPU platform

Two dedicated GPU server lines on the same modern v5 foundation. One or two GPUs per server, scalable into multi-node clusters.

RP6000

Best for: inference, fine-tuning, rendering, and mixed AI and visualization pipelines. High VRAM at a favorable cost per GB.

1 or 2x NVIDIA RTX PRO 6000

GPU memory: 96 GB GDDR7 per GPU
GPU memory bandwidth: 1.6 TB/s per GPU
CUDA cores: 24,064 per GPU
CPU: 2x Intel Xeon 6530P (64C / 128T)
Memory: 1 TB DDR5-6400 (to 2 TB)
Storage: 1x 6.4 TB included, up to 8 NVMe bays (Micron 7500 MAX)
Private network: 20 Gbps standard, upgradeable to 40 Gbps
Public network: 10 Gbps

Contact Sales

H200

Best for: large-model training and memory-bound inference where bandwidth and capacity are the bottleneck.

1 or 2x NVIDIA H200 NVL (PCIe)

GPU memory: 141 GB HBM3e per GPU
GPU memory bandwidth: 4.8 TB/s per GPU
CPU: 2x Intel Xeon 6530P (64C / 128T)
Memory: 1 TB DDR5-6400 (to 2 TB)
Storage: 1x 6.4 TB included, up to 8 NVMe bays (Micron 7500 MAX)
Private network: 20 Gbps standard, upgradeable to 40 Gbps
Public network: 10 Gbps

Contact Sales

Pricing, features, and availability are subject to change without notice. For GPU servers, all final prices need to be confirmed with the OpenMetal sales team. However, unlike many providers, OpenMetal honors written quotes for 30 days from the date issued. Because market conditions and hardware costs can fluctuate, any new or revised quotes will reflect current market pricing.

With v5 we modernized the foundation of our bare metal and private cloud catalog. Adding the RP6000 and H200 was the natural next step. Customers running AI and HPC workloads get fully dedicated GPUs on the same modern Xeon 6000 platform, with transparent monthly billing and infrastructure they actually control, not throttled, metered slices of someone else’s cluster.

Jamie Tischart, CTO of OpenMetal

How is Private AI on OpenMetal Infrastructure Different?

It’s private, it’s customizable, and our engineers are on your team.

Fully dedicated

Single-tenant bare metal with direct access to the GPU, CPU, memory, and storage. Nothing is virtualized or shared, so performance stays consistent and the hardware is entirely yours.

Built to order

The listed configurations are a starting point. Work with our team to design the deployment your workload needs, and we handle ordering, setup, and reliable operation.

Engineers on your team

Real infrastructure engineers help you size, deploy, and tune. For organizations in healthcare, finance, research, and SaaS that need data locality and compliance control, that support matters.

From a single server to a multi-node cluster

Deploy one GPU server or interconnect many over the v5 private network. Common AI and ML frameworks are supported out of the box.

Single server

One or two dedicated GPUs per server on the full v5 platform. Ideal for inference, fine-tuning, and focused training runs.

Multi-node clusters

Interconnect multiple GPU servers over a 20 Gbps private network to build training and inference clusters sized to your workload.

Bare metal, no layers

Every server is delivered as single-tenant bare metal, with direct access to the GPU, CPU, memory, and storage. No hypervisor sits between your workload and the hardware.

Deploying AI on OpenMetal

These fit guides cover the most common inference models and fine-tuning use cases, so you can determine how well they run on OpenMetal’s GPU and CPU catalog.

INFERENCE

A reference for matching open-weight models to the H200, RP6000, and XL v5 CPU. GPU fit is bound by a model’s total size in card memory; CPU throughput is bound by its active size and memory bandwidth.

Inference Fit Guide
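As a rough illustration of the fit rule above, the sketch below estimates weight memory from parameter count and bytes per parameter, then compares it with the per-GPU memory listed on this page (96 GB for the RTX PRO 6000, 141 GB for the H200 NVL). The function names, the 20% reserve for KV cache and activations, and the 70B example model are illustrative assumptions, not figures from the fit guide. The last function gives the usual bandwidth-bound ceiling on single-stream decode speed (memory bandwidth divided by bytes read per token); real throughput will be lower.

```python
# Rough sizing sketch. Helper names and the overhead factor are assumptions.

GPU_MEMORY_GB = {"RP6000": 96, "H200": 141}            # per GPU, from the spec lists
GPU_BANDWIDTH_GB_PER_S = {"RP6000": 1600, "H200": 4800}  # 1.6 TB/s and 4.8 TB/s per GPU


def weight_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights only: billions of parameters x bytes per parameter = decimal GB."""
    return params_billion * bytes_per_param


def fits(total_params_billion, bytes_per_param, platform, gpus=1, overhead=0.20):
    """True if weights plus an assumed overhead reserve fit in total card memory.

    With gpus=2 this assumes the model can be split evenly across both cards.
    """
    needed = weight_size_gb(total_params_billion, bytes_per_param) * (1 + overhead)
    return needed <= GPU_MEMORY_GB[platform] * gpus


def max_decode_tokens_per_s(active_params_billion, bytes_per_param, platform):
    """Ceiling for single-stream decode: each token reads the active weights once."""
    active_gb = weight_size_gb(active_params_billion, bytes_per_param)
    return GPU_BANDWIDTH_GB_PER_S[platform] / active_gb


if __name__ == "__main__":
    # Hypothetical 70B dense model at 16-bit (2 bytes) and 4-bit (0.5 bytes) weights.
    for bytes_per_param in (2.0, 0.5):
        for platform in GPU_MEMORY_GB:
            one = fits(70, bytes_per_param, platform)
            two = fits(70, bytes_per_param, platform, gpus=2)
            print(f"{platform} @ {bytes_per_param} B/param: 1 GPU={one}, 2 GPUs={two}")
```

Under these assumptions, the hypothetical 70B model at 16-bit precision does not fit on a single GPU of either line but fits across two, while a 4-bit version fits on one card of either line.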

TRAINING / FINE-TUNING

A reference for sizing fine-tuning runs on the H200 and RP6000. The footprint is set by your method: full fine-tuning holds weights plus gradients plus optimizer state, LoRA holds the frozen model plus small adapters, and QLoRA holds a 4-bit frozen model plus adapters.

Fine-Tuning Fit Guide
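The footprint rule above can be sketched as a per-parameter byte count. The numbers below assume a common mixed-precision setup: 16-bit weights and gradients, plus an Adam-style optimizer that keeps a 32-bit master copy and two 32-bit moment estimates. They also assume adapters sized at 1% of the base model. Those choices, the function name, and the omission of activation memory are assumptions for illustration, so treat the results as lower bounds rather than fit-guide values.

```python
# Model-state estimate only (weights, gradients, optimizer state).
# Activation memory is NOT included and depends on batch size and sequence length.

def finetune_footprint_gb(params_billion: float, method: str,
                          adapter_fraction: float = 0.01) -> float:
    """Estimated decimal GB of model state for a fine-tuning method."""
    weights_16bit = 2.0     # bytes per parameter
    weights_4bit = 0.5      # bytes per parameter
    grads_16bit = 2.0       # bytes per trainable parameter
    optimizer_state = 12.0  # fp32 master copy (4) + two Adam moments (4 + 4)
    trainable = weights_16bit + grads_16bit + optimizer_state  # 16 bytes/param

    if method == "full":
        per_param = trainable
    elif method == "lora":
        # Frozen 16-bit base model plus small trainable adapters.
        per_param = weights_16bit + adapter_fraction * trainable
    elif method == "qlora":
        # Frozen 4-bit base model plus small trainable adapters.
        per_param = weights_4bit + adapter_fraction * trainable
    else:
        raise ValueError(f"unknown method: {method}")
    return params_billion * per_param


if __name__ == "__main__":
    for method in ("full", "lora", "qlora"):
        print(method, round(finetune_footprint_gb(8, method), 2), "GB")
```

Under these assumptions, a hypothetical 8B-parameter model needs about 128 GB of model state for full fine-tuning, about 17 GB with LoRA, and about 5 GB with QLoRA, before activations are counted.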

The inference track is about running finished models; the training track is about fine-tuning a model on your own data. Each track has an explainer (the concepts), a fit guide (the reference tables), and a use-case guide (real workloads). Check out all the guides here.

Contact Us for GPU Servers Pricing and Availability

Fill out the form below to connect with our team to discuss your requirements, delivery timelines, capabilities, and agreement pricing. Or email us at sales@openmetal.io.

FAQs

What GPU servers are available?

Two lines, both on the v5 platform. The RP6000 pairs dual Intel Xeon 6530P processors with one or two NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs (96 GB GDDR7 each), suited for inference, fine-tuning, rendering, and mixed AI and visualization workloads. The H200 uses one or two NVIDIA H200 NVL GPUs (141 GB HBM3e each), targeting large-model training and memory-bound inference; NVIDIA AI Enterprise is available for it (contact OpenMetal). Both are available now in Ashburn, VA.

What’s the difference between the RP6000 and H200?

The RP6000 is built for high-VRAM workloads at a favorable cost per GB: inference serving, fine-tuning, rendering, and pipelines that mix AI and visualization. The H200 is built for workloads where memory bandwidth and capacity are the bottleneck, such as large-model training, large-context inference, and memory-bound HPC. NVIDIA AI Enterprise is available for the H200 (contact OpenMetal for options). Talk to an account manager if you are unsure which fits your workload.

How are the GPU servers priced?

Fixed, transparent monthly pricing with no metered hours and no surprise egress charges. Configurations are built to order, so the final quote reflects your exact setup. OpenMetal honors written quotes for 30 days from the date issued. Request a quote to get current pricing.

Can I build a multi-node GPU cluster?

Yes. Multiple GPU servers can be interconnected over OpenMetal’s 20 Gbps private network to build multi-node training or inference clusters. Talk to an account manager about cluster sizing and lead times.
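When sizing a cluster, it can help to estimate how long gradient synchronization would take over the private network. The sketch below uses the textbook ring all-reduce cost, in which each node transfers 2(N-1)/N times the payload, and assumes ideal link utilization with no overlap between communication and compute. The function name and example payload are hypothetical, and the communication pattern your framework actually uses may differ.

```python
def ring_allreduce_seconds(payload_gb: float, nodes: int,
                           link_gbps: float = 20.0) -> float:
    """Idealized time for one ring all-reduce of payload_gb across `nodes` servers."""
    if nodes < 2:
        return 0.0  # nothing to synchronize on a single server
    link_gb_per_s = link_gbps / 8.0  # 20 Gbps is 2.5 GB/s
    transferred_gb = 2.0 * (nodes - 1) / nodes * payload_gb
    return transferred_gb / link_gb_per_s


if __name__ == "__main__":
    # Hypothetical payload: 14 GB of 16-bit gradients (a 7B-parameter model).
    for gbps in (20.0, 40.0):
        seconds = ring_allreduce_seconds(14, nodes=4, link_gbps=gbps)
        print(f"{gbps:.0f} Gbps, 4 nodes: {seconds:.1f} s per synchronization")
```

Under these idealized assumptions, synchronizing 14 GB across four servers takes about 8.4 seconds at 20 Gbps and about 4.2 seconds at 40 Gbps. Adapter-based methods such as LoRA synchronize far less data per step.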

Which frameworks are supported?

The servers support common AI and ML frameworks including PyTorch, TensorFlow, JAX, and Hugging Face Transformers, running directly on dedicated hardware with no virtualization layer in the way.
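On a bare metal server, a quick way to confirm that a framework sees the GPUs is to query it directly. The sketch below uses PyTorch's `torch.cuda` API and assumes that PyTorch with CUDA support and the NVIDIA driver are already installed. The `list_gpus` helper is a hypothetical name; it returns an empty list when PyTorch or a usable GPU is not found.

```python
def list_gpus():
    """Return (index, name, total_memory_gb) for each CUDA device PyTorch can see."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return []  # no usable CUDA device or driver
    gpus = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # total_memory is reported in bytes; convert to decimal GB.
        gpus.append((i, props.name, round(props.total_memory / 1e9, 1)))
    return gpus


if __name__ == "__main__":
    for index, name, mem_gb in list_gpus():
        print(f"GPU {index}: {name}, {mem_gb} GB")
```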

Is the hardware really dedicated?

Yes. Every GPU server is single-tenant bare metal. Your workload has direct access to the GPU, CPU, memory, and storage. No shared hypervisor, no noisy neighbors, and no metered slices of someone else’s cluster.

Can I start with one server and scale later?

Yes. Deploy a single GPU server to start, then add servers and interconnect them over the 20 Gbps private network as your workload grows. Talk to an account manager about scaling paths and lead times.

Is a proof of concept available?

Yes. PoC deployments let your team validate workloads on dedicated hardware before committing.

Where are the GPU servers available?

The RP6000 and H200 are available now from OpenMetal US East in Ashburn, Virginia. Contact us about availability in other regions.

Design your GPU deployment

Tell us about your workload and we’ll build a quote around it, from a single server to a multi-node cluster. Proof-of-concept deployments are available on request.

Schedule a Meeting

Built on the OpenMetal v5 platform. See the full v5 hardware catalog.