The OpenMetal RP6000 is a single-tenant bare metal GPU server built on the NVIDIA RTX Pro 6000 Blackwell Server Edition (96GB GDDR7), paired with dual Intel Xeon 6530P (Granite Rapids) processors. It is OpenMetal’s cost-efficient training-and-inference GPU: Blackwell-generation tensor cores with native FP4 support and 96GB of GDDR7 handle model training, fine-tuning, and high-throughput inference at a lower cost per card than HBM-class GPUs. The H200 remains the choice when a workload needs the largest memory footprint or HBM bandwidth; the RP6000 is the workhorse for everything from training and fine-tuning to production serving. Like every OpenMetal server, it ships with full root access, no shared tenancy, and fixed monthly pricing — so a GPU running a multi-day training job costs the same as one sitting idle, with no per-GPU-hour meter.

Key Takeaways

  • 96GB GDDR7 per GPU holds sizeable training batches and large inference models on a single card — roughly double the memory of common Ada-generation GPUs like the L40S (48GB).
  • Blackwell-native FP4 (NVFP4) plus FP8 and BF16 support spans low-precision inference and mixed-precision training, where Blackwell adds a generational step over Hopper and Ada.
  • Fixed monthly pricing avoids the “idle silicon tax.” On metered GPU-hour clouds you pay a premium that bakes in elasticity you may not need; a sustained training run that pins the GPU at 100% for days is exactly where per-hour metering hurts most. On OpenMetal the marginal cost of running the card harder is zero.
  • Cost-efficient for both training and inference — a lower per-card rate than H200/H100-class HBM GPUs, well-suited to fine-tuning, small-to-mid-scale training, and high-throughput serving.
  • 1TB DDR5-6400 host memory stages training datasets and keeps vector indexes and embeddings resident for RAG and high-concurrency serving.
  • Single-tenant bare metal — the full GPU, full PCIe 5.0 bandwidth, no hypervisor overhead, no shared-tenancy contention. Deploy one card, scale to a cluster, or attach to existing infrastructure. See GPU server pricing.

Ready to Deploy an RP6000 GPU Server?

Tell us about your inference or fine-tuning workload and we’ll help you configure the right deployment — a single RP6000, a dedicated GPU cluster, or RP6000 nodes attached to your existing OpenMetal cloud or bare metal footprint.

Get an RP6000 Quote   Schedule a Consultation

Config at a Glance

Component | Specification
GPU | NVIDIA RTX Pro 6000 Blackwell Server Edition, 96GB GDDR7 per GPU, 1–2 GPUs per server
GPU Memory Bandwidth | 1.79 TB/s per GPU (512-bit GDDR7)
GPU Max Board Power | 600W per GPU
Tensor Support | FP4 (NVFP4, Blackwell-native), FP8, BF16
Processor | 2x Intel Xeon 6530P (Granite Rapids, Intel 3)
Total Cores / Threads | 64 cores / 128 threads
Base / Max Turbo Frequency | 2.3 GHz / 4.1 GHz (3.7 GHz all-core turbo)
L3 Cache | 144 MB per CPU
TDP | 225W per CPU
System Memory | 1TB DDR5-6400 (16 of 32 DIMM slots populated; upgradeable to 2TB)
Boot Storage | 2x 960GB NVMe (RAID 1)
Data Storage | 1x 6.4TB Micron 7500 MAX NVMe (PCIe Gen4, 3 DWPD)
Max Drive Bays | 24x 2.5″ NVMe/SATA/SAS
Private Bandwidth | 40 Gbps (4x 10 Gbps LACP-bonded)
Public Bandwidth | 10 Gbps
PCIe | PCIe 5.0, 88 lanes per processor
Confidential Computing | Intel SGX available (CPU); Intel TDX not combinable with GPU passthrough
Availability | Available now in US-East (Ashburn, VA); advance booking for other regions
Pricing | Built to order: contact OpenMetal for a quote (fixed monthly, included egress; no per-GPU-hour metering)

[Figure: RP6000 bare metal GPU server component architecture diagram (NVIDIA RTX Pro 6000 Blackwell SE, dual Intel Xeon)]

GPU: NVIDIA RTX Pro 6000 Blackwell Server Edition

The RTX Pro 6000 Blackwell SE is NVIDIA’s Blackwell-generation professional/server GPU with 96GB of GDDR7 memory. For OpenMetal customers, it occupies the cost-efficient training-and-inference tier: enough memory to hold large models and batches on a single card, Blackwell tensor cores with native FP4 (NVFP4) for high-throughput low-precision inference plus FP8 and BF16 for mixed-precision training, and a lower per-card cost than HBM-class GPUs such as the H200. Compared to Ada-generation inference cards (e.g., the L40S at 48GB GDDR6), the RP6000 roughly doubles GPU memory and adds the Blackwell FP4 path.

OpenMetal deploys the RP6000 as a true bare metal device — passed through directly over PCIe 5.0 with no hypervisor layer. Each server supports 1 or 2 RP6000 cards. As with all OpenMetal GPU servers, GPU-memory pooling is available between GPUs within the same server; across nodes, GPUs communicate over the private network rather than a shared GPU-memory fabric. 
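As a rough sizing aid, the sketch below estimates whether a model's weights fit in one card's 96GB at each tensor format from the spec table. It is an approximation under stated assumptions, not an OpenMetal or NVIDIA sizing tool: it counts weights only, ignores the small per-block scaling overhead of NVFP4, and reserves a hypothetical 20% of GPU memory (`headroom_fraction`) for KV cache, activations, and runtime buffers. The function names are illustrative.

```python
# Bytes per parameter for the tensor formats listed in the spec table.
# NVFP4 block scaling factors add a little overhead that is ignored here.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4": 0.5}

GPU_MEMORY_GB = 96  # per RP6000 card, from the spec table


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_card(params_billions: float, precision: str,
                     headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving headroom (assumed 20%)."""
    usable_gb = GPU_MEMORY_GB * (1 - headroom_fraction)
    return weight_footprint_gb(params_billions, precision) <= usable_gb


if __name__ == "__main__":
    for precision in ("BF16", "FP8", "FP4"):
        size = weight_footprint_gb(70, precision)
        verdict = "fits" if fits_on_one_card(70, precision) else "does not fit"
        print(f"70B parameters at {precision}: {size:.0f} GB of weights, {verdict}")
```

Training needs considerably more memory than this weight-only figure, because gradients and optimizer state sit alongside the weights, so treat the result as a lower bound for fine-tuning jobs.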

Processor: Dual Intel Xeon 6530P (Granite Rapids)

Each RP6000 server pairs the GPU with two Intel Xeon 6530P processors (Granite Rapids, Intel 3), for 64 cores / 128 threads at 2.3 GHz base / 4.1 GHz turbo, 144 MB L3 per socket, and 88 PCIe 5.0 lanes per processor. The high lane count delivers full-bandwidth PCIe 5.0 to the GPU and the NVMe data drive without contention. The Granite Rapids cores carry Intel AMX and AVX-512, useful for CPU-side tokenization, preprocessing, and embedding pipelines that feed inference. See the Intel Xeon 6530P product page for full CPU detail.

Memory

The RP6000 server ships with 1TB of DDR5-6400 across 16 of 32 DIMM slots (one DIMM per channel, both sockets), with 16 open slots to upgrade to 2TB. With 8 channels per socket at 6400 MT/s, theoretical peak host memory bandwidth is about 410 GB/s per socket (8 channels × 6,400 MT/s × 8 bytes per transfer), ample for staging training data and inference workloads. For serving and RAG, this host RAM holds vector indexes, embeddings, request queues, and model variants resident while the GPU runs inference. ECC is standard. The practical maximum at full DDR5-6400 speed is 2TB; a 4TB ceiling exists with 128GB RDIMMs but is rarely chosen.

Storage

OpenMetal separates boot and data storage on every server. The RP6000 boots from 2x 960GB NVMe drives in RAID 1, isolating the OS from data so a data-volume change never risks the boot environment — see boot and data drive isolation. The data tier is a 6.4TB Micron 7500 MAX NVMe SSD (PCIe Gen4, 232-layer 3D TLC, 3 DWPD), expandable up to the 24-bay chassis limit for model repositories and datasets.

Metric | Micron 7500 MAX (6.4TB)
Sequential Read | 7,000 MB/s
Sequential Write | 5,900 MB/s
Random Read | 1,100,000 IOPS
Random Write | 400,000 IOPS
Read Latency (typical) | 70 µs
Write Latency (typical) | 15 µs
Endurance | 3 DWPD (35,040 TBW)
Warranty | 5 years
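
The endurance figure in the table follows directly from the other rows: drive writes per day, times capacity, times the warranty period. A minimal sketch of that arithmetic, assuming 365-day years (the function name is illustrative):

```python
def total_terabytes_written(capacity_tb: float, dwpd: float,
                            warranty_years: float,
                            days_per_year: int = 365) -> float:
    """Rated endurance in TB written: capacity x DWPD x days under warranty."""
    return capacity_tb * dwpd * days_per_year * warranty_years


if __name__ == "__main__":
    daily_budget_tb = 6.4 * 3  # TB that can be written per day at 3 DWPD
    tbw = total_terabytes_written(6.4, 3, 5)
    print(f"Daily write budget: {daily_budget_tb:.1f} TB")
    print(f"Rated endurance over 5 years: {tbw:,.0f} TB")
```

The same function can be used to check how far a planned checkpointing or dataset-rewrite schedule sits below the daily write budget.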

Fast local NVMe matters for inference serving: loading model weights and swapping model variants is read-bound, and 7 GB/s keeps cold-start and model-switch latency low.
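
To put the read-bound claim in numbers, the sketch below gives a best-case time to read a set of model weights from the data drive at its rated sequential read speed. It is a floor, not a benchmark: it assumes the full 7,000 MB/s is sustained and ignores filesystem overhead, deserialization, and the host-to-GPU copy. The function name and the example sizes are illustrative.

```python
def best_case_load_seconds(model_size_gb: float,
                           sequential_read_mb_s: float = 7000.0) -> float:
    """Lower bound on the time to read model weights from NVMe.

    Uses decimal units (1 GB = 1000 MB) to match drive datasheets.
    """
    return model_size_gb * 1000.0 / sequential_read_mb_s


if __name__ == "__main__":
    # Example sizes are illustrative, not measurements.
    for size_gb in (16, 35, 96):
        seconds = best_case_load_seconds(size_gb)
        print(f"{size_gb} GB of weights: at least {seconds:.1f} s to read")
```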

Networking

Every RP6000 server has 40 Gbps of private bandwidth as 4x 10 Gbps uplinks in an LACP bond, plus 10 Gbps of public bandwidth. The private network carries east-west traffic between your servers — multi-node inference fleets, pulling models from OpenMetal storage nodes — and is not metered. OpenMetal’s base network SLA is 99.96%, with measured performance exceeding 99.99% from 2022 through 2026. DDoS protection up to 10 Gbps per IP is included. See LACP network bonding.
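
An availability percentage is easier to reason about as a downtime budget. The sketch below converts the figures quoted above into minutes per month, assuming a 30-day month; how the SLA is actually measured and credited is defined by OpenMetal's agreement, not by this arithmetic.

```python
def allowed_downtime_minutes(availability_percent: float,
                             days_in_month: int = 30) -> float:
    """Minutes of downtime per month permitted at a given availability."""
    minutes_in_month = days_in_month * 24 * 60
    return (1 - availability_percent / 100.0) * minutes_in_month


if __name__ == "__main__":
    for availability in (99.96, 99.99):
        budget = allowed_downtime_minutes(availability)
        print(f"{availability}% availability: {budget:.2f} minutes per 30-day month")
```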

Egress pricing: 95th-percentile billing, not per-GB transfer

OpenMetal bills public network usage on a 95th-percentile model with a generous included allotment, not per-GB. For inference serving — where responses stream continuously to end users — this avoids the per-GB egress bill that AWS, GCP, and Azure apply, which on high-traffic inference endpoints can rival the GPU compute cost itself.
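
For readers unfamiliar with 95th-percentile billing, the sketch below shows the common industry convention: bandwidth is sampled at fixed intervals (often every 5 minutes), the samples are sorted, the top 5% are discarded, and the highest remaining sample is the billable rate. This is an illustration of the general method only; the sampling interval, the included allotment, and the exact calculation OpenMetal applies are assumptions here and should be confirmed with OpenMetal.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """Nearest-rank 95th percentile of bandwidth samples.

    The top 5% of samples are ignored, so short bursts do not set the bill.
    """
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical 30-day month of 5-minute samples (8,640 total):
    # steady 200 Mbps serving traffic plus 340 burst samples at 2,000 Mbps.
    month = [200.0] * 8300 + [2000.0] * 340
    print(f"Billable rate: {percentile_95(month):.0f} Mbps")
```

In this hypothetical month the bursts cover fewer than 5% of the samples, so they fall entirely inside the discarded portion and the billable rate stays at the steady-state level.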

Security and Confidential Computing

The RP6000 runs as a single-tenant bare metal server — physical isolation, not a shared hypervisor — the foundational property for protecting proprietary models and inference data. The Xeon 6530P supports Intel SGX for application-level enclaves and TME-MK total memory encryption. Hardware security features include AES-NI, Intel Boot Guard, and Control-Flow Enforcement Technology (CET).

Important: Intel TDX (Trust Domain Extensions) and GPU passthrough cannot be combined in a single trust boundary on this platform. Customers needing confidential VMs (TDX) should use OpenMetal’s non-GPU bare metal servers; GPU workloads run on the RP6000 as single-tenant bare metal with physical isolation. 

HIPAA and regulatory compliance

OpenMetal is HIPAA compliant at the organizational level and offers Business Associate Agreements (BAAs). The RP6000 is deployed in Ashburn, Virginia (NTT DATA VA1), whose facility-operator certifications include SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, and HIPAA. Facility certifications are held by the facility operator (NTT), not OpenMetal; OpenMetal’s HIPAA posture is organizational. Regulated inference workloads — clinical inference, PHI-adjacent serving — can run on RP6000 servers in the HIPAA-compliant Ashburn facility under an OpenMetal BAA.

Recommended Workloads

Model Training and Fine-Tuning

The RP6000’s 96GB GDDR7 and Blackwell mixed-precision (BF16/FP8) tensor cores handle training from scratch on small-to-mid models, full fine-tuning of larger models, and LoRA/QLoRA on a single card — pair two RP6000s, or scale to a cluster, for bigger jobs. Frameworks: PyTorch (FSDP), Hugging Face Transformers/PEFT, DeepSpeed, NVIDIA NeMo. Training is where OpenMetal’s fixed-cost model pays off most: a job that pins the GPU at full utilization for days carries no per-hour meter and no egress bill on the data you pull back, unlike metered GPU-hour clouds where sustained training is the most expensive thing you can run.

High-Throughput LLM Inference and Serving

Blackwell FP4/FP8 tensor cores and 96GB GDDR7 also make the RP6000 a strong production inference card: serve large models with high concurrency and large batch sizes on a single card via NVIDIA NIM, vLLM, or TensorRT-LLM. For a published inference throughput study on this GPU class, see OpenMetal’s RTX Pro 6000 vs H100 for AI inference.

Retrieval-Augmented Generation (RAG)

Run embedding and generation models on the GPU while 1TB of host RAM holds large vector indexes resident, with the NVMe data tier providing fast index and model persistence — a strong single-box RAG platform.

Computer Vision, Media, and Generative Imaging

Blackwell’s media engines and GDDR7 bandwidth suit batch image/video inference, generative imaging, and CV pipelines, where the RP6000’s memory and throughput fit high-resolution batches.

Multi-GPU and Mixed-GPU Clusters

Scale into a dedicated OpenMetal GPU cluster: all-RP6000 for inference fleets, or mixed with H200 nodes where some workloads need HBM-class training memory and others need cost-efficient inference. Nodes are connected over the 40 Gbps private mesh. (GPU-memory pooling is within a node; multi-node clusters communicate over the private network using data and pipeline parallelism.)
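
Because multi-node jobs synchronize over the private network, it helps to estimate the per-step communication cost of data parallelism before sizing a cluster. The sketch below uses the standard ring all-reduce volume, in which each node sends and receives 2(N-1)/N times the gradient payload. It is a lower bound under stated assumptions: the link rate is fully usable, latency and compute overlap are ignored, and the payload and node counts are hypothetical. Note that an LACP bond typically balances per flow, so a single stream may see one 10 Gbps member rather than the full 40 Gbps; the `link_gbps` parameter lets you model either case.

```python
def ring_allreduce_seconds(payload_gb: float, nodes: int,
                           link_gbps: float) -> float:
    """Lower-bound time for one ring all-reduce of `payload_gb` gigabytes.

    Each node transfers 2 * (nodes - 1) / nodes times the payload.
    """
    if nodes < 2:
        return 0.0  # nothing to synchronize on a single node
    traffic_factor = 2 * (nodes - 1) / nodes
    link_gb_per_s = link_gbps / 8.0  # gigabits/s to gigabytes/s
    return traffic_factor * payload_gb / link_gb_per_s


if __name__ == "__main__":
    # Hypothetical job: 7B parameters with BF16 gradients, about 14 GB per step.
    for gbps in (40, 10):
        seconds = ring_allreduce_seconds(14.0, nodes=4, link_gbps=gbps)
        print(f"4 nodes at {gbps} Gbps: at least {seconds:.1f} s per gradient sync")
```

Estimates like this are why parameter-efficient methods such as LoRA, which synchronize only small adapter gradients, and pipeline parallelism, which passes activations rather than full gradients, tend to suit Ethernet-connected clusters better than full-gradient data parallelism on large models.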

“With v5 we modernized the foundation of our bare metal and private cloud catalog. Adding the RP6000 and H200 was the natural next step. Customers running AI and HPC workloads get fully dedicated GPUs on the same modern Xeon 6000 platform, with transparent monthly billing and infrastructure they actually control, not throttled, metered slices of someone else’s cluster.”

Jamie Tischart, CTO, OpenMetal

How the RP6000 Compares to Public Cloud GPU Instances

Hyperscaler GPU instances (AWS G6/P-series, GCP, Azure) deliver GPUs on a per-GPU-hour metered model with shared-tenancy infrastructure and per-GB egress. OpenMetal’s RP6000 is structurally different: dedicated single-tenant hardware, fixed monthly pricing, and included egress. For sustained training runs and always-on inference endpoints — workloads that keep the GPU busy — the fixed-cost model is typically far cheaper than metered GPU-hours plus egress.

This is the “idle silicon tax”: metered GPU pricing bundles the provider’s own idle-capacity risk and margin into every hour, so you pay for elasticity whether or not your workload is bursty. A steady, high-utilization workload subsidizes other tenants’ burst headroom. A dedicated, fixed-cost GPU removes that premium — once it’s yours, running it at 100% costs no more than leaving it idle, which inverts the incentive in favor of squeezing maximum utilization out of every card.
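
The trade-off can be framed as a break-even utilization: the share of the month a metered GPU would need to run before its bill matches a fixed monthly price. The sketch below computes it. Both prices in the example are placeholders chosen for round numbers, not OpenMetal's rates or any cloud provider's, and egress charges are left out; substitute real quotes before drawing conclusions.

```python
HOURS_PER_MONTH = 730  # average month (8,760 hours / 12)


def break_even_hours(fixed_monthly_usd: float,
                     metered_usd_per_gpu_hour: float,
                     gpus: int = 1) -> float:
    """GPU-hours per month at which metered cost equals the fixed price."""
    return fixed_monthly_usd / (metered_usd_per_gpu_hour * gpus)


def break_even_utilization(fixed_monthly_usd: float,
                           metered_usd_per_gpu_hour: float,
                           gpus: int = 1) -> float:
    """Fraction of the month above which fixed pricing is cheaper."""
    hours = break_even_hours(fixed_monthly_usd, metered_usd_per_gpu_hour, gpus)
    return hours / HOURS_PER_MONTH


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    fixed, metered = 2000.0, 4.00
    hours = break_even_hours(fixed, metered)
    share = break_even_utilization(fixed, metered)
    print(f"Break-even at {hours:.0f} GPU-hours per month ({share:.0%} utilization)")
```

Above the break-even point every additional hour on the fixed-price server is free, while on a metered instance it adds to the bill; below it, the metered model is cheaper, which matches the spiky and short-lived cases described next.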

When public cloud GPU is the better fit: genuinely spiky, scale-to-zero inference; short one-off experiments; or deep integration with managed ML services. A detailed RP6000-vs-cloud cost comparison is planned as a companion page.

Deployment Options

The RP6000 can be deployed three ways:

  • Dedicated GPU server — a single RP6000 (or dual-GPU) bare metal server with full root access, IPMI, and fixed monthly pricing. Best for inference serving and fine-tuning.
  • Dedicated GPU cluster — multiple GPU nodes (all-RP6000 or mixed with H200) on a private 40 Gbps mesh for scaled inference and distributed jobs. Built to order.
  • Attached to existing infrastructure — add RP6000 nodes to an existing OpenMetal Hosted Private Cloud or bare metal deployment, putting inference acceleration on the same private network as your existing compute and storage.

Where to deploy

The RP6000 is available now in Ashburn, Virginia (US-East), hosted in a Tier III, HIPAA-compliant NTT facility, with advance reservations available for Los Angeles, Amsterdam, and Singapore. Proof of Concept clusters are available for testing; ramp pricing is available for migrations from other providers.

Location | Region | Certifications (facility operator) | Location Page
Ashburn, VA | US-East | SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, HIPAA | Ashburn facility specs

See GPU server pricing.

Get an RP6000 Quote

Ready to deploy? Tell us about your AI/ML inference needs and we’ll provide a custom quote for the NVIDIA RTX Pro 6000 — as a single GPU server, a dedicated GPU cluster, or GPU nodes attached to an existing OpenMetal deployment.

  • Single GPU server: One or two RP6000 cards with full root access and IPMI
  • GPU cluster: Multi-node deployments (all-RP6000 or mixed with H200) on a private 40 Gbps mesh
  • Attached GPU: Add RP6000 capacity to your existing Hosted Private Cloud or bare metal footprint
  • Custom configurations: RAM upgrades to 2TB, additional NVMe drives, dual-GPU

All deployments include fixed monthly pricing, included egress, a 99.96%+ network SLA, and DDoS protection. Ramp pricing is available for migrations.



Product specifications, pricing, and availability may change due to market conditions and other factors. For the most current information, please contact the OpenMetal team directly.