Bare Metal GPU Server -- NVIDIA RTX PRO 6000 Blackwell SE -- Dual Intel Xeon 6530P, 1TB DDR5, 96GB GDDR7

The OpenMetal RTX PRO 6000 server is a single-tenant bare metal GPU server built on the NVIDIA RTX PRO 6000 Blackwell Server Edition (96GB GDDR7), paired with dual Intel Xeon 6530P (Granite Rapids) processors. It is OpenMetal’s cost-efficient training-and-inference GPU: Blackwell-generation tensor cores with native FP4 support and 96GB of GDDR7 handle model training, fine-tuning, and high-throughput inference at a lower cost per card than HBM-class GPUs. The H200 remains the choice when a workload needs the largest memory footprint or HBM bandwidth; the RTX PRO 6000 is the workhorse for everything from training and fine-tuning to production serving. Like every OpenMetal server, it ships with full root access, no shared tenancy, and fixed monthly pricing — so a GPU running a multi-day training job costs the same as one sitting idle, with no per-GPU-hour meter.

Key Takeaways

96GB GDDR7 per GPU holds sizeable training batches and large inference models on a single card, double the memory of common Ada-generation GPUs like the L40S (48GB).
Blackwell-native FP4 (NVFP4) plus FP8 and BF16 support spans low-precision inference and mixed-precision training, where Blackwell adds a generational step over Hopper and Ada.
Fixed monthly pricing avoids the “idle silicon tax.” On metered GPU-hour clouds you pay a premium that bakes in elasticity you may not need; a sustained training run that pins the GPU at 100% for days is exactly where per-hour metering hurts most. On OpenMetal the marginal cost of running the card harder is zero.
Cost-efficient for both training and inference — a lower per-card rate than H200/H100-class HBM GPUs, well-suited to fine-tuning, small-to-mid-scale training, and high-throughput serving.
1TB DDR5-6400 host memory stages training datasets and keeps vector indexes and embeddings resident for RAG and high-concurrency serving.
Single-tenant bare metal — the full GPU, full PCIe 5.0 bandwidth, no hypervisor overhead, no shared-tenancy contention. Deploy one card, scale to a cluster, or attach to existing infrastructure. See GPU server pricing.

Ready to Deploy an RTX PRO 6000 GPU Server?

Tell us about your inference or fine-tuning workload and we’ll help you configure the right deployment — a single RTX PRO 6000, a dedicated GPU cluster, or RTX PRO 6000 nodes attached to your existing OpenMetal cloud or bare metal footprint.

Config at a Glance

Component	Specification
GPU	NVIDIA RTX PRO 6000 Blackwell Server Edition, 96GB GDDR7 per GPU, 1–2 GPUs per server
GPU Memory Bandwidth	1.6 TB/s per GPU (512-bit GDDR7)
GPU Max Board Power	600W per GPU
Tensor Support	FP4 (NVFP4, Blackwell-native), FP8, BF16
Processor	2x Intel Xeon 6530P (Granite Rapids, Intel 3)
Total Cores / Threads	64 cores / 128 threads
Base / Max Turbo Frequency	2.3 GHz / 4.1 GHz (3.7 GHz all-core turbo)
L3 Cache	144 MB per CPU
TDP	225W per CPU
System Memory	1TB DDR5-6400 (16 of 32 DIMM slots populated; upgradeable to 2TB)
Boot Storage	2x 960GB NVMe (RAID 1)
Data Storage	1x 6.4TB Micron 7500 MAX NVMe (PCIe Gen4, 3 DWPD)
Max Drive Bays	8x 2.5″ NVMe (1x 6.4TB included, 7 open)
Private Bandwidth	20 Gbps default (2x 10 Gbps LACP-bonded); up to 40 Gbps optional
Public Bandwidth	10 Gbps
PCIe	PCIe 5.0, 88 lanes per processor
Confidential Computing	Intel SGX available (CPU); Intel TDX with GPU passthrough supported via NVIDIA Confidential Computing (one GPU per VM, validated per deployment)
Availability	Available now in US-East (Ashburn, VA); advance booking for other regions
Pricing	Built to order — contact OpenMetal for a quote (fixed monthly, included egress; no per-GPU-hour metering)

Figure: gpu-server-rp6000 component architecture diagram

GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition

The RTX PRO 6000 Blackwell SE is NVIDIA’s Blackwell-generation professional/server GPU with 96GB of GDDR7 memory. For OpenMetal customers, it occupies the cost-efficient training-and-inference tier: enough memory to hold large models and batches on a single card, Blackwell tensor cores with native FP4 (NVFP4) for high-throughput low-precision inference, and a meaningfully lower per-card cost than HBM-class GPUs. Compared to Ada-generation inference cards (e.g., the L40S at 48GB GDDR6), the RTX PRO 6000 doubles GPU memory and adds the Blackwell FP4 path.

OpenMetal deploys the RTX PRO 6000 as a true bare metal device, attached directly over PCIe 5.0 with no hypervisor layer. Each server supports 1 or 2 RTX PRO 6000 cards. As with all OpenMetal GPU servers, the GPUs in a two-card server are two discrete accelerators, each with its own memory (not pooled); across nodes, GPUs communicate over the private network rather than a shared GPU-memory fabric.

Processor: Dual Intel Xeon 6530P (Granite Rapids)

Each RTX PRO 6000 server pairs the GPU with two Intel Xeon 6530P processors (Granite Rapids, Intel 3), for 64 cores / 128 threads at 2.3 GHz base / 4.1 GHz turbo, 144 MB L3 per socket, and 88 PCIe 5.0 lanes per processor. The high lane count delivers full-bandwidth PCIe 5.0 to the GPU and the NVMe data drive without contention. The Granite Rapids cores carry Intel AMX and AVX-512, useful for CPU-side tokenization, preprocessing, and embedding pipelines that feed inference. See the Intel Xeon 6530P product page for full CPU detail.

Memory

The RTX PRO 6000 server ships with 1TB of DDR5-6400 across 16 of 32 DIMM slots (one DIMM per channel, both sockets), leaving 16 open slots for an upgrade to 2TB. 2TB is the standard stocked maximum (32x 64GB, which runs at DDR5-5200). With 8 channels per socket at 6400 MT/s, theoretical peak host memory bandwidth is about 410 GB/s per socket (8 channels x 8 bytes x 6400 MT/s), high enough to stage inference workloads efficiently. For serving and RAG, this host RAM keeps vector indexes, embeddings, request queues, and model variants resident while the GPU runs inference. ECC is standard.
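As a rough check on the memory figures, theoretical peak DDR5 bandwidth follows from channel count and transfer rate. The sketch below assumes a 64-bit (8-byte) data path per channel and ignores real-world efficiency; the function name is illustrative, not part of any OpenMetal tooling.

```python
def peak_ddr5_bandwidth_gbs(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth for one socket, in decimal GB/s."""
    # channels * mega-transfers/s * bytes/transfer gives MB/s; divide by 1000 for GB/s
    return channels * mt_per_s * bytes_per_transfer / 1000


# Stock 1TB config: 8 channels per socket at DDR5-6400
stock_per_socket = peak_ddr5_bandwidth_gbs(channels=8, mt_per_s=6400)
# 2TB config: all 32 slots populated, which runs at DDR5-5200
full_per_socket = peak_ddr5_bandwidth_gbs(channels=8, mt_per_s=5200)

print(f"1TB config: {stock_per_socket:.1f} GB/s per socket, {2 * stock_per_socket:.1f} GB/s across both")
print(f"2TB config: {full_per_socket:.1f} GB/s per socket, {2 * full_per_socket:.1f} GB/s across both")
```

The trade-off is capacity for bandwidth: populating all 32 slots doubles memory but lowers the theoretical peak by roughly 19%.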

Storage

OpenMetal separates boot and data storage on every server. The RTX PRO 6000 boots from 2x 960GB NVMe drives in RAID 1, isolating the OS from data so a data-volume change never risks the boot environment (see boot and data drive isolation). The data tier is a 6.4TB Micron 7500 MAX NVMe SSD (PCIe Gen4, 232-layer 3D TLC, 3 DWPD), expandable across the 8 NVMe drive bays (7 open) for model repositories and datasets.

Metric	Micron 7500 MAX (6.4TB)
Sequential Read	7,000 MB/s
Sequential Write	5,900 MB/s
Random Read	1,100,000 IOPS
Random Write	400,000 IOPS
Read Latency (typical)	70 µs
Write Latency (typical)	15 µs
Endurance	3 DWPD (35,040 TBW)
Warranty	5 years
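The endurance row can be reproduced from the other figures in the table. This minimal sketch assumes the TBW rating is simply capacity x DWPD x 365 days x the warranty period, which matches the published 35,040 TBW; the helper name is hypothetical.

```python
def rated_tbw(capacity_tb: float, dwpd: float, warranty_years: int, days_per_year: int = 365) -> float:
    """Total terabytes written over the warranty period at the rated DWPD."""
    return capacity_tb * dwpd * days_per_year * warranty_years


tbw = rated_tbw(capacity_tb=6.4, dwpd=3, warranty_years=5)
daily_budget_tb = 6.4 * 3  # sustained writes per day that the rating allows

print(f"Rated endurance: {tbw:,.0f} TBW")
print(f"Daily write budget: {daily_budget_tb:.1f} TB/day")
```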

Fast local NVMe matters for inference serving: loading model weights and swapping model variants is read-bound, and 7 GB/s keeps cold-start and model-switch latency low.
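To put the 7 GB/s figure in context, the sketch below estimates a lower bound on the time to read model weights from the data drive. The checkpoint sizes are hypothetical, and the estimate ignores filesystem overhead, deserialization, and the host-to-GPU copy, so real load times will be longer.

```python
SEQ_READ_GB_PER_S = 7.0  # Micron 7500 MAX rated sequential read (7,000 MB/s)


def min_load_seconds(weights_gb: float, read_gb_per_s: float = SEQ_READ_GB_PER_S) -> float:
    """Best-case seconds to stream a checkpoint from NVMe at the rated read speed."""
    if read_gb_per_s <= 0:
        raise ValueError("read_gb_per_s must be positive")
    return weights_gb / read_gb_per_s


# Hypothetical checkpoint sizes, up to one that would fill the 96GB card
for size_gb in (16, 35, 70, 96):
    print(f"{size_gb:>3} GB of weights: at least {min_load_seconds(size_gb):.1f} s")
```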

Networking

Every RTX PRO 6000 server has 20 Gbps of private bandwidth by default (2x 10 Gbps uplinks in an LACP bond), upgradeable to 40 Gbps (4x 10 Gbps) as an option available across OpenMetal’s v2+ fleet, plus 10 Gbps of public bandwidth. The private network carries east-west traffic between your servers — multi-node inference fleets, pulling models from OpenMetal storage nodes — and is not metered. OpenMetal’s base network SLA is 99.96%, with measured performance exceeding 99.99% from 2022 through 2026. DDoS protection up to 10 Gbps per IP is included. See LACP network bonding.
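For sizing model pulls over the private network, a back-of-the-envelope transfer time helps. This sketch assumes ideal line rate with no protocol overhead, and it assumes the common LACP behavior in which a single flow is hashed onto one member link, so one stream sees 10 Gbps while parallel streams can spread across the bond. The 96 GB payload is a hypothetical example.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal transfer time: gigabytes converted to gigabits, divided by the link rate."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return size_gb * 8 / link_gbps


payload_gb = 96  # hypothetical model repository pull
scenarios = (
    ("single flow on one 10 Gbps link", 10),
    ("parallel flows on the 20 Gbps bond", 20),
    ("parallel flows on the 40 Gbps bond", 40),
)
for label, gbps in scenarios:
    print(f"{label}: about {transfer_seconds(payload_gb, gbps):.1f} s")
```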

Egress pricing: 95th-percentile billing, not per-GB transfer

OpenMetal bills public network usage on a 95th-percentile model with a generous included allotment, not per-GB. For inference serving — where responses stream continuously to end users — this avoids the per-GB egress bill that AWS, GCP, and Azure apply, which on high-traffic inference endpoints can rival the GPU compute cost itself.
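For readers unfamiliar with 95th-percentile billing, the sketch below shows the common industry method: sample throughput at a fixed interval, sort the samples, discard the top 5%, and bill on the highest remaining value. The sample data and the nearest-rank calculation are assumptions for illustration; the exact sampling interval and included allotment OpenMetal uses are not specified here.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """Nearest-rank 95th percentile of throughput samples."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Smallest rank covering at least 95% of samples (integer ceiling of 0.95 * n)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical month: steady 200 Mbps with short bursts to 5,000 Mbps
short_bursts = [200.0] * 95 + [5000.0] * 5   # bursts are exactly 5% of samples
longer_bursts = [200.0] * 94 + [5000.0] * 6  # bursts exceed 5% of samples

print("Billed rate, bursts at 5% of samples:", percentile_95(short_bursts), "Mbps")
print("Billed rate, bursts at 6% of samples:", percentile_95(longer_bursts), "Mbps")
```

Bursts that occupy no more than 5% of the samples fall out of the billed figure, which is why the model suits streaming workloads with occasional peaks.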

Security and Confidential Computing

The RTX PRO 6000 runs as a single-tenant bare metal server — physical isolation, not a shared hypervisor — the foundational property for protecting proprietary models and inference data. The Xeon 6530P supports Intel SGX for application-level enclaves and TME-MK total memory encryption. Hardware security features include AES-NI, Intel Boot Guard, and Control-Flow Enforcement Technology (CET).

Confidential GPU computing: Intel TDX and GPU passthrough can be combined on the RTX PRO 6000 using NVIDIA Confidential Computing, which runs the passed-through GPU inside a TDX confidential VM (Trust Domain) with attested, encrypted CPU-to-GPU transfers. The RTX PRO 6000 Blackwell Server Edition is explicitly supported with Intel TDX in NVIDIA’s confidential computing deployment guidance. Today’s validated mode is one GPU per confidential VM; OpenMetal scopes and configures this per workload, so it is an engineered deployment rather than a self-serve toggle.

HIPAA and regulatory compliance

OpenMetal is HIPAA compliant at the organizational level and offers Business Associate Agreements (BAAs). The RTX PRO 6000 is deployed in Ashburn, Virginia (NTT DATA VA1), whose facility-operator certifications include SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, and HIPAA. Facility certifications are held by the facility operator (NTT), not OpenMetal; OpenMetal’s HIPAA posture is organizational. Regulated inference workloads — clinical inference, PHI-adjacent serving — can run on RTX PRO 6000 servers in the HIPAA-compliant Ashburn facility under an OpenMetal BAA.

Recommended Workloads

Model Training and Fine-Tuning

The RTX PRO 6000’s 96GB GDDR7 and Blackwell mixed-precision (BF16/FP8) tensor cores handle training from scratch on small-to-mid models, full fine-tuning of larger models, and LoRA/QLoRA on a single card — pair two RTX PRO 6000 cards, or scale to a cluster, for bigger jobs. Frameworks: PyTorch (FSDP), Hugging Face Transformers/PEFT, DeepSpeed, NVIDIA NeMo. Training is where OpenMetal’s fixed-cost model pays off most: a job that pins the GPU at full utilization for days carries no per-hour meter and no egress bill on the data you pull back, unlike metered GPU-hour clouds where sustained training is the most expensive thing you can run.
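A common rule of thumb helps judge what fits on one card for full fine-tuning. The sketch below assumes mixed-precision training with the Adam optimizer at roughly 16 bytes per parameter (2 for BF16 weights, 2 for gradients, 12 for FP32 master weights and the two optimizer moments) and ignores activations and framework overhead, so treat the result as an upper bound. The per-parameter figure is a general estimate, not an OpenMetal or NVIDIA specification.

```python
GPU_MEMORY_GB = 96

# Assumed bytes per parameter for mixed-precision Adam
BYTES_WEIGHTS_BF16 = 2
BYTES_GRADS_BF16 = 2
BYTES_OPTIMIZER_FP32 = 12  # FP32 master weights plus two Adam moments, 4 bytes each
BYTES_PER_PARAM = BYTES_WEIGHTS_BF16 + BYTES_GRADS_BF16 + BYTES_OPTIMIZER_FP32


def training_state_gb(params_billions: float) -> float:
    """Memory for weights, gradients, and optimizer state (activations excluded)."""
    return params_billions * BYTES_PER_PARAM


def max_params_billions(memory_gb: float = GPU_MEMORY_GB) -> float:
    """Largest model whose training state alone fits in the given memory."""
    return memory_gb / BYTES_PER_PARAM


print(f"Upper bound for full fine-tuning on one card: {max_params_billions():.1f}B parameters")
for size in (1, 3, 7, 13):  # hypothetical model sizes in billions of parameters
    need = training_state_gb(size)
    verdict = "fits" if need < GPU_MEMORY_GB else "needs LoRA/QLoRA, sharding, or more cards"
    print(f"{size:>2}B parameters: {need:.0f} GB of training state, {verdict}")
```

Models above that bound are where LoRA/QLoRA, a second card, or sharding with FSDP or DeepSpeed come in.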

High-Throughput LLM Inference and Serving

Blackwell FP4/FP8 tensor cores and 96GB GDDR7 also make the RTX PRO 6000 a strong production inference card: serve large models with high concurrency and large batch sizes on a single card via NVIDIA NIM, vLLM, or TensorRT-LLM. For a published inference throughput study on this GPU class, see OpenMetal’s RTX PRO 6000 vs H100 for AI inference.
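To see why FP4 and FP8 matter on a 96GB card, compare weight footprints by precision. This sketch counts weights only, at 2, 1, and 0.5 bytes per parameter; KV cache, activations, and the scale-factor metadata that block-scaled formats carry are left out, and the model sizes are hypothetical.

```python
GPU_MEMORY_GB = 96
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weights_gb(params_billions: float, precision: str) -> float:
    """Weight footprint in decimal GB (billions of parameters x bytes per parameter)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_card(params_billions: float, precision: str, memory_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit; leaves no allowance for KV cache or activations."""
    return weights_gb(params_billions, precision) < memory_gb


for size in (8, 32, 70, 120):  # hypothetical model sizes in billions of parameters
    cells = []
    for precision in BYTES_PER_PARAM:
        mark = "fits" if fits_on_one_card(size, precision) else "too large"
        cells.append(f"{precision} {weights_gb(size, precision):.0f} GB ({mark})")
    print(f"{size:>3}B: " + ", ".join(cells))
```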

Retrieval-Augmented Generation (RAG)

Run embedding and generation models on the GPU while 1TB of host RAM holds large vector indexes resident, with the NVMe data tier providing fast index and model persistence — a strong single-box RAG platform.

Computer Vision, Media, and Generative Imaging

Blackwell’s media engines and GDDR7 bandwidth suit batch image/video inference, generative imaging, and CV pipelines, where the RTX PRO 6000’s memory and throughput fit high-resolution batches.

Multi-GPU and Mixed-GPU Clusters

Scale into a dedicated OpenMetal GPU cluster: all-RTX PRO 6000 for inference fleets, or mixed with H200 nodes where some workloads need HBM-class training memory and others need cost-efficient inference. Nodes are connected over the private mesh. (Each GPU is a discrete card with its own memory; multi-node clusters communicate over the private network using data and pipeline parallelism.)

“With v5 we modernized the foundation of our bare metal and private cloud catalog. Adding the RTX PRO 6000 and H200 was the natural next step. Customers running AI and HPC workloads get fully dedicated GPUs on the same modern Xeon 6000 platform, with transparent monthly billing and infrastructure they actually control, not throttled, metered slices of someone else’s cluster.”

Jamie Tischart, CTO, OpenMetal

How the RTX PRO 6000 Compares to Public Cloud GPU Instances

Hyperscaler GPU instances (AWS G6/P-series, GCP, Azure) deliver GPUs on a per-GPU-hour metered model with shared-tenancy infrastructure and per-GB egress. OpenMetal’s RTX PRO 6000 is structurally different: dedicated single-tenant hardware, fixed monthly pricing, and included egress. For sustained training runs and always-on inference endpoints — workloads that keep the GPU busy — the fixed-cost model is typically far cheaper than metered GPU-hours plus egress.

This is the “idle silicon tax”: metered GPU pricing bundles the provider’s own idle-capacity risk and margin into every hour, so you pay for elasticity whether or not your workload is bursty. A steady, high-utilization workload subsidizes other tenants’ burst headroom. A dedicated, fixed-cost GPU removes that premium — once it’s yours, running it at 100% costs no more than leaving it idle, which inverts the incentive in favor of squeezing maximum utilization out of every card.
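The break-even point between the two pricing models is simple arithmetic. The prices below are placeholders chosen only to make the math readable, not OpenMetal or hyperscaler rates, and the sketch ignores egress, storage, and commitment discounts.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def break_even_hours(fixed_monthly_usd: float, metered_hourly_usd: float) -> float:
    """GPU-hours per month at which metered spend equals the fixed monthly price."""
    if metered_hourly_usd <= 0:
        raise ValueError("metered_hourly_usd must be positive")
    return fixed_monthly_usd / metered_hourly_usd


def break_even_utilization(fixed_monthly_usd: float, metered_hourly_usd: float) -> float:
    """Fraction of the month the GPU must be busy for fixed pricing to win."""
    return break_even_hours(fixed_monthly_usd, metered_hourly_usd) / HOURS_PER_MONTH


fixed_monthly = 2000.0  # hypothetical fixed monthly price, USD
metered_hourly = 4.0    # hypothetical metered price per GPU-hour, USD

hours = break_even_hours(fixed_monthly, metered_hourly)
utilization = break_even_utilization(fixed_monthly, metered_hourly)
print(f"Break-even: {hours:.0f} GPU-hours per month ({utilization:.1%} utilization)")
```

Above the break-even utilization, each additional GPU-hour on the fixed-price server adds no cost; below it, metered pricing wins, which is the case for genuinely spiky workloads.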

When public cloud GPU is the better fit: genuinely spiky, scale-to-zero inference; short one-off experiments; or deep integration with managed ML services. A detailed RTX PRO 6000-vs-cloud cost comparison is planned as a companion page.

Deployment Options

The RTX PRO 6000 can be deployed three ways:

Dedicated GPU server — a single RTX PRO 6000 (or dual-GPU) bare metal server with full root access, IPMI, and fixed monthly pricing. Best for inference serving and fine-tuning.
Dedicated GPU cluster — multiple GPU nodes (all-RTX PRO 6000 or mixed with H200) on a private mesh for scaled inference and distributed jobs. Built to order.
Attached to existing infrastructure — add RTX PRO 6000 nodes to an existing OpenMetal Hosted Private Cloud or bare metal deployment, putting inference acceleration on the same private network as your existing compute and storage.

Where to deploy

The RTX PRO 6000 is available now in Ashburn, Virginia (US-East), hosted in a Tier III, HIPAA-compliant NTT facility, with advance reservations available for Los Angeles, Amsterdam, and Singapore. Proof of Concept clusters are available for testing; ramp pricing is available for migrations from other providers.

Location	Region	Certifications (facility operator)	Location Page
Ashburn, VA	US-East	SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, HIPAA	Ashburn facility specs

See GPU server pricing.

Get an RTX PRO 6000 Quote

Ready to deploy? Tell us about your AI/ML inference needs and we’ll provide a custom quote for the NVIDIA RTX PRO 6000 — as a single GPU server, a dedicated GPU cluster, or GPU nodes attached to an existing OpenMetal deployment.

Single GPU server: One or two RTX PRO 6000 cards with full root access and IPMI
GPU cluster: Multi-node deployments (all-RTX PRO 6000 or mixed with H200) on a private mesh
Attached GPU: Add RTX PRO 6000 capacity to your existing Hosted Private Cloud or bare metal footprint
Custom configurations: RAM upgrades to 2TB, additional NVMe drives, dual-GPU

All deployments include fixed monthly pricing, included egress, a 99.96%+ network SLA, and DDoS protection. Ramp pricing is available for migrations.

Product specifications, pricing, and availability may change due to market conditions and other factors. For the most current information, please contact the OpenMetal team directly.
