Bare Metal GPU Server -- NVIDIA H200 NVL -- Dual Intel Xeon 6530P, 1TB DDR5, 141GB HBM3e


The OpenMetal H200 is a single-tenant bare metal GPU server built on the NVIDIA H200 NVL (141GB HBM3e) paired with dual Intel Xeon 6530P (Granite Rapids) processors. It replaces the H100 in OpenMetal’s lineup, carrying 50% more GPU memory per card (141GB vs 94GB) and higher memory bandwidth, so larger models and longer context windows fit in a single GPU’s memory. Like every OpenMetal server, it ships with full root access, no shared tenancy, and fixed monthly pricing — you get the whole GPU, not a time-sliced fraction of one.

Key Takeaways

141GB HBM3e per GPU fits 70B-class models and long-context inference in a single card’s memory, avoiding the cross-GPU sharding that H100’s 80–94GB forces.
1TB DDR5-6400 system memory keeps datasets, embeddings, and data-loader pipelines resident in host RAM so the GPU stays fed during training and batch inference.
Standard CUDA runs unlicensed, and NVIDIA AI Enterprise (NVAIE) is available: PyTorch, vLLM, and TensorRT-LLM need no NVIDIA license, while NVAIE adds a supported enterprise stack (NIM, NeMo) for H200 deployments. Contact OpenMetal for current NVAIE options.
Single-tenant bare metal means the full GPU, full PCIe bandwidth, and direct hardware access — no hypervisor overhead, no noisy-neighbor contention on the accelerator.
Fixed monthly pricing with included egress — no per-GPU-hour metering surprises and no per-GB data-transfer bill when you pull model weights or training data back out.
Deploy one card or scale to a multi-GPU cluster — start with a single H200 server and grow into a dedicated GPU cluster, or attach H200 nodes to an existing OpenMetal Hosted Private Cloud or bare metal footprint. See GPU server pricing.

Ready to Deploy an H200 GPU Server?

Tell us about your workload and we’ll help you configure the right deployment — a single H200, a dedicated multi-GPU cluster, or H200 nodes attached to your existing OpenMetal cloud or bare metal footprint.


Config at a Glance

Component	Specification
GPU	NVIDIA H200 NVL (PCIe), 141GB HBM3e per GPU, 1–2 GPUs per server
GPU Memory Bandwidth	4.8 TB/s per GPU (HBM3e)
GPU Max Board Power	600W per GPU
Processor	2x Intel Xeon 6530P (Granite Rapids, Intel 3)
Total Cores / Threads	64 cores / 128 threads
Base / Max Turbo Frequency	2.3 GHz / 4.1 GHz (3.7 GHz all-core turbo)
L3 Cache	144 MB per CPU
TDP	225W per CPU
System Memory	1TB DDR5-6400 (16 of 32 DIMM slots populated; upgradeable to 2TB)
Boot Storage	2x 960GB NVMe (RAID 1)
Data Storage	1x 6.4TB Micron 7500 MAX NVMe (PCIe Gen4, 3 DWPD)
Max Drive Bays	8x 2.5″ NVMe (1x 6.4TB included, 7 open)
Private Bandwidth	20 Gbps default (2x 10 Gbps LACP-bonded); up to 40 Gbps optional
Public Bandwidth	10 Gbps
PCIe	PCIe 5.0, 88 lanes per processor
Confidential Computing	Intel SGX available (CPU); Intel TDX with confidential GPU passthrough (NVIDIA Confidential Computing, one GPU per VM) supported in principle, but outside NVIDIA’s documented validated pairing and requires OpenMetal validation
Software	Standard CUDA runs unlicensed; NVIDIA AI Enterprise (NVAIE) available — contact OpenMetal for options
Availability	Available now in US-East (Ashburn, VA); advance booking for other regions
Pricing	Built to order — contact OpenMetal for a quote (fixed monthly, included egress; no per-GPU-hour metering)

Figure: gpu-server-h200 component architecture

GPU: NVIDIA H200 NVL

The H200 NVL is NVIDIA’s Hopper-generation data center GPU in PCIe form factor, carrying 141GB of HBM3e memory per card. That is roughly 50% more than the H100 NVL’s 94GB and nearly double the H100 SXM’s 80GB. For OpenMetal customers, the practical effect is model fit. A 70B-parameter model in 16-bit precision needs ~140GB for its weights, which now lands on a single H200 (with little headroom left for KV cache) rather than requiring two H100s with the latency and complexity of tensor-parallel sharding. Memory bandwidth is 4.8 TB/s. This directly raises tokens per second on memory-bound inference, where generating each token requires reading the model weights from HBM.
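The model-fit and bandwidth claims above come down to simple arithmetic. The sketch below is a back-of-envelope estimate, not a benchmark, and it rests on three assumptions:

- Weights dominate memory.
- The spec-sheet 141GB and the weight size are treated in the same decimal gigabytes.
- Single-sequence decoding streams every weight from HBM once per generated token.

Real throughput lands below this ceiling because of KV-cache reads, kernel efficiency, and batching effects.

```python
"""Back-of-envelope model-fit and decode-throughput estimates (illustrative only)."""

H200_MEMORY_GB = 141.0      # HBM3e capacity per card, from the spec table
H200_BANDWIDTH_TB_S = 4.8   # HBM3e bandwidth per card, from the spec table


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight size in decimal GB: (params_billion * 1e9) * bytes / 1e9."""
    return params_billion * bytes_per_param


def fits_on_one_card(params_billion: float, bytes_per_param: float,
                     card_gb: float = H200_MEMORY_GB) -> bool:
    """Weights only; KV cache, activations, and runtime overhead are ignored."""
    return weight_gb(params_billion, bytes_per_param) <= card_gb


def decode_ceiling_tokens_per_s(params_billion: float, bytes_per_param: float,
                                bandwidth_tb_s: float = H200_BANDWIDTH_TB_S) -> float:
    """Batch-1 decode upper bound: every generated token reads all weights from HBM once."""
    bytes_per_token = weight_gb(params_billion, bytes_per_param) * 1e9
    return bandwidth_tb_s * 1e12 / bytes_per_token


if __name__ == "__main__":
    for label, bpp in (("16-bit", 2.0), ("FP8", 1.0)):
        print(f"70B {label}: {weight_gb(70, bpp):.0f} GB weights, "
              f"fits={fits_on_one_card(70, bpp)}, "
              f"ceiling ~{decode_ceiling_tokens_per_s(70, bpp):.0f} tokens/s")
```

At 16-bit precision the weights use almost the entire card. That is why long-context serving of 70B-class models commonly moves to 8-bit weights to free memory for KV cache.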

OpenMetal deploys the H200 as a true bare metal device: the GPU sits directly on the host’s PCIe bus with no hypervisor layer between your code and the accelerator. You get full PCIe 5.0 bandwidth and direct access to NVIDIA’s driver stack. Each server supports 1 or 2 H200 cards. In a two-card server, the GPUs are two discrete accelerators, each with its own memory (not pooled). Across nodes, GPUs communicate over the private network rather than a shared GPU-memory fabric, so size per-server GPU memory for workloads that need tight GPU-to-GPU coupling.
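Because the two cards in a dual-GPU server are not pooled, the practical memory budget for an unsharded model is the largest single card, not the sum. Below is a minimal sketch of checking this from the host. It assumes the NVIDIA driver (and therefore `nvidia-smi`) is installed. The `Gpu` dataclass and the helper names are illustrative, not part of any OpenMetal tooling.

```python
"""Per-card GPU memory check from nvidia-smi output (illustrative sketch)."""
import subprocess
from dataclasses import dataclass


@dataclass
class Gpu:
    index: int
    name: str
    memory_mib: int  # nvidia-smi reports memory.total in MiB


QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text: str) -> list[Gpu]:
    """Parse one 'index, name, memory' row per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, memory = (field.strip() for field in line.split(","))
        gpus.append(Gpu(int(index), name, int(memory)))
    return gpus


def query_gpus() -> list[Gpu]:
    """Run nvidia-smi on the host (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


def largest_single_card_mib(gpus: list[Gpu]) -> int:
    """Memory is not pooled across cards, so an unsharded model must fit the largest one."""
    return max(gpu.memory_mib for gpu in gpus)
```

On the server, `largest_single_card_mib(query_gpus())` gives the per-card ceiling. Anything larger needs tensor or pipeline parallelism across cards.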

Processor: Dual Intel Xeon 6530P (Granite Rapids)

Each H200 server pairs the GPU with two Intel Xeon 6530P processors (Granite Rapids, Intel 3 process), for 64 cores / 128 threads total at 2.3 GHz base / 4.1 GHz turbo, with 144 MB of L3 cache per socket and 88 PCIe 5.0 lanes per processor. The high lane count matters for GPU servers: it provides full-bandwidth PCIe 5.0 to the GPU plus the NVMe data drive without contention. The Granite Rapids cores also carry Intel AMX and AVX-512, useful for CPU-side data preprocessing, tokenization, and embedding pipelines that feed the GPU. OpenMetal selected the 6530P to keep the host from bottlenecking the accelerator during data-loading-heavy training runs. See the Intel Xeon 6530P product page for full CPU detail.
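A rough lane budget shows why 88 lanes per socket leaves headroom. This sketch assumes each H200 NVL uses a x16 link and each NVMe drive (boot and data) uses a x4 link. It ignores lanes consumed by NICs and other platform devices, so treat the free-lane figure as an upper bound.

```python
"""PCIe lane budget for a dual-socket host (illustrative; ignores NICs and platform devices)."""

LANES_PER_SOCKET = 88   # Xeon 6530P, from the spec table
SOCKETS = 2
GPU_LINK_WIDTH = 16     # assumed x16 per H200 NVL
NVME_LINK_WIDTH = 4     # assumed x4 per NVMe drive


def lanes_used(gpus: int, nvme_drives: int) -> int:
    """Total lanes consumed by GPUs and NVMe drives at their assumed link widths."""
    return gpus * GPU_LINK_WIDTH + nvme_drives * NVME_LINK_WIDTH


def lanes_free(gpus: int, nvme_drives: int) -> int:
    """Lanes left across both sockets (before NICs and other devices)."""
    return LANES_PER_SOCKET * SOCKETS - lanes_used(gpus, nvme_drives)


if __name__ == "__main__":
    # Fully built out: 2 GPUs, 2 boot drives, and all 8 data bays filled.
    print(lanes_used(2, 10), "of", LANES_PER_SOCKET * SOCKETS, "lanes used")
```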

Memory

The H200 server ships with 1TB of DDR5-6400 across 16 of 32 DIMM slots (one DIMM per channel on both sockets), leaving 16 open slots for an upgrade to 2TB. With 8 memory channels per socket at 6400 MT/s, each socket has a theoretical peak of about 409.6 GB/s (about 819 GB/s across both). That bandwidth lets the host feed training data and inference batches from RAM without becoming a bottleneck. For GPU workloads this host memory acts as the staging tier: dataset shards, tokenized corpora, vector indexes, and model checkpoints live in system RAM and stream to the GPU over PCIe 5.0. ECC is standard, providing the data integrity that long training runs require. 2TB is the standard stocked maximum. It populates all 32 slots with 64GB RDIMMs, and at two DIMMs per channel the memory runs at DDR5-5200.
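The host-memory bandwidth figure is the standard theoretical peak: channels × transfer rate × 8 bytes per transfer. A DDR5 channel carries 64 data bits across two 32-bit subchannels. Sustained bandwidth in practice is noticeably lower. The sketch below only reproduces the arithmetic for the 1TB (DDR5-6400) and 2TB (DDR5-5200) configurations.

```python
"""Theoretical peak DDR5 bandwidth (illustrative; sustained bandwidth is lower)."""

BYTES_PER_TRANSFER = 8  # 64-bit data path per DDR5 channel (two 32-bit subchannels)


def peak_gb_s(channels: int, mt_s: int) -> float:
    """channels * transfers per second * bytes per transfer, in decimal GB/s."""
    return channels * mt_s * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    per_socket_1dpc = peak_gb_s(8, 6400)  # 1TB config: one DIMM per channel
    per_socket_2dpc = peak_gb_s(8, 5200)  # 2TB config: two DIMMs per channel
    print(f"1TB: {per_socket_1dpc:.1f} GB/s per socket, {2 * per_socket_1dpc:.1f} GB/s total")
    print(f"2TB: {per_socket_2dpc:.1f} GB/s per socket, {2 * per_socket_2dpc:.1f} GB/s total")
```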

Storage

OpenMetal separates boot and data storage on every server. The H200 boots from 2x 960GB NVMe drives in RAID 1, which isolates the operating system from your data drives so a data-volume change never puts the boot environment at risk (see boot and data drive isolation). The data tier is a 6.4TB Micron 7500 MAX NVMe SSD (PCIe Gen4, 232-layer 3D TLC, 3 DWPD mixed-use endurance). The remaining 7 of the 8 NVMe bays are open for additional drives to hold larger datasets and checkpoints.

Metric	Micron 7500 MAX (6.4TB)
Sequential Read	7,000 MB/s
Sequential Write	5,900 MB/s
Random Read	1,100,000 IOPS
Random Write	400,000 IOPS
Read Latency (typical)	70 µs
Write Latency (typical)	15 µs
Endurance	3 DWPD (35,040 TBW)
Warranty	5 years

High sequential read throughput matters for GPU training: streaming sharded datasets and loading multi-hundred-GB checkpoints is read-bound, and 7 GB/s per drive keeps the data loader ahead of the GPU.
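The table’s numbers can be sanity-checked with two small calculations:

- The best-case time to read a checkpoint at the rated sequential speed.
- The rated endurance implied by 3 DWPD over the 5-year warranty, which reproduces the 35,040 TBW figure.

The read estimate assumes purely sequential I/O with no other bottleneck, such as the filesystem, CPU decompression, or the host-to-GPU copy.

```python
"""Ideal NVMe read time and rated endurance (illustrative)."""

SEQ_READ_GB_S = 7.0   # Micron 7500 MAX sequential read, from the table
CAPACITY_TB = 6.4
DWPD = 3
WARRANTY_YEARS = 5


def read_seconds(size_gb: float, drives: int = 1) -> float:
    """Best case: purely sequential reads striped evenly across drives."""
    return size_gb / (SEQ_READ_GB_S * drives)


def rated_tbw(capacity_tb: float = CAPACITY_TB, dwpd: float = DWPD,
              years: int = WARRANTY_YEARS) -> float:
    """Capacity * drive writes per day * days in the warranty period."""
    return capacity_tb * dwpd * 365 * years


if __name__ == "__main__":
    print(f"140 GB checkpoint: ~{read_seconds(140):.0f} s on one drive")
    print(f"Rated endurance: {rated_tbw():,.0f} TBW")
```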

Networking

Every H200 server has 20 Gbps of private bandwidth by default (2x 10 Gbps uplinks in an LACP bond), upgradeable to 40 Gbps (4x 10 Gbps) as an option available across OpenMetal’s v2+ fleet, plus 10 Gbps of public bandwidth. The private network is the path for multi-node GPU work — distributed training, parameter servers, and pulling datasets from OpenMetal storage nodes — and east-west traffic between your servers is not metered. OpenMetal’s base network SLA is 99.96%, with measured performance exceeding 99.99% from 2022 through 2026. DDoS protection of up to 10 Gbps per IP is included. See LACP network bonding.
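Two network properties are easy to miss:

- LACP bonding hashes each flow onto one member link, so a single TCP stream tops out at 10 Gbps even on a 20 Gbps bond. Only multiple parallel flows (for example, several download streams or communication-library connections) can approach the aggregate.
- An SLA percentage translates directly into allowed downtime.

The sketch below assumes ideal line rate with no protocol overhead. For the parallel case, it also assumes the hash spreads flows evenly across links.

```python
"""Transfer time over an LACP bond and SLA downtime (illustrative)."""

LINK_GBPS = 10  # each bond member is a 10 Gbps link


def transfer_seconds(size_gb: float, links: int, parallel_flows: int) -> float:
    """Ideal line-rate time; one flow can never use more than one member link."""
    usable_links = min(links, parallel_flows)  # best case: flows land on distinct links
    return size_gb * 8 / (LINK_GBPS * usable_links)


def monthly_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime an availability percentage permits in a month."""
    return (1 - sla_percent / 100) * days * 24 * 60


if __name__ == "__main__":
    print(f"140 GB, 1 flow on 2 links: {transfer_seconds(140, 2, 1):.0f} s")
    print(f"140 GB, 4 flows on 2 links: {transfer_seconds(140, 2, 4):.0f} s")
    print(f"99.96% SLA: {monthly_downtime_minutes(99.96):.1f} min per 30-day month")
```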

Egress pricing: 95th-percentile billing, not per-GB transfer

OpenMetal bills public network usage on a 95th-percentile model with a generous included allotment, not per-GB like the hyperscalers. For GPU workloads this is a material difference: pulling trained model weights, exporting datasets, or serving inference responses to end users does not generate the per-GB egress bill that AWS, GCP, and Azure charge — where data-transfer-out on GPU instances frequently rivals the compute cost itself.
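As commonly implemented, 95th-percentile billing works in four steps: sample throughput at a fixed interval, sort the month’s samples, discard the top 5%, and bill the highest remaining value. The 5-minute interval and nearest-rank method below are the usual industry conventions. They are an assumption here, not OpenMetal’s published billing formula. The practical consequence is that short bursts (up to about 36 hours per 30-day month at 5-minute sampling) do not raise the bill.

```python
"""95th-percentile ("burstable") bandwidth billing, nearest-rank method (illustrative)."""
import math


def billable_rate(samples_mbps: list[float], percentile: float = 95.0) -> float:
    """Sort interval samples and return the value at the percentile rank.

    With 5-minute samples over 30 days (8,640 samples), the top 5%
    (432 samples, about 36 hours of peak traffic) are discarded.
    """
    if not samples_mbps:
        raise ValueError("no samples")
    ordered = sorted(samples_mbps)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


if __name__ == "__main__":
    month = [100.0] * 8208 + [1000.0] * 432  # 36 hours of 1 Gbps bursts
    print(billable_rate(month), "Mbps billed")
```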

Security and Confidential Computing

The H200 runs as a single-tenant bare metal server — physical isolation, not a shared hypervisor — which is the foundational security property for sensitive training data and proprietary model weights. The Xeon 6530P supports Intel SGX for application-level enclaves and TME-MK total memory encryption. Hardware security features include AES-NI, Intel Boot Guard, and Control-Flow Enforcement Technology (CET).
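You can confirm which of these CPU features the running kernel actually exposes by checking the Linux `/proc/cpuinfo` flags line. The same check covers the AMX/AVX-512 instructions mentioned in the processor section. The flag names in this minimal sketch are the standard Linux ones. A flag can still be absent if the feature is disabled in firmware (SGX typically needs BIOS enablement) or if the kernel predates it.

```python
"""Check /proc/cpuinfo for CPU features named on this page (Linux only; illustrative)."""

FEATURES = {
    "sgx": "Intel SGX",
    "aes": "AES-NI",
    "avx512f": "AVX-512 Foundation",
    "amx_tile": "AMX tiles",
    "amx_bf16": "AMX BF16",
    "amx_int8": "AMX INT8",
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line (cores normally report the same set)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def report(flags: set[str]) -> dict[str, bool]:
    """Map each human-readable feature name to whether its kernel flag is present."""
    return {label: flag in flags for flag, label in FEATURES.items()}


def read_host_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read and parse the running host's CPU flags."""
    with open(path) as f:
        return cpu_flags(f.read())
```

On the server, `report(read_host_flags())` returns a yes/no map for each feature.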

Confidential GPU computing: Intel TDX and confidential GPU passthrough can in principle be combined using NVIDIA Confidential Computing, which runs the passed-through GPU inside a TDX confidential VM (Trust Domain) with attested, encrypted CPU-to-GPU transfers. The H200 (NVIDIA Hopper) supports NVIDIA Confidential Computing. However, H200 NVL CC sits outside NVIDIA’s documented validated CPU pairing, which uses an earlier Intel generation rather than the Granite Rapids platform in this server. OpenMetal must therefore validate the exact build before offering it, and today’s validated mode supports one GPU per confidential VM. This is an engineered deployment rather than a self-serve toggle; without it, every H200 still runs as single-tenant bare metal with physical isolation.

HIPAA and regulatory compliance

OpenMetal is HIPAA compliant at the organizational level and offers Business Associate Agreements (BAAs). The H200 is currently deployed in Ashburn, Virginia (NTT DATA VA1), whose facility-operator certifications include SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, and HIPAA. Facility certifications are held by the facility operator (NTT), not by OpenMetal; OpenMetal’s HIPAA posture is organizational. Healthcare and regulated AI workloads — training on PHI, clinical inference — can run on H200 servers hosted in the HIPAA-compliant Ashburn facility under an OpenMetal BAA.

Recommended Workloads

Large Language Model Fine-Tuning and Training

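The 141GB HBM3e lets a single H200 hold a 70B-parameter model’s 16-bit weights without tensor-parallel sharding across cards. Parameter-efficient methods such as QLoRA (a 4-bit base model) bring fine-tuning of 70B-class and larger models within reach of one card. Full fine-tuning is far more memory-hungry. With mixed-precision Adam, weights, gradients, and optimizer state together take roughly 16 bytes per parameter, so single-card full fine-tuning suits single-digit-billion-parameter models. A two-card H200 server provides two discrete 141GB GPUs (282GB aggregate, not pooled); larger jobs are sharded across both cards in software. Frameworks: PyTorch FSDP, DeepSpeed, Hugging Face Transformers, NVIDIA NeMo (available with NVAIE).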
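The fine-tuning sizes above follow from the memory cost per parameter. This sketch uses commonly cited rules of thumb:

- 16 bytes per parameter for mixed-precision Adam full fine-tuning.
- 2 bytes for a frozen 16-bit LoRA base.
- 0.5 bytes for a frozen 4-bit QLoRA base.

It deliberately ignores activations, adapter weights, and framework overhead, all of which need additional headroom.

```python
"""Rough GPU memory for fine-tuning: model state only (illustrative)."""

# Approximate bytes per parameter (commonly cited rules of thumb):
FULL_FT_MIXED_ADAM = 16   # 16-bit weights (2) + grads (2) + fp32 master, momentum, variance (12)
LORA_BASE_16BIT = 2       # frozen 16-bit base weights
QLORA_BASE_4BIT = 0.5     # frozen 4-bit base weights


def model_state_gb(params_billion: float, bytes_per_param: float) -> float:
    """Decimal GB of weights plus any gradient and optimizer state counted in bytes_per_param."""
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float, card_gb: float = 141.0) -> bool:
    """Weights-only check: activations and overhead still need headroom on top."""
    return model_state_gb(params_billion, bytes_per_param) <= card_gb


if __name__ == "__main__":
    for label, params, bpp in (("70B full FT", 70, FULL_FT_MIXED_ADAM),
                               ("8B full FT", 8, FULL_FT_MIXED_ADAM),
                               ("70B QLoRA", 70, QLORA_BASE_4BIT)):
        print(f"{label}: {model_state_gb(params, bpp):.0f} GB, fits={fits(params, bpp)}")
```

A 16-bit LoRA base for a 70B model technically passes this weights-only check with about 1GB to spare. In practice that is not enough for activations, so QLoRA is the realistic single-card route at that size.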

Large-Context and High-Throughput Inference

Memory-bound inference benefits directly from the H200’s 4.8 TB/s bandwidth and 141GB capacity: longer context windows, larger KV caches, and bigger batch sizes per card. Serve with NVIDIA NIM microservices (available with NVAIE), vLLM, or TensorRT-LLM. A single H200 can serve models whose weights would otherwise have to be split across two H100s.
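KV-cache size is what turns “longer context” into a memory problem. In a transformer with grouped-query attention, each token stores a key and a value vector for every layer and KV head. The cache therefore grows linearly with the total tokens across all concurrent sequences. The model shape in the example (80 layers, 8 KV heads, head dimension 128) is an assumed Llama-style 70B configuration, used for illustration only.

```python
"""KV-cache sizing for a grouped-query-attention transformer (illustrative)."""


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: float = 2) -> float:
    """2 (K and V) * layers * kv_heads * head_dim * bytes per token, times total tokens."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1e9


if __name__ == "__main__":
    # Assumed Llama-style 70B shape (hypothetical for this example).
    shape = dict(layers=80, kv_heads=8, head_dim=128)
    for tokens in (8_192, 131_072):
        print(f"{tokens:>7} tokens: {kv_cache_gb(tokens, **shape):.1f} GB of FP16 KV cache")
```

At about 43GB for 128K tokens of FP16 cache, a 70B model with 16-bit weights (~140GB) cannot hold a long context on one card. FP8 weights (~70GB) leave room for it.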

Retrieval-Augmented Generation (RAG) and Vector Workloads

Run the embedding model, vector index, and generation model together: 1TB of host RAM holds large vector indexes resident while the H200 handles embedding and generation, with the NVMe data tier providing fast index persistence.
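Whether a vector index stays resident in host RAM is mostly a matter of vectors × dimensions × bytes. This sketch sizes a flat float32 index and estimates how many vectors fit after reserving a share of RAM. The 25% reserve is an arbitrary assumption. Graph indexes such as HNSW, plus IDs and metadata, add overhead on top of the raw vectors.

```python
"""Host-RAM sizing for a flat (uncompressed) vector index (illustrative)."""


def flat_index_gb(vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage only; IDs, graph links, and metadata add more."""
    return vectors * dim * bytes_per_value / 1e9


def max_vectors(ram_gb: float, dim: int, bytes_per_value: int = 4,
                reserve_fraction: float = 0.25) -> int:
    """Vectors that fit after reserving part of RAM for the OS, loaders, and caches."""
    usable_bytes = ram_gb * 1e9 * (1 - reserve_fraction)
    return int(usable_bytes // (dim * bytes_per_value))


if __name__ == "__main__":
    print(f"100M x 1024-dim float32: {flat_index_gb(100_000_000, 1024):.1f} GB")
    print(f"Vectors in 1000 GB with 25% reserved: {max_vectors(1000, 1024):,}")
```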

HPC and Scientific Computing

Hopper’s FP64 and tensor cores suit computational chemistry, CFD, genomics, and physics simulation. The dual Xeon 6530P host with AMX/AVX-512 handles the CPU-bound portions, and 20 Gbps private networking by default (up to 40 Gbps optional) links multiple H200 nodes for MPI workloads.

Multi-GPU Cluster Workloads

Scale beyond a single server into a dedicated OpenMetal GPU cluster — same-GPU (all H200) or mixed (H200 + NVIDIA RTX PRO 6000) — connected over the private mesh for distributed training and large-scale inference serving. (Multi-node clusters communicate over the private network; each GPU is a discrete card with its own memory, so distributed work uses data and pipeline parallelism across nodes.)
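For data-parallel training across nodes, the private network drives the cost of gradient synchronization. In a ring all-reduce, each node sends and receives 2(N-1)/N times the gradient buffer. A first-order estimate is therefore that volume divided by per-node bandwidth. The sketch assumes ideal link utilization and ignores latency, overlap with compute, and gradient compression. `link_gbps` is whatever effective rate the job achieves; with LACP, that depends on using multiple flows.

```python
"""Ideal ring all-reduce time for data-parallel gradient sync (illustrative)."""


def ring_allreduce_seconds(gradient_gb: float, nodes: int, link_gbps: float) -> float:
    """Each node transfers 2 * (N - 1) / N of the gradient buffer over its link.

    Ignores latency, protocol overhead, and compute/communication overlap.
    """
    if nodes < 2:
        return 0.0
    bytes_sent = 2 * (nodes - 1) / nodes * gradient_gb * 1e9
    return bytes_sent * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    grads_gb = 14.0  # e.g., 7B parameters with 16-bit gradients
    for gbps in (20, 40):
        print(f"2 nodes @ {gbps} Gbps: {ring_allreduce_seconds(grads_gb, 2, gbps):.1f} s per sync")
```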

“With v5 we modernized the foundation of our bare metal and private cloud catalog. Adding the NVIDIA RTX PRO 6000 and H200 was the natural next step. Customers running AI and HPC workloads get fully dedicated GPUs on the same modern Xeon 6000 platform, with transparent monthly billing and infrastructure they actually control, not throttled, metered slices of someone else’s cluster.”

Jamie Tischart, CTO, OpenMetal


How the H200 Compares to Public Cloud GPU Instances

Hyperscaler GPU instances (AWS P5, GCP A3, Azure ND-series) deliver H100/H200-class accelerators on a per-GPU-hour metered model with shared-tenancy infrastructure and per-GB egress. OpenMetal’s H200 is structurally different: dedicated single-tenant hardware, fixed monthly pricing, and included egress. For sustained 24/7 training or always-on inference, the fixed-cost model is typically far cheaper than metered GPU-hours. Metered pricing also charges an “idle silicon tax” — every GPU-hour bundles the provider’s elasticity and idle-capacity risk into the rate, so a steady high-utilization training job effectively subsidizes other tenants’ burst headroom. On a dedicated H200, running the card at 100% for days costs no more than leaving it idle.
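The fixed-versus-metered trade-off reduces to a break-even utilization: the fraction of the month metered GPUs must run before they cost more than a fixed monthly server. All prices in this sketch are caller-supplied inputs, and no OpenMetal or hyperscaler rates are implied. The 730-hour month is the usual 8,760 / 12 approximation.

```python
"""Break-even utilization for fixed monthly vs. metered GPU pricing (illustrative)."""

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def breakeven_utilization(fixed_monthly: float, hourly_rate: float, gpus: int = 1) -> float:
    """Fraction of the month metered GPUs can run before costing more than the fixed price."""
    return fixed_monthly / (hourly_rate * gpus * HOURS_PER_MONTH)


def metered_monthly(hourly_rate: float, gpus: int, utilization: float,
                    egress_gb: float = 0.0, egress_per_gb: float = 0.0) -> float:
    """Metered compute for the hours actually used, plus per-GB egress if any."""
    compute = hourly_rate * gpus * HOURS_PER_MONTH * utilization
    return compute + egress_gb * egress_per_gb
```

For example, if a fixed monthly price equals 500 hours of the metered rate, break-even is about 68% utilization (500 / 730). Above that, the fixed-price server is cheaper even before egress is counted.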

When public cloud GPU is the better fit: spiky, scale-to-zero inference; short experimental runs measured in hours; or deep integration with a hyperscaler’s managed ML services (SageMaker, Vertex AI). A detailed H200-vs-cloud cost comparison is planned as a companion page.

Deployment Options

The H200 can be deployed three ways:

Dedicated GPU server — a single H200 (or dual-GPU) bare metal server with full root access, IPMI, and fixed monthly pricing. Best for single-node training, fine-tuning, and inference serving.
Dedicated GPU cluster — multiple GPU nodes (all-H200 or mixed with NVIDIA RTX PRO 6000) on a private mesh for distributed training and scaled inference. Built to order.
Attached to existing infrastructure — add H200 nodes to an existing OpenMetal Hosted Private Cloud or bare metal deployment, putting GPU acceleration on the same private network as your existing compute and storage.

Where to deploy

The H200 is available now in Ashburn, Virginia (US-East), hosted in a Tier III, HIPAA-compliant NTT facility. Advance reservations are available for OpenMetal’s other regions — Los Angeles, Amsterdam, and Singapore — for customers planning capacity ahead of deployment.

Location	Region	Certifications (facility operator)	Location Page
Ashburn, VA	US-East	SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, HIPAA	Ashburn facility specs

Proof of Concept clusters are available for testing. Ramp pricing is available for migrations from other providers, with fixed monthly pricing once deployed. See GPU server pricing.

Get an H200 Quote

Ready to deploy? Tell us about your AI/ML infrastructure needs and we’ll provide a custom quote for the NVIDIA H200 — as a single GPU server, a dedicated GPU cluster, or GPU nodes attached to an existing OpenMetal deployment.

Single GPU server: One or two H200 cards with full root access and IPMI
GPU cluster: Multi-node deployments (all-H200 or mixed-GPU) on a private mesh
Attached GPU: Add H200 capacity to your existing Hosted Private Cloud or bare metal footprint
Custom configurations: RAM upgrades to 2TB, additional NVMe drives, multi-GPU (two discrete cards)

All deployments include fixed monthly pricing, included egress, a 99.96%+ network SLA, and DDoS protection; NVIDIA AI Enterprise (NVAIE) is available for H200 deployments (contact OpenMetal for current options). Ramp pricing is available for migrations.


Product specifications, pricing, and availability may change due to market conditions and other factors. For the most current information, please contact the OpenMetal team directly.
