The OpenMetal H200 is a single-tenant bare metal GPU server built on the NVIDIA H200 NVL (141GB HBM3e) paired with dual Intel Xeon 6530P (Granite Rapids) processors. It replaces the H100 in OpenMetal’s lineup, carrying 50% more GPU memory per card (141GB vs 94GB) and higher memory bandwidth, so larger models and longer context windows fit in a single GPU’s memory. Like every OpenMetal server, it ships with full root access, no shared tenancy, and fixed monthly pricing — you get the whole GPU, not a time-sliced fraction of one.

Key Takeaways

  • 141GB HBM3e per GPU holds a 70B-class model’s 16-bit weights (~140GB) on a single card, and leaves room for long-context KV caches with smaller or quantized models, avoiding the cross-GPU sharding that H100’s 80–94GB forces.
  • 1TB DDR5-6400 system memory keeps datasets, embeddings, and data-loader pipelines resident in host RAM so the GPU stays fed during training and batch inference.
  • Standard CUDA needs no paid license; NVIDIA AI Enterprise (NVAIE) is available. PyTorch, vLLM, and TensorRT-LLM need no NVIDIA license, and NVAIE’s supported enterprise stack (NIM, NeMo) is available for H200 deployments. Contact OpenMetal for current NVAIE options.
  • Single-tenant bare metal means the full GPU, full PCIe bandwidth, and direct hardware access — no hypervisor overhead, no noisy-neighbor contention on the accelerator.
  • Fixed monthly pricing with included egress — no per-GPU-hour metering surprises and no per-GB data-transfer bill when you pull model weights or training data back out.
  • Deploy one card or scale to a multi-GPU cluster — start with a single H200 server and grow into a dedicated GPU cluster, or attach H200 nodes to an existing OpenMetal Hosted Private Cloud or bare metal footprint. See GPU server pricing.

Ready to Deploy an H200 GPU Server?

Tell us about your workload and we’ll help you configure the right deployment — a single H200, a dedicated multi-GPU cluster, or H200 nodes attached to your existing OpenMetal cloud or bare metal footprint.

Get an H200 Quote   Schedule a Consultation

Config at a Glance

| Component | Specification |
|---|---|
| GPU | NVIDIA H200 NVL (PCIe), 141GB HBM3e per GPU, 1–2 GPUs per server |
| GPU Memory Bandwidth | 4.8 TB/s per GPU (HBM3e) |
| GPU Max Board Power | 600W per GPU |
| Processor | 2x Intel Xeon 6530P (Granite Rapids, Intel 3) |
| Total Cores / Threads | 64 cores / 128 threads |
| Base / Max Turbo Frequency | 2.3 GHz / 4.1 GHz (3.7 GHz all-core turbo) |
| L3 Cache | 144 MB per CPU |
| TDP | 225W per CPU |
| System Memory | 1TB DDR5-6400 (16 of 32 DIMM slots populated; upgradeable to 2TB) |
| Boot Storage | 2x 960GB NVMe (RAID 1) |
| Data Storage | 1x 6.4TB Micron 7500 MAX NVMe (PCIe Gen4, 3 DWPD) |
| Max Drive Bays | 24x 2.5″ NVMe/SATA/SAS |
| Private Bandwidth | 40 Gbps (4x 10 Gbps LACP-bonded) |
| Public Bandwidth | 10 Gbps |
| PCIe | PCIe 5.0, 88 lanes per processor |
| Confidential Computing | Intel SGX available (CPU); Intel TDX not combinable with GPU passthrough |
| Software | Standard CUDA needs no paid license; NVIDIA AI Enterprise (NVAIE) available (contact OpenMetal for options) |
| Availability | Available now in US-East (Ashburn, VA); advance booking for other regions |
| Pricing | Built to order; contact OpenMetal for a quote (fixed monthly, included egress; no per-GPU-hour metering) |

[Architecture diagram: bare metal GPU server with NVIDIA H200 NVL, dual Intel Xeon 6530P, and 1TB DDR5 (gpu-server-h200 component architecture)]

GPU: NVIDIA H200 NVL

The H200 NVL is NVIDIA’s Hopper-generation data center GPU in PCIe form factor. It carries 141GB of HBM3e memory per card, roughly 50% more than the H100 NVL’s 94GB and about 1.75x the H100 SXM’s 80GB. For OpenMetal customers, the practical effect is model fit. A 70B-parameter model in 16-bit precision needs ~140GB of weights, which now fits on a single H200 rather than requiring two H100s with the latency and complexity of tensor-parallel sharding. At 16-bit, however, it leaves little headroom for the KV cache. Memory bandwidth is 4.8 TB/s, which directly raises tokens-per-second on memory-bound inference.
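
The model-fit argument comes down to parameter count times bytes per parameter. The short sketch below checks weight storage only. It assumes decimal gigabytes for both the model and the card, and it ignores KV cache, activations, and CUDA context overhead, all of which compete for the remaining memory. The helper names are illustrative.

```python
# Rough check: do a model's weights fit in one GPU's memory?
# Assumptions: decimal GB (1 GB = 1e9 bytes) for both model and GPU, weights only;
# KV cache, activations, and CUDA context overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight storage in decimal GB: billions of parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, gpu_gb: float) -> bool:
    """True if the weights alone fit within the GPU's memory."""
    return weight_gb(params_billion, dtype) <= gpu_gb


GPUS = {"H200 NVL": 141.0, "H100 NVL": 94.0, "H100 SXM": 80.0}

for name, gpu_gb in GPUS.items():
    need = weight_gb(70, "bf16")
    print(f"{name}: 70B bf16 weights need {need:.0f} GB, "
          f"fits={fits(70, 'bf16', gpu_gb)}, headroom={gpu_gb - need:+.0f} GB")
```

Under these assumptions, the 16-bit 70B weights leave only about a gigabyte of slack on an H200. This is why long-context serving of a 70B model on one card often pairs with FP8 or 4-bit weights. The exact usable capacity also depends on driver reservations.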

OpenMetal deploys the H200 as a true bare metal device: the GPU is passed through directly with no hypervisor layer between your code and the accelerator, so you get full PCIe 5.0 bandwidth and direct access to NVIDIA’s driver stack. Each server supports 1 or 2 H200 cards; for NVLink-bridged two-card configurations the GPUs share memory over the high-speed bridge. GPU-memory pooling is available between GPUs within the same server; across nodes, GPUs communicate over the private network rather than a shared GPU-memory fabric, so size per-server GPU memory for workloads that need tight GPU-to-GPU coupling.

Processor: Dual Intel Xeon 6530P (Granite Rapids)

Each H200 server pairs the GPU with two Intel Xeon 6530P processors (Granite Rapids, Intel 3 process), for 64 cores / 128 threads total at 2.3 GHz base / 4.1 GHz turbo, with 144 MB of L3 cache per socket and 88 PCIe 5.0 lanes per processor. The high lane count matters for GPU servers: it provides full-bandwidth PCIe 5.0 to the GPU plus the NVMe data drive without contention. The Granite Rapids cores also carry Intel AMX and AVX-512, useful for CPU-side data preprocessing, tokenization, and embedding pipelines that feed the GPU. OpenMetal selected the 6530P to keep the host from bottlenecking the accelerator during data-loading-heavy training runs. See the Intel Xeon 6530P product page for full CPU detail.
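
The lane-count argument can be made concrete with a simple budget. The sketch below assumes typical link widths: x16 per H200 NVL, x4 per NVMe drive, and a hypothetical x16 network adapter. These are assumptions, not OpenMetal's actual board wiring. For simplicity it pools the lanes from both sockets, although each device actually attaches to one CPU's lanes. Raw PCIe 5.0 bandwidth is computed from the 32 GT/s signaling rate and 128b/130b encoding, before packet overhead.

```python
# Hypothetical PCIe 5.0 lane budget for a dual-socket host with 88 lanes per CPU.
# Link widths are typical assumptions, not OpenMetal's actual board wiring.

LANES_PER_CPU = 88
SOCKETS = 2

# device name -> (count, lanes per device)
devices = {
    "H200 NVL GPU": (2, 16),      # assumed x16 per GPU (dual-GPU configuration)
    "NVMe boot drive": (2, 4),    # assumed x4 per NVMe drive
    "NVMe data drive": (1, 4),
    "Network adapter": (1, 16),   # hypothetical NIC link width
}


def pcie5_gbytes_per_s(lanes: int) -> float:
    """Raw per-direction bandwidth: 32 GT/s per lane, 128b/130b encoding, before packet overhead."""
    return 32e9 * lanes * (128 / 130) / 8 / 1e9


used = sum(count * width for count, width in devices.values())
total = LANES_PER_CPU * SOCKETS
print(f"lanes used: {used} of {total} (spare: {total - used})")
print(f"x16 link: ~{pcie5_gbytes_per_s(16):.1f} GB/s per direction")
```

Even with both GPUs at full x16 width, the assumed devices use about a third of the available lanes. That leaves room for additional NVMe drives in the 24-bay chassis.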

Memory

The H200 server ships with 1TB of DDR5-6400 across 16 of 32 DIMM slots (one DIMM per channel, both sockets), with 16 open slots to upgrade to 2TB. With 8 memory channels per socket at 6400 MT/s, the platform delivers high host-memory bandwidth to keep training data and inference batches resident in RAM. For GPU workloads this host memory acts as the staging tier: dataset shards, tokenized corpora, vector indexes, and model checkpoints live in system RAM and stream to the GPU over PCIe 5.0. ECC is standard for the data integrity that long training runs require. Populating all 32 slots with 64GB RDIMMs gives 2TB, the practical maximum; a 4TB ceiling exists with 128GB RDIMMs but is rarely chosen. Filling the second slot on each channel (two DIMMs per channel) can lower the supported memory speed below DDR5-6400 on Xeon 6 platforms, so confirm the speed of an upgraded configuration with OpenMetal.
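
Peak host-memory bandwidth follows directly from the channel count and transfer rate. This sketch computes the theoretical ceiling for 8 DDR5-6400 channels per socket. It assumes 8 data bytes per transfer per channel. Sustained bandwidth measured with a benchmark such as STREAM will be lower.

```python
# Theoretical peak DRAM bandwidth = channels x transfers per second x bytes per transfer.
# Each DDR5 channel carries 64 data bits (8 bytes) per transfer; ECC bits are not counted.


def peak_dram_gb_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in decimal GB/s for the given channel count and DDR data rate."""
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


per_socket = peak_dram_gb_s(channels=8, mt_per_s=6400)
both_sockets = 2 * per_socket
print(f"per socket: {per_socket:.1f} GB/s, both sockets: {both_sockets:.1f} GB/s")
# per socket: 409.6 GB/s, both sockets: 819.2 GB/s
```

On paper, that is well above the roughly 63 GB/s per direction of a single PCIe 5.0 x16 link. Host RAM is therefore unlikely to be the limiting stage when staging data for one GPU.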

Storage

OpenMetal separates boot and data storage on every server. The H200 boots from 2x 960GB NVMe drives in RAID 1, isolating the operating system from your data drives so a data-volume change never risks the boot environment — see boot and data drive isolation. The data tier is a 6.4TB Micron 7500 MAX NVMe SSD (PCIe Gen4, 232-layer 3D TLC, 3 DWPD mixed-use endurance), with capacity to add drives up to the 24-bay chassis limit for larger datasets and checkpoint storage.

| Metric | Micron 7500 MAX (6.4TB) |
|---|---|
| Sequential Read | 7,000 MB/s |
| Sequential Write | 5,900 MB/s |
| Random Read | 1,100,000 IOPS |
| Random Write | 400,000 IOPS |
| Read Latency (typical) | 70 µs |
| Write Latency (typical) | 15 µs |
| Endurance | 3 DWPD (35,040 TBW) |
| Warranty | 5 years |

High sequential read throughput matters for GPU training: streaming sharded datasets and loading multi-hundred-GB checkpoints is read-bound, and 7 GB/s per drive keeps the data loader ahead of the GPU.
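
The endurance and throughput figures in the table can be sanity-checked with a little arithmetic. The sketch below derives total terabytes written (TBW) from capacity, DWPD, and the 5-year warranty period. It then estimates a best-case read time for a hypothetical 140 GB checkpoint (roughly a 70B model's 16-bit weights) at the rated 7,000 MB/s. Filesystem, decompression, and host-to-GPU copy time are not modeled.

```python
# Sanity-check the drive table: endurance in TBW and read time for a large checkpoint.


def tbw(capacity_tb: float, dwpd: float, warranty_years: int) -> float:
    """Total terabytes written = capacity x drive-writes-per-day x days of warranty."""
    return capacity_tb * dwpd * 365 * warranty_years


def read_seconds(size_gb: float, seq_read_mb_s: float) -> float:
    """Best-case time to stream a file at the drive's rated sequential read speed."""
    return size_gb * 1000 / seq_read_mb_s


print(f"{tbw(6.4, 3, 5):,.0f} TBW")                                   # 35,040 TBW
print(f"{read_seconds(140, 7000):.0f} s to read a 140 GB checkpoint")  # 20 s
```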

Networking

Every H200 server has 40 Gbps of private bandwidth delivered as 4x 10 Gbps uplinks in an LACP bond, plus 10 Gbps of public bandwidth. The private network is the path for multi-node GPU work — distributed training, parameter servers, and pulling datasets from OpenMetal storage nodes — and east-west traffic between your servers is not metered. OpenMetal’s base network SLA is 99.96%, with measured performance exceeding 99.99% from 2022 through 2026. DDoS protection of up to 10 Gbps per IP is included. See LACP network bonding.
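
LACP balances traffic per flow rather than per packet. A single TCP connection therefore normally rides one 10 Gbps member, while many parallel connections can use all four. The sketch below illustrates that behavior under two assumptions:

- The CRC32 hash of the connection 5-tuple stands in for real hash policies; switches and the Linux bonding driver apply their own configurable policies.
- The IP addresses and dataset size are hypothetical.

The transfer times ignore protocol overhead.

```python
import zlib
from collections import Counter

MEMBER_LINKS = 4   # 4x 10 Gbps in one LACP bond
MEMBER_GBPS = 10


def member_for_flow(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    proto: str = "tcp") -> int:
    """Pick a bond member by hashing the flow's 5-tuple (illustrative hash, not a vendor's)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % MEMBER_LINKS


def transfer_minutes(size_gb: float, gbps: float) -> float:
    """Ideal transfer time: gigabytes -> gigabits, divided by link rate, in minutes."""
    return size_gb * 8 / gbps / 60


# Every packet of one flow hashes to the same member link.
single = {member_for_flow("10.0.0.11", "10.0.0.12", 40000, 5201) for _ in range(100)}
print("members used by one flow:", len(single))  # 1

# Sixteen parallel flows between the same hosts spread across members.
spread = Counter(member_for_flow("10.0.0.11", "10.0.0.12", 40000 + i, 5201) for i in range(16))
print("flows per member:", sorted(spread.items()))

dataset_gb = 500  # hypothetical dataset size
print(f"one flow at 10 Gbps: {transfer_minutes(dataset_gb, MEMBER_GBPS):.1f} min")
print(f"all four members busy: {transfer_minutes(dataset_gb, MEMBER_LINKS * MEMBER_GBPS):.1f} min")
```

For bulk dataset pulls, transfer tools that open several parallel connections are the practical way to approach the aggregate 40 Gbps rate.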

Egress pricing: 95th-percentile billing, not per-GB transfer

OpenMetal bills public network usage on a 95th-percentile model with a generous included allotment, not per-GB like the hyperscalers. For GPU workloads this is a material difference: pulling trained model weights, exporting datasets, or serving inference responses to end users does not generate the per-GB egress bill that AWS, GCP, and Azure charge — where data-transfer-out on GPU instances frequently rivals the compute cost itself.
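
Under 95th-percentile billing, short bursts are free as long as they stay within the top 5% of samples. The sketch below assumes the common convention of 5-minute rate samples over a 30-day month. The traffic pattern is hypothetical, and OpenMetal's actual sampling and allotment terms may differ.

```python
import math


def billable_mbps(samples_mbps: list[float]) -> float:
    """Return the 95th-percentile sample: sort, drop the top 5%, bill the highest remaining."""
    ordered = sorted(samples_mbps)
    index = math.ceil(len(ordered) * 95 / 100) - 1
    return ordered[index]


SAMPLES_PER_MONTH = 30 * 24 * 12   # 5-minute samples in a 30-day month = 8,640

month = [800.0] * SAMPLES_PER_MONTH     # steady 800 Mbps baseline (hypothetical)
for i in range(400):                    # 400 samples (~33 hours) of 9 Gbps bursts
    month[i] = 9000.0

print(billable_mbps(month))  # 800.0: the bursts fall within the discarded top 432 samples
```

Here about 33 hours of near-line-rate transfers fall inside the discarded samples, so the bill reflects the 800 Mbps baseline. A per-GB model would charge for every byte of those bursts.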

Security and Confidential Computing

The H200 runs as a single-tenant bare metal server (physical isolation, not a shared hypervisor), which is the foundational security property for sensitive training data and proprietary model weights. The Xeon 6530P supports Intel SGX for application-level enclaves and Intel TME-MK (Total Memory Encryption, Multi-Key). Hardware security features include AES-NI, Intel Boot Guard, and Control-Flow Enforcement Technology (CET).

Important: Intel TDX (Trust Domain Extensions) and GPU passthrough cannot be combined in a single trust boundary on this platform. Customers requiring confidential VMs (TDX) should use OpenMetal’s non-GPU bare metal servers; GPU workloads run on the H200 as single-tenant bare metal with physical isolation rather than TDX memory encryption. 

HIPAA and regulatory compliance

OpenMetal is HIPAA compliant at the organizational level and offers Business Associate Agreements (BAAs). The H200 is currently deployed in Ashburn, Virginia (NTT DATA VA1), whose facility-operator certifications include SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, and HIPAA. Facility certifications are held by the facility operator (NTT), not by OpenMetal; OpenMetal’s HIPAA posture is organizational. Healthcare and regulated AI workloads — training on PHI, clinical inference — can run on H200 servers hosted in the HIPAA-compliant Ashburn facility under an OpenMetal BAA.

Recommended Workloads

Large Language Model Fine-Tuning and Training

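The 141GB HBM3e lets a single H200 hold a 70B-parameter model’s 16-bit weights (~140GB) without tensor-parallel sharding across cards. Training adds gradients and optimizer state on top of the weights. On one card, 70B-class and larger models are therefore fine-tuned with parameter-efficient methods such as QLoRA. Full fine-tuning at that scale requires sharding training state across many GPUs or offloading it to host memory. Pair two H200s with NVLink for 282GB of combined GPU memory. Frameworks: PyTorch FSDP, DeepSpeed, Hugging Face Transformers, NVIDIA NeMo (available with NVAIE).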
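
The gap between holding a model and training it comes from the extra state kept per parameter. The sketch below relies on common rules of thumb and one assumption:

- Full fine-tuning with mixed-precision AdamW costs 16 bytes per parameter, covering fp32 master weights and two optimizer moments.
- A 4-bit base model costs 0.5 bytes per parameter.
- Adapters make up 1% of trainable parameters (an assumption).

Activations, KV cache, and quantization constants are excluded, so real usage is higher.

```python
# Rule-of-thumb GPU memory for weights plus training state (activations, KV cache,
# quantization constants, and framework overhead are excluded).

FULL_FT_BYTES_PER_PARAM = 16   # bf16 weights 2 + bf16 grads 2 + fp32 master 4 + Adam m 4 + Adam v 4


def full_finetune_gb(params_b: float) -> float:
    """Every parameter is trainable and carries gradients plus AdamW state."""
    return params_b * FULL_FT_BYTES_PER_PARAM


def adapter_finetune_gb(params_b: float, base_bytes: float, trainable_fraction: float = 0.01) -> float:
    """Frozen base weights at base_bytes/param, plus full AdamW state for the adapters only.
    trainable_fraction (share of parameters in LoRA adapters) is an assumption."""
    return params_b * base_bytes + params_b * trainable_fraction * FULL_FT_BYTES_PER_PARAM


GPU_GB = 141
estimates = {
    "70B full fine-tune": full_finetune_gb(70),
    "70B LoRA, bf16 base": adapter_finetune_gb(70, base_bytes=2.0),
    "70B QLoRA, 4-bit base": adapter_finetune_gb(70, base_bytes=0.5),
}
for label, need in estimates.items():
    print(f"{label}: ~{need:,.0f} GB, fits on one H200: {need <= GPU_GB}")
```

Under these assumptions, full fine-tuning a 70B model needs on the order of 1.1TB before activations. A 4-bit QLoRA setup lands near 46GB, which is why single-card work at that scale relies on adapter methods.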

Large-Context and High-Throughput Inference

Memory-bound inference benefits directly from the H200’s 4.8 TB/s bandwidth and 141GB capacity: longer context windows, larger KV caches, and bigger batch sizes per card. Serve with NVIDIA NIM microservices (available with NVAIE), vLLM, or TensorRT-LLM. A single H200 serves models that would require two H100s.
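
The KV cache is what long context windows and large batches consume, and its size follows from the model's attention shape. The sketch below uses the published Llama 3 70B configuration (80 layers, 8 key-value heads with grouped-query attention, head dimension 128) purely as an example. It also assumes a 16-bit cache.

```python
# KV cache bytes = 2 (keys and values) x layers x KV heads x head dim x bytes per element x tokens.
# Shape below follows the published Llama 3 70B config (80 layers, 8 KV heads with
# grouped-query attention, head dim 128), used here only as an example.

GiB = 1024 ** 3


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Cache size for `tokens` cached positions (one sequence, or summed over a batch)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


per_token = kv_cache_bytes(1, layers=80, kv_heads=8, head_dim=128)
long_ctx = kv_cache_bytes(131_072, layers=80, kv_heads=8, head_dim=128)
print(f"{per_token // 1024} KiB per token")                       # 320 KiB per token
print(f"{long_ctx / GiB:.0f} GiB for one 128K-token sequence")  # 40 GiB for one 128K-token sequence
```

The cache grows linearly with context length and with the number of concurrent sequences. Long-context serving of a 70B model on one card generally means quantized weights (8-bit weights for 70B parameters take about 70GB) or a smaller model.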

Retrieval-Augmented Generation (RAG) and Vector Workloads

Run the embedding model, vector index, and generation model together: 1TB of host RAM holds large vector indexes resident while the H200 handles embedding and generation, with the NVMe data tier providing fast index persistence.
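
Whether a vector index stays resident in the 1TB of host RAM is mostly a product of vector count and dimensionality. The sketch assumes 1024-dimensional float32 embeddings. It applies a hypothetical 1.2x overhead factor for graph-index structures; real overhead depends on the index type and its parameters.

```python
# Approximate RAM for an in-memory vector index: vectors x dimensions x bytes per component,
# scaled by an overhead factor for index structures (the 1.2 used below is hypothetical).

HOST_RAM_GB = 1000  # 1TB configuration, decimal GB


def index_gb(num_vectors: int, dim: int, bytes_per_component: int = 4,
             overhead: float = 1.0) -> float:
    """Raw embedding storage (float32 by default) times an index-overhead factor."""
    return num_vectors * dim * bytes_per_component * overhead / 1e9


for n in (10_000_000, 100_000_000, 250_000_000):
    gb = index_gb(n, dim=1024, overhead=1.2)   # 1024-dim float32 embeddings (assumed)
    print(f"{n:>11,} vectors: ~{gb:,.0f} GB, fits in 1TB host RAM: {gb < HOST_RAM_GB}")
```

A stricter check would subtract memory reserved for the OS, data loaders, and the serving process. Quantized embeddings (for example, 8-bit components) shrink the footprint roughly in proportion to bytes per component.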

HPC and Scientific Computing

Hopper’s FP64 and tensor cores suit computational chemistry, CFD, genomics, and physics simulation. The dual Xeon 6530P host with AMX/AVX-512 handles the CPU-bound portions, and 40 Gbps private networking links multiple H200 nodes for MPI workloads.

Multi-GPU Cluster Workloads

Scale beyond a single server into a dedicated OpenMetal GPU cluster — same-GPU (all H200) or mixed (H200 + RP6000) — connected over the 40 Gbps private mesh for distributed training and large-scale inference serving. (Multi-node clusters communicate over the private network; GPU-memory pooling is within a node, so distributed work uses data and pipeline parallelism across nodes.)
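
Data parallelism across nodes exchanges gradients over the private network every step. A back-of-the-envelope estimate shows how that cost scales. Ring all-reduce pushes 2 x (N - 1) / N times the gradient size through each node's link. The effective bandwidth is an assumption in this sketch. A single TCP flow on an LACP bond normally uses one 10 Gbps member, libraries that open several connections can spread across members, and protocol overhead lowers real throughput further. The 8B-parameter model is hypothetical.

```python
# Back-of-the-envelope time for one data-parallel gradient all-reduce across nodes.
# Ring all-reduce moves 2 * (N - 1) / N times the gradient size through each node's link.


def ring_allreduce_seconds(grad_gb: float, nodes: int, link_gbps: float) -> float:
    """Ideal time for one ring all-reduce: GB moved per node -> gigabits / effective link rate."""
    moved_gb = 2 * (nodes - 1) / nodes * grad_gb
    return moved_gb * 8 / link_gbps


# Hypothetical 8B-parameter model with bf16 gradients (~16 GB) across 4 nodes.
for gbps in (10, 40):
    t = ring_allreduce_seconds(grad_gb=16, nodes=4, link_gbps=gbps)
    print(f"{gbps} Gbps effective: ~{t:.1f} s per full-gradient all-reduce")
```

Seconds per exchange is why cross-node training on links of this speed tends to favor fewer gradient exchanges (for example, gradient accumulation), pipeline parallelism, or keeping the tightly coupled part of the job within one server.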

“With v5 we modernized the foundation of our bare metal and private cloud catalog. Adding the RP6000 and H200 was the natural next step. Customers running AI and HPC workloads get fully dedicated GPUs on the same modern Xeon 6000 platform, with transparent monthly billing and infrastructure they actually control, not throttled, metered slices of someone else’s cluster.”

Jamie Tischart, CTO, OpenMetal

How the H200 Compares to Public Cloud GPU Instances

Hyperscaler GPU instances (AWS P5, GCP A3, Azure ND-series) deliver H100/H200-class accelerators on a per-GPU-hour metered model with shared-tenancy infrastructure and per-GB egress. OpenMetal’s H200 is structurally different: dedicated single-tenant hardware, fixed monthly pricing, and included egress. For sustained 24/7 training or always-on inference, the fixed-cost model is typically far cheaper than metered GPU-hours. Metered pricing also charges an “idle silicon tax” — every GPU-hour bundles the provider’s elasticity and idle-capacity risk into the rate, so a steady high-utilization training job effectively subsidizes other tenants’ burst headroom. On a dedicated H200, running the card at 100% for days costs no more than leaving it idle.
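
The fixed-versus-metered argument reduces to a break-even utilization. That is the fraction of the month a metered GPU must run before it costs more than a fixed monthly server. Both prices in the sketch are hypothetical placeholders, not OpenMetal or hyperscaler rates. Metered egress charges are left out; including them would only push the break-even point lower.

```python
# Break-even utilization between a fixed monthly server and metered GPU-hours.
# All prices are hypothetical placeholders; substitute real quotes before drawing conclusions.

HOURS_PER_MONTH = 730  # average month (8,760 hours / 12)


def break_even_utilization(fixed_monthly: float, hourly_rate_per_gpu: float, gpus: int = 1) -> float:
    """Fraction of the month metered GPUs must run before they cost more than the fixed server."""
    return fixed_monthly / (hourly_rate_per_gpu * gpus * HOURS_PER_MONTH)


def metered_monthly(hourly_rate_per_gpu: float, gpus: int, utilization: float) -> float:
    """Metered bill for the month at the given average utilization (0.0 to 1.0)."""
    return hourly_rate_per_gpu * gpus * HOURS_PER_MONTH * utilization


fixed = 5000.0   # hypothetical fixed monthly price for a dedicated server
rate = 10.0      # hypothetical metered price per GPU-hour
print(f"break-even at {break_even_utilization(fixed, rate):.0%} utilization")        # 68%
print(f"metered cost at 100% utilization: ${metered_monthly(rate, 1, 1.0):,.0f}/month")  # $7,300
```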

When public cloud GPU is the better fit: spiky, scale-to-zero inference; short experimental runs measured in hours; or deep integration with a hyperscaler’s managed ML services (SageMaker, Vertex AI). A detailed H200-vs-cloud cost comparison is planned as a companion page.

Deployment Options

The H200 can be deployed three ways:

  • Dedicated GPU server — a single H200 (or dual-GPU) bare metal server with full root access, IPMI, and fixed monthly pricing. Best for single-node training, fine-tuning, and inference serving.
  • Dedicated GPU cluster — multiple GPU nodes (all-H200 or mixed with RP6000) on a private 40 Gbps mesh for distributed training and scaled inference. Built to order.
  • Attached to existing infrastructure — add H200 nodes to an existing OpenMetal Hosted Private Cloud or bare metal deployment, putting GPU acceleration on the same private network as your existing compute and storage.

Where to deploy

The H200 is available now in Ashburn, Virginia (US-East), hosted in a Tier III, HIPAA-compliant NTT facility. Advance reservations are available for OpenMetal’s other regions — Los Angeles, Amsterdam, and Singapore — for customers planning capacity ahead of deployment.

| Location | Region | Certifications (facility operator) | Location Page |
|---|---|---|---|
| Ashburn, VA | US-East | SOC 1/2 Type II, ISO 27001, ISO 50001, PCI DSS, NIST 800-53 HIGH, HIPAA | Ashburn facility specs |

Proof of Concept clusters are available for testing. Ramp pricing is available for migrations from other providers, with fixed monthly pricing once deployed. See GPU server pricing.

Get an H200 Quote

Ready to deploy? Tell us about your AI/ML infrastructure needs and we’ll provide a custom quote for the NVIDIA H200 — as a single GPU server, a dedicated GPU cluster, or GPU nodes attached to an existing OpenMetal deployment.

  • Single GPU server: One or two H200 cards with full root access and IPMI
  • GPU cluster: Multi-node deployments (all-H200 or mixed-GPU) on a private 40 Gbps mesh
  • Attached GPU: Add H200 capacity to your existing Hosted Private Cloud or bare metal footprint
  • Custom configurations: RAM upgrades to 2TB, additional NVMe drives, multi-GPU NVLink

All deployments include fixed monthly pricing, included egress, a 99.96%+ network SLA, and DDoS protection; NVIDIA AI Enterprise (NVAIE) is available for H200 deployments (contact OpenMetal for current options). Ramp pricing is available for migrations.

Related OpenMetal Answers

The following knowledgebase articles cover specific questions about the H200 and related topics:

  • What is the difference between the NVIDIA H200 and H100?
  • How much GPU memory does the OpenMetal H200 have?
  • Can I run a 70B parameter LLM on a single OpenMetal H200?
  • Does the OpenMetal H200 include NVIDIA AI Enterprise?
  • Can I build a multi-GPU cluster with OpenMetal H200 servers?
  • Can I add GPU servers to my existing OpenMetal cloud or bare metal deployment?
  • Can I run Intel TDX confidential computing on an OpenMetal GPU server?
  • Where are OpenMetal H200 GPU servers available?
  • How does OpenMetal GPU pricing compare to AWS GPU instances?

Product specifications, pricing, and availability may change due to market conditions and other factors. For the most current information, please contact the OpenMetal team directly.