In this article

This article walks through how to architect a confidential AI inference pipeline on dedicated bare metal using Intel TDX on OpenMetal’s XL v5 servers. We cover why public cloud confidential VMs create problems for this workload, what the XL v5 hardware gives you out of the box, how to set up Trust Domains in OpenStack, deploy vLLM inside them, handle model storage, and satisfy the attestation requirements that matter to auditors and compliance teams.


If you’re running inference on sensitive data like patient records, financial transactions, and personal identifiers the hardware isolation story matters as much as the software stack. Encryption in transit and at rest gets you partway there. Hardware-level isolation of the inference process itself is a different class of protection, and it’s what Intel TDX is designed to provide.

The problem is finding infrastructure that actually delivers it without introducing a new set of tradeoffs. Public cloud confidential VMs exist, but they come with constraints that are hard to work around at scale. Bare metal TDX on dedicated hardware is the cleaner solution, and with the OpenMetal XL v5, it’s available without any upgrade path or special ordering process.

Here’s how to build the stack.

Why Public Cloud Confidential VMs Fall Short for Inference Workloads

Public cloud providers offer TDX-based confidential VM options. Azure Confidential Computing is the most mature. On paper, this looks like the easy path. In practice, a few things get complicated quickly.

Shared tenancy and attestation trust chains. TDX’s attestation model proves that a specific Trust Domain is running on a specific hardware configuration in a specific state. On a dedicated bare metal server, that chain is clean. On a shared cloud host, the attestation you receive reflects the hypervisor layer the cloud provider controls, not a physical server you have any direct relationship with. For regulated workloads where auditors want proof of hardware-level isolation, that distinction matters. “We ran this in a confidential VM on a multi-tenant cloud host” is a harder story to tell than “we ran this on dedicated hardware with a clean attestation chain”.

Egress costs for model artifacts. Large language models are large. Moving a 70B parameter model between cloud storage and a confidential VM repeatedly gets expensive fast. Cloud egress pricing is metered and billed per GB. On dedicated bare metal with local NVMe and unmetered private bandwidth between servers, this is not a line item.

Inference latency variability. Confidential VMs on public cloud are still subject to noisy neighbor effects at the host level. Inference workloads are latency-sensitive. Running on a dedicated physical server with no other tenants removes that variable entirely.

Cost predictability. Cloud confidential VM pricing layers compute costs on top of an already expensive category of infrastructure. For teams running inference continuously rather than in bursts, fixed-cost dedicated hardware almost always wins on TCO past a certain utilization threshold.

What the XL v5 Gives You Out of the Box

The XL v5 runs on Intel’s Granite Rapids architecture (Intel 3 process node) and ships TDX-active at the BIOS level. There is no RAM upgrade required, no special configuration order, and no waiting for a custom build.

CPU: 2x Intel Xeon 6530P, 32 cores / 64 threads per socket (64 cores / 128 threads total). Granite Rapids is the first generation where TDX is production-stable across the full feature set.

RAM: 1TB DDR5-6400. All IMC channels are populated at 1DPC across both sockets, which is the actual Intel hardware requirement for TDX activation. Intel TDX requires all Slot 0 positions on every IMC channel to be populated for both CPUs. The XL v5 satisfies this at base configuration, which is why TDX is ready without any upgrade. The 1TB figure is a consequence of how OpenMetal stocks 64GB RDIMMs across 16 slots, not an arbitrary threshold.

Why does 1TB matter operationally? Three reasons:

TDX reserves per-page metadata before any Trust Domain launches. On a 1TB node this overhead is roughly 4GB, significant but manageable. At lower RAM totals, the overhead consumes a larger share of available memory before workloads start.

Each confidential VM carries its own EPC reservations, encryption metadata, and integrity-tracking overhead. Running multiple Trust Domains simultaneously is the normal production pattern for an inference service, and this overhead stacks. 1TB gives you the headroom to run several TDs without memory pressure.

NUMA locality under TDX: with TDX enabled, remote memory accesses across sockets are encrypted over UPI, adding latency. 1TB across two sockets gives 512GB per socket of local memory to work with. This matters for inference workloads that keep large model weights resident in memory.

Storage: 4x 6.4TB Micron 7500 MAX NVMe (25.6TB total). This is enough to keep multiple large model checkpoints on local storage. A 70B parameter model in float16 is roughly 140GB, so you can comfortably cache several models locally. The 7500 MAX is an enterprise-grade drive rated for sustained read/write throughput under continuous load.

Networking: Dual 10 Gbps NICs with 40 Gbps aggregate private bandwidth, unmetered between servers. Public bandwidth is 6 Gbps. Each customer gets dedicated VLANs at the infrastructure level.

Compliance posture: OpenMetal operates as a Business Associate under HIPAA and maintains a security and privacy framework grounded in ISO 27001:2022. This includes documented policies and procedures, incident response and breach notification processes, and periodic HIPAA/HITECH security assessments. For healthcare inference workloads, deploying on infrastructure with this compliance posture, combined with hardware-level attestation, meaningfully strengthens the technical safeguard story for auditors.

Architecting the Inference Stack Inside a Trust Domain

Trust Domain Provisioning on OpenStack

OpenMetal private clouds run OpenStack, and Nova handles the compute layer. Launching a TDX Trust Domain is done at the VM provisioning level. You specify a TDX-capable flavor and OpenStack handles the hardware configuration.

At a high level, the provisioning flow looks like this:

  1. Create a Nova flavor with TDX hardware properties enabled
  2. Launch a VM instance against that flavor on an XL v5 host
  3. The VM boots inside a hardware-isolated Trust Domain with memory encryption active
  4. The TD generates a TD Report that can be used for remote attestation

TDX operates at the VM level. Your inference workload, the model weights loaded into memory, and the intermediate activations during inference are all encrypted and isolated from anything else on the host, including the hypervisor.

Within the OpenStack environment, you can also create VPCs through OpenStack Projects. Each VPC gets its own logically isolated virtual network space, custom IP ranges, firewall rules, and security groups. Your confidential inference nodes can sit in a fully isolated network segment with controlled ingress and egress, which matters for regulated data pipelines where you need to demonstrate network-level isolation alongside hardware isolation.

Deploying vLLM Inside the Trust Domain

vLLM is the right framework for this workload. It handles continuous batching, paged attention for KV cache management, and efficient GPU-adjacent inference on CPU/memory-heavy hardware. For models in the 7B–70B range running on a 1TB, 128-thread system, vLLM’s memory management is a better fit than running raw HuggingFace Transformers inference.

Inside the Trust Domain, vLLM deployment is straightforward. The TD looks like a standard Linux VM to the application layer. TDX encryption is transparent to the software stack. You install vLLM, point it at your model artifacts, and it serves the OpenAI-compatible API endpoint as normal.

A few configuration points worth noting for this specific hardware:

Thread and memory allocation. The XL v5 has 128 threads across two NUMA nodes. vLLM’s tensor parallelism should be set to align with NUMA boundaries where possible to minimize cross-socket memory traffic. With TDX enabled and cross-socket UPI encrypted, keeping model shards within a single NUMA node is worth the extra configuration effort for latency-sensitive deployments.

KV cache sizing. With 1TB of RAM and TDX overhead accounted for, you have significant headroom for KV cache. vLLM’s memory utilization settings should be tuned based on your model size and expected concurrency. Consult the current vLLM documentation for specific flags, as these have evolved across releases.

Note: The vLLM configuration guidance in this section describes the general approach and relevant architectural considerations for running inference on TDX bare metal. Specific flag names and defaults should be verified against current vLLM release documentation before deploying.

Model serving isolation. If you’re serving multiple models or multiple tenants from the same physical host, run each in a separate Trust Domain rather than multiplexing inside a single TD. The per-TD overhead is real but manageable, and the isolation guarantee is the point of the whole architecture.

Memory Sizing for Multi-TD Workloads

Running more than one Trust Domain simultaneously is the normal production pattern for a real inference service. You may want redundancy, you may be serving different models, or you may be separating workloads by data classification level.

The overhead math for multi-TD on the XL v5:

  • PAMT metadata: approximately 4GB at 1TB total RAM, divided across active TDs
  • Per-TD EPC overhead: varies by TD size but plan for several GB per active Trust Domain
  • Practical headroom for workloads: roughly 900-950GB depending on number of active TDs

For a deployment running three concurrent TDs (one per model tier, for example), you have more than enough headroom. The architecture starts to feel constrained only if you’re trying to run many small TDs with large model footprints simultaneously, which is not the typical inference pattern.

Local NVMe vs. Ceph for Model Storage

The XL v5’s 25.6TB of local NVMe is the right starting point for most inference deployments. Model artifacts are read-heavy, read repeatedly, and benefit from low-latency local access. Keeping your model weights on local NVMe means no network round trip on model load, no dependency on external storage availability, and predictable latency.

The case for adding a Ceph cluster comes when you hit one of these situations:

Multiple inference nodes serving the same models. If you’re running several XL v5 nodes for availability or scale, maintaining synchronized model artifacts across local storage on each node gets unwieldy. A Ceph cluster shared across nodes gives you a single source of truth. OpenMetal offers dedicated Ceph storage clusters that integrate directly with the platform and connect via the same private network fabric.

Model versioning at scale. If you’re running frequent model updates across multiple nodes, Ceph’s object storage semantics (via OpenStack Swift, which runs on top of Ceph) make version management cleaner than syncing files across local drives.

Storage capacity beyond 25.6TB. The XL v5’s local NVMe is generous, but if your model library grows beyond what fits locally, Ceph gives you the expansion path.

For a single-node deployment or a small cluster with stable model artifacts, stick with local NVMe. It’s faster and simpler.

The Attestation Flow

This is the section that determines whether your architecture is actually defensible to auditors, not just technically sound.

TD Quote Generation

When a Trust Domain launches, it can generate a TD Report that contains measurements of the TD’s initial state: the software loaded into it, the configuration, and the hardware it’s running on. This report is signed by the TD’s private key, which never leaves the TD.

To make this report verifiable by someone outside the TD, it needs to be converted into a TD Quote. The Quoting Enclave (part of the TDX attestation infrastructure) takes the TD Report and signs it with a key that chains back to Intel’s attestation certificates. The result is a TD Quote: a portable, verifiable attestation of what is running on what hardware.

DCAP Verification

Intel’s Data Center Attestation Primitives (DCAP) is the verification stack. A relying party (your auditor, your compliance system, or a remote attestation service) takes the TD Quote and verifies:

  1. That the signature chains back to a valid Intel attestation certificate for a Granite Rapids processor
  2. That the hardware is in a known-good state (no security version number rollbacks, no debug mode enabled)
  3. That the TD measurements match what you expect, confirming the software inside the TD hasn’t been tampered with

The output of DCAP verification is a cryptographic proof that a specific, unmodified software stack is running inside a hardware-isolated Trust Domain on a specific class of hardware. That proof can be logged, timestamped, and presented to auditors.

What This Proves to Auditors and Compliance Teams

For HIPAA technical safeguard requirements, hardware attestation addresses the “access control” and “audit control” requirements in a way that software controls alone cannot. The attestation record demonstrates:

  • PHI processed inside the inference pipeline was isolated at the hardware level during processing
  • The software handling the PHI was in a known, verified state
  • The hardware configuration providing the isolation is documented and verifiable

OpenMetal operates as a Business Associate under HIPAA, with security practices grounded in ISO 27001:2022 and documented incident response procedures including breach notification. That infrastructure-layer compliance posture, combined with hardware attestation from the TD Quote, gives your compliance team concrete artifacts for the technical safeguard section that go beyond what contractual controls alone can provide.

For SOC 2 Type II audits, attestation logs support the Availability and Confidentiality trust service criteria, specifically around logical and physical access controls and encryption of sensitive data during processing. OpenMetal is currently pursuing SOC 2 certification; you can request SOC report information directly through the SOC Report Request Form.

Networking and Isolation

The network architecture around your confidential inference nodes matters as much as the TD isolation itself. Hardware attestation of the compute layer is weakened if the network path to and from the TD is poorly controlled.

Infrastructure-Level Isolation

At the infrastructure layer, OpenMetal assigns dedicated VLANs to each customer. These operate at the physical network level between servers. Your XL v5 nodes are not sharing a VLAN with any other customer’s infrastructure. This is distinct from the virtual networking layer: it’s isolation at the level of the network hardware itself.

VPC Isolation for Inference Nodes

Within your OpenStack environment, you can create VPCs through OpenStack Projects. Each VPC gets its own VXLAN overlay network running within your customer VLAN. For a confidential inference deployment, a clean architecture isolates your inference nodes in their own VPC, separate from your data ingestion pipeline and your output consumers.

Typical segmentation:

  • Inference VPC: Contains the TDX VMs running vLLM. No direct external access. Receives sanitized inputs from the processing layer, returns inference outputs.
  • Data processing VPC: Handles input preprocessing and output post-processing. Has controlled connectivity to the inference VPC via security group rules.
  • Management VPC: Handles access to OpenMetal Central and OpenStack Horizon. Separate from data paths.

OpenStack’s security groups let you define which ports and protocols are permitted between VPCs. The virtual router and NAT functionality handle connectivity where needed without exposing inference nodes directly.

VPN for External Connections

If your inference pipeline needs to receive data from external systems (a hospital EHR system, a financial data feed, a remote data processing environment), VPN-as-a-Service in OpenStack provides encrypted tunnels for those connections. This keeps sensitive data encrypted in transit from the source all the way to your inference pipeline, not just inside the OpenMetal environment.

FAQ

Does OpenMetal manage the TDX configuration, or do I control it?

TDX is active at the BIOS level on the XL v5. The Trust Domain lifecycle (provisioning, attestation, termination) is managed through OpenStack Nova, which you control. OpenMetal provides the hardware configuration and the OpenStack environment. The TD workloads are yours to manage.

Can I run non-TDX workloads alongside TDX workloads on the same cluster?

Yes. TDX is a per-VM feature in OpenStack. Standard VMs and confidential VMs can coexist on the same cluster. You would typically segment them by network VPC for compliance reasons, but the hardware supports mixed deployments.

What models does this architecture support?

Any model that runs on vLLM. For CPU inference on the XL v5, the practical range is roughly 7B to 70B parameters depending on precision and your concurrency requirements. Larger models can be accommodated with Ceph-backed storage and appropriate memory configuration.

Is attestation verification something I do, or does OpenMetal do it?

Attestation is performed by your systems or by a remote attestation service. OpenMetal provides the hardware that generates valid TD Quotes. Verification using Intel’s DCAP libraries is a client-side operation. If you need help setting up the verification pipeline, OpenMetal’s engineer-to-engineer support team can assist.

How does this architecture handle model updates without breaking attestation?

TD measurements are taken at launch time. When you update a model and restart the TD, new measurements are generated for the new state. Your attestation logs will show both the old and new states. This is expected behavior: attestation records a version history, not a single permanent state.

What does HIPAA compliance at the infrastructure level actually cover?

OpenMetal’s HIPAA compliance covers the infrastructure layer: physical security of data centers, access controls, and encryption capabilities. OpenMetal acts as a Business Associate, with documented policies, incident response procedures, and periodic HIPAA/HITECH security assessments. Your application-level HIPAA obligations (workforce training, policies, breach notification procedures) remain your responsibility. The infrastructure compliance removes a significant portion of the technical safeguard checklist.


OpenMetal’s XL v5 servers are available in Ashburn, VA. If you’re evaluating this architecture for a sensitive inference workload, contact the OpenMetal team for a technical conversation or to request access to the hardware specifications and compliance documentation.

Chat With Our Team

We’re available to answer questions and provide information.

Reach Out

Schedule a Consultation

Get a deeper assessment and discuss your unique requirements.

Schedule Consultation

Try It Out

Take a peek under the hood of our cloud platform or launch a trial.

Trial Options

 

 

 Read More on the OpenMetal Blog

Running Confidential AI Inference on Bare Metal TDX Servers

Jun 11, 2026

Running AI inference on sensitive data requires hardware-level isolation, not just software controls. This guide covers how to build a confidential inference pipeline on OpenMetal’s XL v5 using Intel TDX, including Trust Domain setup, vLLM deployment, attestation, and storage architecture.

How MSPs Can Win Clients With Compliance and Private Cloud

Apr 30, 2026

Enterprise clients in regulated industries are asking harder infrastructure questions than most MSPs are equipped to answer. This article covers where the Microsoft stack has limits for compliance workloads, what private cloud adds to an MSP’s portfolio, and how to start without overhauling your entire stack.

Is Your AI Infrastructure Ready for the EU AI Act?

Apr 28, 2026

EU AI Act compliance is more than a legal project, but an architecture decision. This article breaks down the four infrastructure requirements high-risk AI systems must meet, where public cloud creates compliance gaps, and how dedicated EU infrastructure with hardware-level isolation changes the picture.

Why Proof-of-Stake Validators Outgrow Their Hosting Provider

Apr 21, 2026

Professional PoS validator operations have specific infrastructure demands that general hosting and public cloud weren’t built for. This guide covers the five requirements that separate adequate from production-grade hosting, where public cloud falls short, and what to verify before signing with a provider.

Evaluating Intel TDX for Production Workloads in 2026

Mar 11, 2026

Intel TDX has matured past the proof-of-concept stage, but “production-ready” means different things depending on your workload and team. This guide covers real performance overhead figures, operational complexity, hardware options on OpenMetal v4 and v5, and when to adopt vs. wait.

Secret Network to Silicon: Building a True Confidential Computing Stack with Intel TDX on Bare Metal

Mar 01, 2026

Secret Network proves encrypted smart contracts work. Intel TDX on bare metal completes the confidential computing stack from application layer to silicon.

Adding Confidential Computing to Existing Infrastructure Without Starting Over

Feb 18, 2026

Many companies need confidential computing but can’t rebuild infrastructure from scratch. This guide shows how to add Intel TDX bare metal alongside existing OpenMetal or AWS/Azure/GCP setups. Covers workload prioritization, hybrid architecture patterns, cost analysis, and 2-3 month implementation timeline.

How Mid-Market SaaS Companies Use Intel TDX to Win Enterprise Deals

Feb 12, 2026

Enterprise RFPs increasingly require confidential computing capabilities. This guide shows how mid-market SaaS companies use Intel TDX to answer security questionnaires, differentiate from competitors, and close six-figure deals. Includes ideal scenarios, ROI calculations, pricing strategies, and implementation steps.

How to Build a Confidential RAG Pipeline That Guarantees Data Privacy

Dec 03, 2025

Overcome the trust barrier in enterprise AI. This guide details how to deploy vector databases within Intel TDX Trust Domains on OpenMetal. Learn how Gen 5 hardware isolation and private networking allow you to run RAG pipelines on sensitive data while keeping it inaccessible to the provider.

Benchmarking Intel Xeon Gen 5 Performance for High Density Workloads

Nov 26, 2025

Maximize density with 5th Gen Intel Xeon. We benchmark OpenMetal’s Large V4 servers to reveal 21% better compute, 14x faster AI inference via AMX, and secure confidential computing with TDX. Eliminate the GPU tax and future-proof I/O.