In this article

  • The Hardware Root of Trust: Intel TDX
  • The Architecture Setup
  • The Economic Argument for Private RAG
  • Reference Architecture: What This Looks Like in Production
  • Why This Matters
  • Ready to Secure Your AI Workload?

If you’re a CTO or Tech Lead in a regulated industry like FinTech or Healthcare, you’re likely under pressure to deploy Generative AI. Your legal team wants to query 10 years of M&A documents using an LLM. Your clinical staff wants to chat with patient records to surface treatment patterns. Your IP team wants to search through proprietary research.

But your compliance team immediately asks: where does that data live when it’s being processed?

There’s a massive blocker stopping you, and it’s a valid one: Trust.

To build a RAG (Retrieval-Augmented Generation) pipeline, you have to take your most sensitive data, chunk it, embed it, and load it into a vector database. If you do this on a standard public cloud, even with encryption at rest, there’s a point of vulnerability. When that data is loaded into memory for querying, it’s technically accessible by the cloud provider’s hypervisor.

For a CISO, that’s a non-starter. You cannot hand the keys to your kingdom to a third-party provider, no matter how good their SLA is.

No need to abandon all hope! You just have to find the right hardware architecture. By deploying your vector database inside an Intel TDX Trust Domain on OpenMetal, you can cryptographically guarantee that only you have access to that data in memory. Not the hypervisor, and definitely not us.

Here’s how you architect that pipeline using OpenMetal’s Gen 4 (V4) infrastructure.

The Hardware Root of Trust: Intel TDX

The core of this architecture relies on Intel Trust Domain Extensions (TDX), available on our V4 server line with Intel Xeon Scalable processors (the Xeon 6530 used in the XL V4 and XXL V4 is a 5th Gen part).

What TDX actually does: It allows you to create a “Trust Domain” (TD), a hardware-isolated virtual machine where the CPU itself enforces memory encryption at the silicon level.

Why hypervisor access is the vulnerability: In a standard virtualization environment, the hypervisor (the software managing the VMs) has full privileges over the guest VMs. It can theoretically dump the memory of any guest. This creates a trust boundary problem. You’re relying on policies and access controls rather than mathematical impossibility.

How TDX solves it: The CPU encrypts the Trust Domain’s memory with a per-TD key that is generated and held inside the processor and never exposed to the hypervisor. If the hypervisor (or a bad actor with physical access to the server) tries to read that memory, all they see is ciphertext. Before sending any data to the Trust Domain, you can verify its measurement through Remote Attestation, which gives you cryptographic evidence that the software stack hasn’t been tampered with.

This creates a “Confidential Computing” environment where data remains encrypted even while being processed, which helps you meet HIPAA’s encryption requirements, GDPR’s data processing restrictions, and SOC 2 Type II controls for memory protection.

The Architecture Setup

Here’s what this looks like in practice:

Data Flow:

  1. Source documents → Embedding model (in separate Trust Domain or secure environment)
  2. Embeddings → Vector database (inside TDX Trust Domain) via private VLAN
  3. Query → Vector DB retrieval → Context assembly → LLM inference
  4. Response path traverses only private network until final delivery
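The data flow above can be sketched end to end in a few lines. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, `InMemoryVectorStore` stands in for the vector database that would run inside the Trust Domain, and all names and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Placeholder for the vector DB that would live inside the Trust Domain."""

    def __init__(self) -> None:
        self._rows = []  # list of (embedding, original chunk)

    def add(self, chunk: str) -> None:
        # Steps 1 and 2: embed the chunk, then store the embedding.
        self._rows.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list:
        # Step 3 (retrieval): rank stored chunks by similarity to the query.
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(query: str, context: list) -> str:
    # Step 3 (context assembly): this string is what the LLM would receive.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


store = InMemoryVectorStore()
for chunk in [
    "The merger agreement sets a termination fee of two percent.",
    "Patient intake forms are stored for seven years.",
    "The research lab filed three patents in 2021.",
]:
    store.add(chunk)

query = "What is the termination fee in the merger agreement?"
hits = store.search(query, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

In a real deployment each of these functions would be a separate service, and the calls between them would be the private-network hops described in the data flow.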

To build this confidential RAG pipeline, you need three main components: the compute power, the isolation, and the networking to keep it private.

Step 1: Hardware Selection

Vector databases are memory-hungry. For a production-grade RAG pipeline, start with our XL V4 or XXL V4 servers, which support TDX out of the box.

XL V4 Specifications:

  • Processor: Dual Intel Xeon Scalable 6530 (32C/64T at 2.1/4.0GHz)
  • RAM: 1 TB DDR5 ECC (critical for keeping vector indices in memory for low-latency retrieval)
  • Storage: 4x 6.4TB Micron 7450 MAX NVMe drives (up to 10 working drives)
  • Network: Dual 10 Gbps private links (20 Gbps total)
  • Egress: 4 Gbps included

XXL V4 Specifications:

  • Processor: Dual Intel Xeon Scalable 6530 (32C/64T at 2.1/4.0GHz)
  • RAM: 2 TB DDR5 ECC
  • Storage: 6x 6.4TB Micron 7450 MAX NVMe drives (up to 24 working drives)
  • Network: Dual 10 Gbps private links (20 Gbps total)
  • Egress: 4 Gbps included

Note: Medium V4 and Large V4 can also support TDX but require upgrading to 1 TB RAM.

Vector databases like Weaviate, Qdrant, or Milvus often keep indices in memory for sub-100ms p99 latency. This RAM headroom prevents performance degradation under load, especially when you’re serving concurrent queries across multiple embeddings.
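To get a feel for why the RAM matters, here is a rough back-of-the-envelope sizing sketch. The corpus size (100 million vectors), the dimension (1536), the float32 storage, and especially the 1.5x index overhead factor are illustrative assumptions, not measurements from any specific vector database.

```python
def index_memory_gb(
    num_vectors: int,
    dims: int,
    bytes_per_value: int = 4,   # assumes float32 vectors
    overhead: float = 1.5,      # assumed allowance for index structures and metadata
) -> float:
    """Estimate in-memory index size in GB (decimal, 1 GB = 1e9 bytes)."""
    raw_bytes = num_vectors * dims * bytes_per_value
    return raw_bytes * overhead / 1e9


# Hypothetical corpus: 100 million chunks embedded at 1536 dimensions.
raw_gb = index_memory_gb(100_000_000, 1536, overhead=1.0)
estimated_gb = index_memory_gb(100_000_000, 1536)

print(f"raw vectors: {raw_gb:.1f} GB")
print(f"with assumed overhead: {estimated_gb:.1f} GB")
```

Under these assumptions the raw vectors alone come to about 614 GB, and the estimate with overhead lands around 922 GB. Your own numbers will depend on the embedding model, quantization settings, and index type, so treat this as a starting point for capacity planning rather than a guarantee.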

Step 2: Trust Domain Configuration

On OpenMetal, you aren’t fighting “noisy neighbors” for resources since you have dedicated hardware. You’ll configure the BIOS to enable TDX and then spin up your VM as a Trust Domain.

Inside this CPU-level isolated environment, you deploy your vector database of choice (Qdrant, Milvus, or Weaviate). The hardware enforcement means you can prove via Remote Attestation that:

  1. The software stack matches your expected measurements
  2. It’s running inside a genuine, hardware-protected Trust Domain
  3. No unauthorized code has been injected

This moves the conversation from “we trust the provider not to look” to “the provider physically cannot look”.
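The client-side half of that attestation check can be sketched as a simple policy gate: compare the reported measurements against the values you expect, and only release data when they match. This sketch assumes the attestation quote's signature has already been verified with Intel's tooling and the measurements extracted. The `Measurements` class, the helper names, and the placeholder digests are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Measurements:
    """Hypothetical container for values taken from an already-verified quote."""
    mrtd: bytes                # build-time measurement of the Trust Domain
    rtmrs: Tuple[bytes, ...]   # runtime measurement registers


def measurements_match(reported: Measurements, expected: Measurements) -> bool:
    """Constant-time comparison of every measurement against the expected value."""
    if len(reported.rtmrs) != len(expected.rtmrs):
        return False
    ok = hmac.compare_digest(reported.mrtd, expected.mrtd)
    for got, want in zip(reported.rtmrs, expected.rtmrs):
        ok = hmac.compare_digest(got, want) and ok
    return ok


def send_if_trusted(
    reported: Measurements,
    expected: Measurements,
    send: Callable[[bytes], None],
    payload: bytes,
) -> bool:
    """Release the payload only when the measurements match the policy."""
    if not measurements_match(reported, expected):
        return False
    send(payload)
    return True


def fake_digest(label: str) -> bytes:
    # Placeholder digests for the example; real values come from your own build.
    return hashlib.sha384(label.encode()).digest()


expected = Measurements(
    fake_digest("td-image-v1"),
    (fake_digest("kernel"), fake_digest("vector-db")),
)
tampered = Measurements(
    fake_digest("td-image-v1"),
    (fake_digest("kernel"), fake_digest("unknown-binary")),
)

outbox = []
sent_good = send_if_trusted(expected, expected, outbox.append, b"embedding batch")
sent_bad = send_if_trusted(tampered, expected, outbox.append, b"embedding batch")
print(sent_good, sent_bad, len(outbox))
```

The point of the gate is ordering: nothing sensitive leaves your side until the measurements check out.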

Step 3: Network Isolation

Isolation is useless if the data is intercepted on its way to the database. This is where OpenMetal’s networking architecture becomes critical.

We provide dual 10 Gbps private links per server (20 Gbps total). When you set up your pipeline, utilize our OpenStack VPC Private Networking to create a VXLAN overlay that isolates the ingestion traffic completely from the public internet.

How it works:

  • Your embedding servers talk to your Trust Domain Vector DB over a private, unmetered VLAN
  • The data never traverses a public route
  • The database sits in a memory space that’s opaque to the outside world
  • All inter-server communication stays on dedicated private links
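One cheap guardrail for the "never traverses a public route" rule is a startup check that every internal service endpoint is configured with a private address. The sketch below assumes your private network uses RFC 1918 IPv4 ranges. The service names and addresses are made up for illustration.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """True when the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


def non_private_endpoints(endpoints: dict) -> list:
    """Return the names of services whose configured address is not private."""
    return [name for name, addr in endpoints.items() if not is_rfc1918(addr)]


# Hypothetical pipeline configuration; 203.0.113.7 is a documentation address
# used here to simulate a misconfigured, publicly routed endpoint.
endpoints = {
    "embedding": "10.20.0.11",
    "vector-db": "10.20.0.12",
    "llm": "203.0.113.7",
}

offenders = non_private_endpoints(endpoints)
print(offenders)
```

A check like this does not replace network-level isolation. It just catches configuration drift before traffic starts flowing.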

This architecture supports the network segmentation requirements in frameworks like NIST 800-53 and PCI DSS.

The Economic Argument for Private RAG

Beyond security, there’s a cost reality to RAG pipelines: they’re “chatty.” Every query involves sending context back and forth between your inference engine and your database. If you build this on a hyperscaler, the egress fees for moving data between services can destroy your budget.

OpenMetal models this differently:

Private traffic is unmetered between your servers. Your embedding service can talk to your vector DB all day without generating a single egress charge.

Public egress is generous: We include 4 Gbps per server for XL V4 and XXL V4, which aggregates across your cluster.

If you have a 3-server cluster of XL V4s, you’ll have 12 Gbps of egress throughput included before we even start looking at 95th percentile billing. This allows for the kind of bursty traffic AI applications generate, without the billing surprises you get elsewhere.

Compare this to AWS, where internet egress is typically billed at around $0.09/GB once you’re past the monthly free allowance. A RAG pipeline serving 1 TB/month in query responses would cost roughly $90 in egress alone, before compute, before storage, and before data transfer between AZs.
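The arithmetic behind these figures is simple enough to put in a few lines so you can plug in your own traffic numbers. The $0.09/GB rate and the 4 Gbps per-server allowance come from the text above. The function names are illustrative, and 1 TB is treated as 1,000 GB.

```python
def egress_cost_usd(gb_out: float, rate_per_gb: float = 0.09, free_gb: float = 0.0) -> float:
    """Metered egress cost: billable volume times the per-GB rate."""
    billable = max(gb_out - free_gb, 0.0)
    return round(billable * rate_per_gb, 2)


def included_egress_gbps(servers: int, per_server_gbps: float = 4.0) -> float:
    """Aggregate included egress throughput across a cluster."""
    return servers * per_server_gbps


monthly_cost = egress_cost_usd(1000)       # 1 TB/month of query responses
cluster_egress = included_egress_gbps(3)   # 3-server XL V4 cluster

print(f"metered egress for 1 TB: ${monthly_cost:.2f}")
print(f"included egress for 3 servers: {cluster_egress:.0f} Gbps")
```

Passing a non-zero `free_gb` lets you account for any free allowance your provider includes.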

Reference Architecture: What This Looks Like in Production

Deployment topology for a HIPAA-compliant healthcare RAG pipeline:

  1. Ingestion Layer: Separate V4 server running document preprocessing and embedding generation (also in TDX if documents are PHI)
  2. Vector Database Layer: XL V4 running Qdrant inside TDX Trust Domain
  3. Inference Layer: XL V4 running your LLM (Llama, Mistral, or commercial API calls that exit via encrypted tunnel)
  4. Private Network: All three communicate over VXLAN with no public internet exposure
  5. Access Layer: Application server with TLS termination and authentication sits at the edge

Result: Patient records are chunked and embedded in a secure environment, stored in encrypted memory that the hypervisor cannot access, queried over a private network, and the vector DB never exposes data outside the Trust Domain.
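One way to keep a topology like this honest over time is to describe it as data and check it against a couple of rules. The sketch below is illustrative only: the `Layer` fields, the layer names, and the two rules (the vector database and ingestion layers must run in a Trust Domain, and only the access layer may be public) are assumptions drawn from the list above, not an OpenMetal tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    in_trust_domain: bool
    public_facing: bool


def policy_violations(layers, must_be_confidential, allowed_public) -> list:
    """Return human-readable descriptions of every rule a topology breaks."""
    problems = []
    for layer in layers:
        if layer.name in must_be_confidential and not layer.in_trust_domain:
            problems.append(f"{layer.name}: must run inside a Trust Domain")
        if layer.public_facing and layer.name not in allowed_public:
            problems.append(f"{layer.name}: must not be exposed publicly")
    return problems


MUST_BE_CONFIDENTIAL = {"ingestion", "vector-db"}
ALLOWED_PUBLIC = {"access"}

topology = [
    Layer("ingestion", in_trust_domain=True, public_facing=False),
    Layer("vector-db", in_trust_domain=True, public_facing=False),
    # The source does not say whether inference runs in TDX; set this per your threat model.
    Layer("inference", in_trust_domain=False, public_facing=False),
    Layer("access", in_trust_domain=False, public_facing=True),
]

# A deliberately broken variant: the vector DB is exposed and not in a Trust Domain.
broken = [Layer("vector-db", in_trust_domain=False, public_facing=True)]

ok_report = policy_violations(topology, MUST_BE_CONFIDENTIAL, ALLOWED_PUBLIC)
bad_report = policy_violations(broken, MUST_BE_CONFIDENTIAL, ALLOWED_PUBLIC)
print(ok_report)
print(bad_report)
```

Running a check like this in CI against your infrastructure definitions gives reviewers a quick signal when a change would weaken the isolation model.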

Why This Matters

This architecture solves the specific anxiety holding back enterprise AI adoption in regulated industries. You get:

  • The performance of Gen 4 Bare Metal: Dedicated CPUs, no noisy neighbors, predictable latency
  • The isolation of Intel TDX: CPU-level memory encryption that you can verify through Remote Attestation
  • The network privacy of a dedicated VPC: Unmetered private traffic between your components
  • The cost predictability: No surprise egress bills from chatty AI workloads

It’s a setup that satisfies the CISO, the General Counsel, and the AI engineering team all at once.

How this compares to alternatives:

  • AWS Nitro Enclaves: Limited to specific instance types, still running on shared hardware with potential side-channel concerns
  • Azure Confidential Computing: VMs with SGX or SEV-SNP, but you’re still in a multi-tenant environment with egress charges
  • OpenMetal TDX: Full bare metal performance, complete hardware isolation, transparent networking, predictable costs

Ready to Secure Your AI Workload?

If you want to verify the TDX capabilities or benchmark a vector database on our Gen 4 hardware, let’s get you access.

Schedule a 30-minute technical walkthrough where we’ll show you TDX attestation in action, discuss your specific compliance requirements (HIPAA, SOC 2, GDPR, PCI DSS), and design a proof of concept deployment tailored to your use case.

Contact our team to get started, or explore our V4 server specifications to see which configuration fits your data volume and query patterns.


 Read More on the OpenMetal Blog

How to Build a Confidential RAG Pipeline That Guarantees Data Privacy

Dec 03, 2025

Overcome the trust barrier in enterprise AI. This guide details how to deploy vector databases within Intel TDX Trust Domains on OpenMetal. Learn how Gen 5 hardware isolation and private networking allow you to run RAG pipelines on sensitive data while keeping it inaccessible to the provider.
