In this article
- The Hardware Root of Trust: Intel TDX
- The Architecture Setup
- The Economic Argument for Private RAG
- Reference Architecture: What This Looks Like in Production
- Why This Matters
- Ready to Secure Your AI Workload?
If you’re a CTO or Tech Lead in a regulated industry like FinTech or Healthcare, you’re likely under pressure to deploy Generative AI. Your legal team wants to query 10 years of M&A documents using an LLM. Your clinical staff wants to chat with patient records to surface treatment patterns. Your IP team wants to search through proprietary research.
But your compliance team immediately asks: where does that data live when it’s being processed?
There’s a massive blocker stopping you, and it’s a valid one: Trust.
To build a RAG (Retrieval-Augmented Generation) pipeline, you have to take your most sensitive data, chunk it, embed it, and load it into a vector database. If you do this on a standard public cloud, even with encryption at rest, there’s a point of vulnerability. When that data is loaded into memory for querying, it’s technically accessible by the cloud provider’s hypervisor.
For a CISO, that’s a non-starter. You cannot hand the keys to your kingdom to a third-party provider, no matter how good their SLA is.
No need to abandon all hope! You just have to find the right hardware architecture. By deploying your vector database inside an Intel TDX Trust Domain on OpenMetal, you can cryptographically guarantee that only you have access to that data in memory. Not the hypervisor, and definitely not us.
Here’s how you architect that pipeline using OpenMetal’s Gen 4 (V4) infrastructure.
The Hardware Root of Trust: Intel TDX
The core of this architecture relies on Intel Trust Domain Extensions (TDX), available on our V4 server line with Intel Xeon Scalable processors (4th Gen).
What TDX actually does: It allows you to create a “Trust Domain” (TD), a hardware-isolated virtual machine where the CPU itself enforces memory encryption at the silicon level.
Why hypervisor access is the vulnerability: In a standard virtualization environment, the hypervisor (the software managing the VMs) has full privileges over the guest VMs. It can theoretically dump the memory of any guest. This creates a trust boundary problem. You’re relying on policies and access controls rather than mathematical impossibility.
How TDX solves it: The CPU encrypts the Trust Domain’s memory with a key that only the TD itself can access. If the hypervisor (or a bad actor with physical access to the server) tries to read that memory, all they see is ciphertext. You can cryptographically verify the measurement of your Trust Domain through Remote Attestation before sending any data to it, proving the software stack hasn’t been tampered with.
This creates a “Confidential Computing” environment where data remains encrypted even while being processed, satisfying HIPAA’s encryption requirements, GDPR’s data processing restrictions, and SOC 2 Type II controls for memory protection.
The Architecture Setup
Here’s what this looks like in practice:
Data Flow:
- Source documents → Embedding model (in separate Trust Domain or secure environment)
- Embeddings → Vector database (inside TDX Trust Domain) via private VLAN
- Query → Vector DB retrieval → Context assembly → LLM inference
- Response path traverses only private network until final delivery
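As a toy illustration of the flow above, here's a minimal retrieval loop using a bag-of-words stand-in for the embedding model and an in-memory list in place of the vector database. The documents and similarity function are purely illustrative; a real pipeline would call an actual embedding model and one of the vector engines discussed below.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call a
    # sentence-transformer or hosted embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over sparse term counts.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Source documents -> embeddings (would run in its own Trust Domain)
documents = [
    "patient exhibited elevated blood pressure after treatment",
    "merger agreement signed between the two holding companies",
]
index = [(doc, embed(doc)) for doc in documents]  # stand-in for the vector DB

# 2. Query -> vector DB retrieval -> context assembly
query = "blood pressure treatment outcomes"
qvec = embed(query)
best_doc, _ = max(index, key=lambda pair: cosine(qvec, pair[1]))

# 3. The retrieved context would be passed to the LLM over the private VLAN
print(best_doc)
```

The point of the sketch is the shape of the data flow, not the retrieval quality: every hop (embed, store, retrieve) is a place where plaintext exists in memory, which is exactly what the Trust Domain protects.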
To build this confidential RAG pipeline, you need three main components: the compute power, the isolation, and the networking to keep it private.
Step 1: Hardware Selection
Vector databases are memory-hungry. For a production-grade RAG pipeline, start with our XL V4 or XXL V4 servers, which support TDX out of the box.
XL V4:
- Processor: Dual Intel Xeon Scalable 6530 (32C/64T at 2.1/4.0GHz)
- RAM: 1 TB DDR5 ECC (critical for keeping vector indices in memory for low-latency retrieval)
- Storage: 4x 6.4TB Micron 7450 MAX NVMe drives (up to 10 working drives)
- Network: Dual 10 Gbps private links (20 Gbps total)
- Egress: 4 Gbps included

XXL V4:
- Processor: Dual Intel Xeon Scalable 6530 (32C/64T at 2.1/4.0GHz)
- RAM: 2 TB DDR5 ECC
- Storage: 6x 6.4TB Micron 7450 MAX NVMe drives (up to 24 working drives)
- Network: Dual 10 Gbps private links (20 Gbps total)
- Egress: 4 Gbps included
Note: Medium V4 and Large V4 can also support TDX but require upgrading to 1 TB RAM.
Vector databases like Weaviate, Qdrant, or Milvus often keep indices in memory for sub-100ms p99 latency. This RAM headroom prevents performance degradation under load, especially when you’re serving concurrent queries across multiple embeddings.
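A quick back-of-envelope check shows why that RAM matters. The sketch below assumes float32 vectors and a rough 1.5x multiplier for HNSW graph structures and metadata; the multiplier is an assumption for illustration, and real engines vary.

```python
def index_ram_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4,
                 overhead: float = 1.5) -> float:
    """Rough in-memory footprint of a float32 vector index.

    `overhead` is an assumed multiplier for graph/HNSW structures and
    metadata; actual usage in Qdrant, Milvus, or Weaviate will differ.
    """
    raw = num_vectors * dims * bytes_per_dim
    return raw * overhead / (1024 ** 3)

# 100M chunks embedded at 1024 dimensions:
print(f"{index_ram_gb(100_000_000, 1024):.0f} GB")  # -> 572 GB
```

At that scale a 100-million-chunk corpus already consumes over half of the XL V4's 1 TB, before the OS, the query workload, and any replicas, which is why the Medium and Large V4 configurations need the RAM upgrade.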
Step 2: Trust Domain Configuration
On OpenMetal, you aren’t fighting “noisy neighbors” for resources since you have dedicated hardware. You’ll configure the BIOS to enable TDX and then spin up your VM as a Trust Domain.
Inside this CPU-level isolated environment, you deploy your vector database of choice (Qdrant, Milvus, or Weaviate). The hardware enforcement means you can prove via Remote Attestation that:
- The software stack matches your expected measurements
- It’s running inside a secure enclave
- No unauthorized code has been injected
This moves the conversation from “we trust the provider not to look” to “the provider physically cannot look”.
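Conceptually, the attestation check reduces to comparing measurements reported by the Trust Domain against golden values you recorded when you built the TD image. The sketch below is illustrative only: the variable names and flow are assumptions for clarity, and a real deployment verifies the TDX quote's signature chain through Intel's attestation services before trusting any field.

```python
import hashlib
import hmac

# Hypothetical "golden" measurement recorded at TD image build time.
# A real TDX quote carries structured fields (MRTD, RTMRs, etc.) in a
# defined binary format, not a bare hash like this.
EXPECTED_MRTD = hashlib.sha384(b"my-td-kernel-and-initrd").hexdigest()

def verify_quote(reported_mrtd: str) -> bool:
    # Constant-time comparison of reported vs. expected measurement.
    # In production you'd first validate the quote's certificate chain
    # back to Intel's provisioning infrastructure.
    return hmac.compare_digest(reported_mrtd, EXPECTED_MRTD)

assert verify_quote(EXPECTED_MRTD)                             # trusted stack
assert not verify_quote(hashlib.sha384(b"tampered").hexdigest())  # rejected
```

Only after this check passes does your ingestion pipeline start shipping embeddings to the Trust Domain; a tampered software stack never sees a byte of data.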
Step 3: Network Isolation
Isolation is useless if the data is intercepted on its way to the database. This is where OpenMetal’s networking architecture becomes critical.
We provide dual 10 Gbps private links per server (20 Gbps total). When you set up your pipeline, utilize our OpenStack VPC Private Networking to create a VXLAN overlay that isolates the ingestion traffic completely from the public internet.
How it works:
- Your embedding servers talk to your Trust Domain Vector DB over a private, unmetered VLAN
- The data never traverses a public route
- The database sits in a memory space that’s opaque to the outside world
- All inter-server communication stays on dedicated private links
This architecture satisfies the network segmentation requirements in frameworks like NIST 800-53 and PCI DSS.
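A simple application-side guard makes the segmentation explicit: refuse to send embeddings to any endpoint outside the private CIDR. The subnet below is a placeholder assumption; substitute your actual VPC range.

```python
import ipaddress

# Placeholder CIDR for the private VLAN; use your real VPC network range.
PRIVATE_VLAN = ipaddress.ip_network("10.0.8.0/22")

def assert_private(db_host: str) -> None:
    # Raise before any data leaves the process if the target address
    # is not on the private overlay network.
    addr = ipaddress.ip_address(db_host)
    if addr not in PRIVATE_VLAN:
        raise ValueError(f"{db_host} is not on the private VLAN; refusing to send data")

assert_private("10.0.9.14")        # inside the overlay network: OK
try:
    assert_private("203.0.113.5")  # public address: rejected
except ValueError:
    pass
```

It's a belt-and-suspenders check on top of the VXLAN isolation, and it's cheap insurance against a misconfigured connection string pointing a PHI pipeline at a public endpoint.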
The Economic Argument for Private RAG
Beyond security, there’s a cost reality to RAG pipelines—they’re “chatty.” Every query involves sending context back and forth between your inference engine and your database. If you build this on a hyperscaler, the egress fees for moving data between services can destroy your budget.
OpenMetal models this differently:
Private traffic is unmetered between your servers. Your embedding service can talk to your vector DB all day without generating a single egress charge.
Public egress is generous: we include 4 Gbps per server for XL V4 and XXL V4, which aggregates across your cluster.
If you have a 3-server cluster of XL V4s, you’ll have 12 Gbps of egress throughput included before we even start looking at 95th percentile billing. This allows for the kind of bursty traffic AI applications generate, without the billing surprises you get elsewhere.
Compare this to AWS, where internet egress runs roughly $0.09/GB once you exceed the monthly free allowance. A RAG pipeline serving 1 TB/month in query responses would cost on the order of $90 in egress alone, before compute, before storage, and before data transfer between AZs.
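The arithmetic behind both figures is simple enough to sketch. The $0.09/GB rate is used here as an order-of-magnitude assumption for hyperscaler internet egress; real bills add inter-AZ and inter-service transfer on top.

```python
GBPS_PER_SERVER = 4          # included public egress per XL/XXL V4 server
RATE_PER_GB = 0.09           # assumed hyperscaler internet egress rate

def included_egress_gbps(servers: int) -> int:
    # Included egress aggregates across the cluster.
    return servers * GBPS_PER_SERVER

def hyperscaler_egress_cost(gb_per_month: float,
                            rate_per_gb: float = RATE_PER_GB) -> float:
    # Order-of-magnitude estimate; ignores free-tier allowances and
    # inter-AZ/inter-service transfer, which only push the bill higher.
    return gb_per_month * rate_per_gb

print(included_egress_gbps(3))          # 3 XL V4s -> 12 Gbps included
print(hyperscaler_egress_cost(1000))    # ~1 TB/month of responses -> $90
```

The asymmetry is the point: the "chatty" internal traffic between embedding service, vector DB, and inference engine is free on private links, so only final responses to end users count against the included egress.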
Reference Architecture: What This Looks Like in Production
Deployment topology for a HIPAA-compliant healthcare RAG pipeline:
- Ingestion Layer: Separate V4 server running document preprocessing and embedding generation (also in TDX if documents are PHI)
- Vector Database Layer: XL V4 running Qdrant inside TDX Trust Domain
- Inference Layer: XL V4 running your LLM (Llama, Mistral, or commercial API calls that exit via encrypted tunnel)
- Private Network: All three communicate over VXLAN with no public internet exposure
- Access Layer: Application server with TLS termination and authentication sits at the edge
Result: Patient records are chunked and embedded in a secure environment, stored in encrypted memory that the hypervisor cannot access, queried over a private network, and the vector DB never exposes data outside the Trust Domain.
Why This Matters
This architecture solves the specific anxiety holding back enterprise AI adoption in regulated industries. You get:
- The performance of Gen 4 Bare Metal: Dedicated CPUs, no noisy neighbors, predictable latency
- The isolation of Intel TDX: CPU-level memory encryption that’s mathematically verifiable
- The network privacy of a dedicated VPC: Unmetered private traffic between your components
- The cost predictability: No surprise egress bills from chatty AI workloads
It’s a setup that satisfies the CISO, the General Counsel, and the AI engineering team all at once.
How this compares to alternatives:
- AWS Nitro Enclaves: Limited to specific instance types, still running on shared hardware with potential side-channel concerns
- Azure Confidential Computing: VMs with SGX or SEV-SNP, but you’re still in a multi-tenant environment with egress charges
- OpenMetal TDX: Full bare metal performance, complete hardware isolation, transparent networking, predictable costs
Ready to Secure Your AI Workload?
If you want to verify the TDX capabilities or benchmark a vector database on our Gen 4 hardware, let’s get you access.
Schedule a 30-minute technical walkthrough where we’ll show you TDX attestation in action, discuss your specific compliance requirements (HIPAA, SOC 2, GDPR, PCI DSS), and design a proof of concept deployment tailored to your use case.
Contact our team to get started, or explore our V4 server specifications to see which configuration fits your data volume and query patterns.