In this article
- The Hardware Root of Trust: Intel TDX
- The Architecture Setup
- The Economic Argument for Private RAG
- Reference Architecture: What This Looks Like in Production
- Why This Matters
- Ready to Secure Your AI Workload?
If you’re a CTO or Tech Lead in a regulated industry like FinTech or Healthcare, you’re likely under pressure to deploy Generative AI. Your legal team wants to query 10 years of M&A documents using an LLM. Your clinical staff wants to chat with patient records to surface treatment patterns. Your IP team wants to search through proprietary research.
But your compliance team immediately asks: where does that data live when it’s being processed?
There’s a massive blocker stopping you, and it’s a valid one: Trust.
To build a RAG (Retrieval-Augmented Generation) pipeline, you have to take your most sensitive data, chunk it, embed it, and load it into a vector database. If you do this on a standard public cloud, even with encryption at rest, there’s a point of vulnerability. When that data is loaded into memory for querying, it’s technically accessible by the cloud provider’s hypervisor.
For a CISO, that’s a non-starter. You cannot hand the keys to your kingdom to a third-party provider, no matter how good their SLA is.
No need to abandon all hope! You just have to find the right hardware architecture. By deploying your vector database inside an Intel TDX Trust Domain on OpenMetal, you can cryptographically guarantee that only you have access to that data in memory. Not the hypervisor, and definitely not us.
Here’s how you architect that pipeline using OpenMetal’s Gen 4 (V4) infrastructure.
The Hardware Root of Trust: Intel TDX
The core of this architecture relies on Intel Trust Domain Extensions (TDX), available on our V4 server line with Intel Xeon Scalable processors (4th Gen).
What TDX actually does: It allows you to create a “Trust Domain” (TD), a hardware-isolated virtual machine where the CPU itself enforces memory encryption at the silicon level.
Why hypervisor access is the vulnerability: In a standard virtualization environment, the hypervisor (the software managing the VMs) has full privileges over the guest VMs. It can theoretically dump the memory of any guest. This creates a trust boundary problem. You’re relying on policies and access controls rather than mathematical impossibility.
How TDX solves it: The CPU encrypts the Trust Domain’s memory with a key that only the TD itself can access. If the hypervisor (or a bad actor with physical access to the server) tries to read that memory, all they see is ciphertext. You can cryptographically verify the measurement of your Trust Domain through Remote Attestation before sending any data to it, proving the software stack hasn’t been tampered with.
This creates a “Confidential Computing” environment where data remains encrypted even while being processed, satisfying HIPAA’s encryption requirements, GDPR’s data processing restrictions, and SOC 2 Type II controls for memory protection.
The Architecture Setup
Here’s what this looks like in practice:
Data Flow:
- Source documents → Embedding model (in separate Trust Domain or secure environment)
- Embeddings → Vector database (inside TDX Trust Domain) via private VLAN
- Query → Vector DB retrieval → Context assembly → LLM inference
- Response path traverses only private network until final delivery
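As a toy illustration of the flow above, here's a minimal retrieval loop using a bag-of-words stand-in for the embedding model and an in-memory list in place of the vector database. The documents and similarity function are purely illustrative; a real pipeline would call an actual embedding model and one of the vector engines discussed below.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call a
    # sentence-transformer or hosted embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over sparse term counts.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Source documents -> embeddings (would run in its own Trust Domain)
documents = [
    "patient exhibited elevated blood pressure after treatment",
    "merger agreement signed between the two holding companies",
]
index = [(doc, embed(doc)) for doc in documents]  # stand-in for the vector DB

# 2. Query -> vector DB retrieval -> context assembly
query = "blood pressure treatment outcomes"
qvec = embed(query)
best_doc, _ = max(index, key=lambda pair: cosine(qvec, pair[1]))

# 3. The retrieved context would be passed to the LLM over the private VLAN
print(best_doc)
```

The point of the sketch is the shape of the data flow, not the retrieval quality: every hop (embed, store, retrieve) is a place where plaintext exists in memory, which is exactly what the Trust Domain protects.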
To build this confidential RAG pipeline, you need three main components: the compute power, the isolation, and the networking to keep it private.
Step 1: Hardware Selection
Vector databases are memory-hungry. For a production-grade RAG pipeline, start with our XL V4 or XXL V4 servers, which support TDX out of the box.
XL V4:
- Processor: Dual Intel Xeon Scalable 6530 (32C/64T at 2.1/4.0GHz)
- RAM: 1 TB DDR5 ECC (critical for keeping vector indices in memory for low-latency retrieval)
- Storage: 4x 6.4TB Micron 7450 MAX NVMe drives (up to 10 working drives)
- Network: Dual 10 Gbps private links (20 Gbps total)
- Egress: 4 Gbps included

XXL V4:
- Processor: Dual Intel Xeon Scalable 6530 (32C/64T at 2.1/4.0GHz)
- RAM: 2 TB DDR5 ECC
- Storage: 6x 6.4TB Micron 7450 MAX NVMe drives (up to 24 working drives)
- Network: Dual 10 Gbps private links (20 Gbps total)
- Egress: 4 Gbps included
Note: Medium V4 and Large V4 can also support TDX but require upgrading to 1 TB RAM.
Vector databases like Weaviate, Qdrant, or Milvus often keep indices in memory for sub-100ms p99 latency. This RAM headroom prevents performance degradation under load, especially when you’re serving concurrent queries across multiple embeddings.
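A quick back-of-envelope check shows why that RAM matters. The sketch below assumes float32 vectors and a rough 1.5x multiplier for HNSW graph structures and metadata; the multiplier is an assumption for illustration, and real engines vary.

```python
def index_ram_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4,
                 overhead: float = 1.5) -> float:
    """Rough in-memory footprint of a float32 vector index.

    `overhead` is an assumed multiplier for graph/HNSW structures and
    metadata; actual usage in Qdrant, Milvus, or Weaviate will differ.
    """
    raw = num_vectors * dims * bytes_per_dim
    return raw * overhead / (1024 ** 3)

# 100M chunks embedded at 1024 dimensions:
print(f"{index_ram_gb(100_000_000, 1024):.0f} GB")  # -> 572 GB
```

At that scale a 100-million-chunk corpus already consumes over half of the XL V4's 1 TB, before the OS, the query workload, and any replicas, which is why the Medium and Large V4 configurations need the RAM upgrade.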
Step 2: Trust Domain Configuration
On OpenMetal, you aren’t fighting “noisy neighbors” for resources since you have dedicated hardware. You’ll configure the BIOS to enable TDX and then spin up your VM as a Trust Domain.
Inside this CPU-level isolated environment, you deploy your vector database of choice (Qdrant, Milvus, or Weaviate). The hardware enforcement means you can prove via Remote Attestation that:
- The software stack matches your expected measurements
- It’s running inside a secure enclave
- No unauthorized code has been injected
This moves the conversation from “we trust the provider not to look” to “the provider physically cannot look”.
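Conceptually, the attestation check reduces to comparing measurements reported by the Trust Domain against golden values you recorded when you built the TD image. The sketch below is illustrative only: the variable names and flow are assumptions for clarity, and a real deployment verifies the TDX quote's signature chain through Intel's attestation services before trusting any field.

```python
import hashlib
import hmac

# Hypothetical "golden" measurement recorded at TD image build time.
# A real TDX quote carries structured fields (MRTD, RTMRs, etc.) in a
# defined binary format, not a bare hash like this.
EXPECTED_MRTD = hashlib.sha384(b"my-td-kernel-and-initrd").hexdigest()

def verify_quote(reported_mrtd: str) -> bool:
    # Constant-time comparison of reported vs. expected measurement.
    # In production you'd first validate the quote's certificate chain
    # back to Intel's provisioning infrastructure.
    return hmac.compare_digest(reported_mrtd, EXPECTED_MRTD)

assert verify_quote(EXPECTED_MRTD)                             # trusted stack
assert not verify_quote(hashlib.sha384(b"tampered").hexdigest())  # rejected
```

Only after this check passes does your ingestion pipeline start shipping embeddings to the Trust Domain; a tampered software stack never sees a byte of data.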
Step 3: Network Isolation
Isolation is useless if the data is intercepted on its way to the database. This is where OpenMetal’s networking architecture becomes critical.
We provide dual 10 Gbps private links per server (20 Gbps total). When you set up your pipeline, utilize our OpenStack VPC Private Networking to create a VXLAN overlay that isolates the ingestion traffic completely from the public internet.
How it works:
- Your embedding servers talk to your Trust Domain Vector DB over a private, unmetered VLAN
- The data never traverses a public route
- The database sits in a memory space that’s opaque to the outside world
- All inter-server communication stays on dedicated private links
This architecture satisfies the network segmentation requirements in frameworks like NIST 800-53 and PCI DSS.
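A simple application-side guard makes the segmentation explicit: refuse to send embeddings to any endpoint outside the private CIDR. The subnet below is a placeholder assumption; substitute your actual VPC range.

```python
import ipaddress

# Placeholder CIDR for the private VLAN; use your real VPC network range.
PRIVATE_VLAN = ipaddress.ip_network("10.0.8.0/22")

def assert_private(db_host: str) -> None:
    # Raise before any data leaves the process if the target address
    # is not on the private overlay network.
    addr = ipaddress.ip_address(db_host)
    if addr not in PRIVATE_VLAN:
        raise ValueError(f"{db_host} is not on the private VLAN; refusing to send data")

assert_private("10.0.9.14")        # inside the overlay network: OK
try:
    assert_private("203.0.113.5")  # public address: rejected
except ValueError:
    pass
```

It's a belt-and-suspenders check on top of the VXLAN isolation, and it's cheap insurance against a misconfigured connection string pointing a PHI pipeline at a public endpoint.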
The Economic Argument for Private RAG
Beyond security, there’s a cost reality to RAG pipelines—they’re “chatty.” Every query involves sending context back and forth between your inference engine and your database. If you build this on a hyperscaler, the egress fees for moving data between services can destroy your budget.
OpenMetal models this differently:
Private traffic is unmetered between your servers. Your embedding service can talk to your vector DB all day without generating a single egress charge.
Public egress is generous: we include 4 Gbps per server for XL V4 and XXL V4, which aggregates across your cluster.
If you have a 3-server cluster of XL V4s, you’ll have 12 Gbps of egress throughput included before we even start looking at 95th percentile billing. This allows for the kind of bursty traffic AI applications generate, without the billing surprises you get elsewhere.
Compare this to AWS, where internet egress runs roughly $0.09/GB once you exceed the monthly free allowance. A RAG pipeline serving 1 TB/month in query responses would cost on the order of $90 in egress alone, before compute, before storage, and before data transfer between AZs.
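The arithmetic behind both figures is simple enough to sketch. The $0.09/GB rate is used here as an order-of-magnitude assumption for hyperscaler internet egress; real bills add inter-AZ and inter-service transfer on top.

```python
GBPS_PER_SERVER = 4          # included public egress per XL/XXL V4 server
RATE_PER_GB = 0.09           # assumed hyperscaler internet egress rate

def included_egress_gbps(servers: int) -> int:
    # Included egress aggregates across the cluster.
    return servers * GBPS_PER_SERVER

def hyperscaler_egress_cost(gb_per_month: float,
                            rate_per_gb: float = RATE_PER_GB) -> float:
    # Order-of-magnitude estimate; ignores free-tier allowances and
    # inter-AZ/inter-service transfer, which only push the bill higher.
    return gb_per_month * rate_per_gb

print(included_egress_gbps(3))          # 3 XL V4s -> 12 Gbps included
print(hyperscaler_egress_cost(1000))    # ~1 TB/month of responses -> $90
```

The asymmetry is the point: the "chatty" internal traffic between embedding service, vector DB, and inference engine is free on private links, so only final responses to end users count against the included egress.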
Reference Architecture: What This Looks Like in Production
Deployment topology for a HIPAA-compliant healthcare RAG pipeline:
- Ingestion Layer: Separate V4 server running document preprocessing and embedding generation (also in TDX if documents are PHI)
- Vector Database Layer: XL V4 running Qdrant inside TDX Trust Domain
- Inference Layer: XL V4 running your LLM (Llama, Mistral, or commercial API calls that exit via encrypted tunnel)
- Private Network: All three communicate over VXLAN with no public internet exposure
- Access Layer: Application server with TLS termination and authentication sits at the edge
Result: Patient records are chunked and embedded in a secure environment, stored in encrypted memory that the hypervisor cannot access, queried over a private network, and the vector DB never exposes data outside the Trust Domain.
Why This Matters
This architecture solves the specific anxiety holding back enterprise AI adoption in regulated industries. You get:
- The performance of Gen 4 Bare Metal: Dedicated CPUs, no noisy neighbors, predictable latency
- The isolation of Intel TDX: CPU-level memory encryption that’s mathematically verifiable
- The network privacy of a dedicated VPC: Unmetered private traffic between your components
- The cost predictability: No surprise egress bills from chatty AI workloads
It’s a setup that satisfies the CISO, the General Counsel, and the AI engineering team all at once.
How this compares to alternatives:
- AWS Nitro Enclaves: Limited to specific instance types, still running on shared hardware with potential side-channel concerns
- Azure Confidential Computing: VMs with SGX or SEV-SNP, but you’re still in a multi-tenant environment with egress charges
- OpenMetal TDX: Full bare metal performance, complete hardware isolation, transparent networking, predictable costs
Ready to Secure Your AI Workload?
If you want to verify the TDX capabilities or benchmark a vector database on our Gen 4 hardware, let’s get you access.
Schedule a 30-minute technical walkthrough where we’ll show you TDX attestation in action, discuss your specific compliance requirements (HIPAA, SOC 2, GDPR, PCI DSS), and design a proof of concept deployment tailored to your use case.
Contact our team to get started, or explore our V4 server specifications to see which configuration fits your data volume and query patterns.