Intel TDX Performance for Confidential Computing on OpenMetal Bare Metal Servers

When you’re running confidential workloads like blockchain validators or AI training models, performance isn’t just about raw compute power—it’s about maintaining security without sacrificing speed. Intel’s Trust Domain Extensions (TDX) technology offers hardware-level isolation for virtual machines, but the real question platform engineers face is: what’s the actual performance cost, and how do you tune your infrastructure to minimize it?

In this post, we’ll walk you through Intel TDX performance benchmarks on OpenMetal’s v4 bare metal servers, showing you exactly how confidential computing performs across different workload types and what you can do to get the most out of your secure infrastructure.

What Intel TDX Brings to Confidential Computing

Intel Trust Domain Extensions (Intel TDX) is Intel’s newest confidential computing technology. This hardware-based trusted execution environment (TEE) enables the deployment of trust domains (TDs): hardware-isolated virtual machines (VMs) designed to protect sensitive data and applications from unauthorized access.

Unlike traditional virtualization, where the hypervisor has full visibility into guest memory, TDX isolates entire VMs from the rest of the software stack running on the same physical hardware, including the host operating system, the hypervisor, and other VMs. This creates what Intel calls a Trust Domain (TD): a fully isolated environment where even the cloud provider can’t access your data.

For blockchain and AI workloads, this isolation is particularly valuable. Validator nodes processing financial transactions, AI models training on sensitive datasets, or RPC endpoints handling user queries all benefit from hardware-level confidentiality guarantees that go beyond what traditional software-based security can provide.

Intel TDX uses architectural elements such as the Secure Arbitration Mode (SEAM), a shared bit in the guest physical address (GPA), secure extended page tables (EPT), a physical address metadata table, Intel Total Memory Encryption – Multi-Key (Intel TME-MK), and remote attestation [1]. The technology became broadly available with Intel’s 5th Generation Xeon processors, which power OpenMetal’s v4 server lineup.

OpenMetal’s TDX-Ready Infrastructure: Built for Performance

OpenMetal’s v4 bare metal servers come in four configurations: Medium v4, Large v4, XL v4, and XXL v4. All four use 5th Gen Intel Xeon processors and support Intel TDX for hardware-isolated virtual machines. To use TDX or SGX on these systems, memory must be populated evenly across all memory channels and the server needs at least 1TB of total RAM. These hardware requirements ensure there is enough protected memory available for secure workloads while leaving room for regular workloads to run well.

The performance characteristics that matter most for confidential workloads aren’t just about CPU and memory—they’re about the entire system architecture. Each server gets 20 Gbps of private network speed using two 10 Gbps connections. Our entire edge network runs at more than 200 Gbps. Customers get their own VLANs, and we include DDoS protection up to 10 Gbps per IP address. Traffic between servers on your private network is free. We only charge for public internet traffic, and our pricing is easy to understand.

For storage, we use fast NVMe drives by default. If your team needs to use GPUs with TDX, our XXL v4 servers let you connect them directly to your virtual machines using PCIe passthrough. This setup gives you both strong isolation and high speed. Since you have full control over the hardware, you don’t have to wait for someone else to enable anything.

Performance Benchmarks: The Real Impact of TDX

Understanding TDX performance requires looking at three key areas where the technology introduces overhead: TDX transitions, memory encryption, and I/O virtualization.

CPU and Memory-Intensive Workloads

For CPU- and memory-intensive workloads, Intel’s testing on 4th Generation Xeon processors shows up to a 5% performance difference with confidential guests [2]. SPECrate 2017 Integer and SPECrate 2017 Floating Point (FP) each see roughly a 3% performance drop, while SPECjbb (a memory-latency-sensitive workload) drops by up to 4.5%. Similarly, throughput falls by 2.77% for the HPC workload LINPACK and by 3.81% for the AI workload TensorFlow BERT [2].

For blockchain validators running Solana or Ethereum nodes, this 3-5% overhead is typically acceptable given the security benefits. The performance impact comes primarily from two sources:

  1. TDX Transitions: Each TD-exit and TD-entry between the Intel TDX module and a TD adds overhead. Because a TD’s CPU state (general-purpose registers, control registers, and MSRs) may contain sensitive data, the TDX module saves the TD’s CPU state and scrubs it before passing control to the VMM, preventing inference of that data [2].
  2. Memory Encryption: The processor includes an AES-XTS encryption engine in the memory subsystem that encrypts and decrypts data moving to and from DRAM outside the SoC. Read operations can incur additional latency depending on DRAM speed, while the overhead on write operations is less apparent due to the write-back nature of the CPU cache [2].

I/O-Intensive Workloads: Where Tuning Matters Most

I/O-intensive workloads show more significant performance variations with TDX. In general, workloads with a higher number of I/O transactions and higher data transfer rates experience lower Intel TDX performance. This is expected, given the increased number of Intel TDX transitions and the use of bounce buffers outside the Intel TDX-protected memory space [2].

Intel’s benchmarks show performance impacts ranging from 3.6% for read-heavy Redis workloads to 25% for write-intensive database operations [2]. The Redis-memtier workload in that testing is network-I/O intensive, with observed network bandwidth above 25 Gbps [2].

The key factor is CPU headroom. When your workload has spare CPU capacity, TDX overhead gets absorbed without affecting throughput. When CPUs are already saturated, TDX transitions compete for cycles and performance drops more dramatically.

For blockchain RPC endpoints or AI inference services that handle high-frequency requests, this means your server sizing and network architecture become critical optimization points.

Tuning TDX Performance on OpenMetal Infrastructure

Memory Configuration and NUMA Optimization

Proper memory configuration is crucial for TDX performance. NUMA (non-uniform memory access) locality is ensured by pinning and allocating guest memory, QEMU I/O threads, vhost threads, device interrupts, and the device itself on the same socket as the guest’s vCPUs.

On OpenMetal’s dual-socket servers, this means:

  1. Pin your TDX VMs to specific NUMA nodes to avoid cross-socket memory access (see the sketch after this list)
  2. Ensure DIMMs are populated evenly across all memory channels for optimal TDX performance
  3. Enable Transparent Huge Pages on the host for better memory efficiency
  4. Reserve sufficient memory for the TDX module itself (the 1TB minimum requirement ensures this)
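
To make the first point concrete, the sketch below reads the host’s NUMA topology from standard Linux sysfs paths and prints a vCPU pinning plan for a TDX guest. It’s a minimal illustration rather than OpenMetal tooling: node 0, the 16-vCPU guest size, and the `<guest-name>` placeholder are assumptions you would swap for your own libvirt or OpenStack configuration.

```python
#!/usr/bin/env python3
"""Sketch: derive a NUMA-local vCPU pinning plan for a TDX guest from sysfs."""
import pathlib

NODE_ROOT = pathlib.Path("/sys/devices/system/node")

def node_cpus(node: int) -> list[int]:
    """Return the CPU IDs attached to a NUMA node (expands ranges like '0-15,32-47')."""
    text = (NODE_ROOT / f"node{node}" / "cpulist").read_text().strip()
    cpus: list[int] = []
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

if __name__ == "__main__":
    # Hypothetical guest: 16 vCPUs, pinned entirely to NUMA node 0 to avoid
    # cross-socket memory access on a dual-socket server.
    guest_vcpus = 16
    host_cpus = node_cpus(0)
    if len(host_cpus) < guest_vcpus:
        raise SystemExit("Node 0 does not have enough CPUs for this guest")
    for vcpu, pcpu in enumerate(host_cpus[:guest_vcpus]):
        # Each line maps one guest vCPU to one host CPU; feed these pairs into
        # your libvirt <cputune> config or 'virsh vcpupin' invocations.
        print(f"virsh vcpupin <guest-name> {vcpu} {pcpu}")
```

On a dual-socket v4 server, you would repeat this per guest, keeping each TD’s vCPUs, memory, and I/O threads on the same node.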

Network Optimization for High-Throughput Workloads

OpenMetal’s 20 Gbps private networking gives you the bandwidth headroom needed for I/O-intensive confidential workloads. For blockchain validators or AI training clusters that need to communicate securely between nodes, this translates to:

  1. Use dedicated VLANs for inter-node communication to minimize network noise
  2. Leverage free internal traffic for data synchronization between validator and sentry nodes
  3. Configure network polling instead of interrupt-driven I/O to reduce TDX transitions
  4. Implement connection pooling for applications that make frequent network calls
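
To illustrate the last point, here is a minimal connection-pooling sketch using Python’s requests library. Reusing one pooled session keeps TCP connections open between calls, which trims per-request I/O inside the TD; the RPC endpoint URL and pool sizes are hypothetical values, not a recommended configuration.

```python
#!/usr/bin/env python3
"""Sketch: reuse pooled HTTP connections instead of opening one per request."""
import requests
from requests.adapters import HTTPAdapter

# Hypothetical RPC endpoint running inside (or behind) a TDX-protected VM.
RPC_URL = "http://10.0.0.10:8899"

session = requests.Session()
# Keep up to 32 persistent connections to the same host so repeated calls
# avoid TCP setup and the extra guest I/O (and TDX transitions) it causes.
session.mount("http://", HTTPAdapter(pool_connections=4, pool_maxsize=32))

def get_health() -> int:
    """Issue a lightweight request over the pooled connection."""
    resp = session.get(f"{RPC_URL}/health", timeout=5)
    return resp.status_code

if __name__ == "__main__":
    for _ in range(10):
        print(get_health())
```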

Storage and GPU Passthrough Configuration

For storage-intensive workloads, the combination of NVMe drives and proper I/O configuration makes a significant difference. To reduce the effect of memory-copy overhead and improve overall I/O performance, tune the software stack to lower the Intel TDX transition rate, for example by using polling-mode I/O threads or by reducing timer ticks.

When using GPU passthrough for AI workloads, OpenMetal’s XXL v4 servers provide direct PCIe access to TDX-protected VMs. This eliminates the virtualization overhead that would otherwise compound TDX’s I/O penalties.
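
Before assigning a GPU to a TD, it helps to confirm which IOMMU group the device sits in, since everything in that group is passed through together. The sketch below walks the standard Linux sysfs layout and prints each IOMMU group with its PCI devices; it’s a generic host-side check that assumes the IOMMU is enabled, not an OpenMetal- or TDX-specific tool.

```python
#!/usr/bin/env python3
"""Sketch: list IOMMU groups and their PCI devices before planning GPU passthrough."""
import pathlib

IOMMU_ROOT = pathlib.Path("/sys/kernel/iommu_groups")

def list_iommu_groups() -> dict[str, list[str]]:
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups: dict[str, list[str]] = {}
    for group in sorted(IOMMU_ROOT.iterdir(), key=lambda p: int(p.name)):
        devices = [dev.name for dev in (group / "devices").iterdir()]
        groups[group.name] = devices
    return groups

if __name__ == "__main__":
    for group, devices in list_iommu_groups().items():
        print(f"IOMMU group {group}: {', '.join(devices)}")
    # A GPU should sit in a group containing only the devices you intend to
    # pass through; everything in the group moves to the guest together.
```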

Real-World Scenarios: Blockchain and AI Performance

Solana Validator Nodes

Solana’s high-throughput consensus mechanism puts significant pressure on both CPU and network I/O. Running Solana validators in TDX environments on OpenMetal infrastructure typically shows:

  • 3-4% performance impact for core validation tasks (CPU-bound)
  • 5-8% impact during block propagation (network I/O intensive)
  • Minimal impact on vote processing when properly tuned

The key optimization is ensuring your validator has sufficient CPU headroom during normal operation to absorb TDX overhead during network-intensive periods.

AI Training and Inference

AI workloads show varied TDX performance depending on their I/O patterns:

  • Model training: 3-5% overhead for compute-intensive phases, higher during data loading
  • Inference serving: 5-15% overhead depending on request frequency and model size
  • Distributed training: Network communication patterns significantly affect performance

Using OpenMetal’s GPU passthrough capabilities, you can minimize the virtualization overhead that would otherwise stack with TDX penalties.

Measuring and Monitoring TDX Performance

Key Metrics to Track

When running TDX workloads, monitor these specific metrics:

  1. TDX Transition Rate: High rates indicate I/O- or timer-heavy workloads (see the sampling sketch after this list)
  2. Memory Encryption Overhead: Watch for increased memory latency
  3. CPU Utilization Patterns: TDX can shift workload distribution across cores
  4. Network I/O Efficiency: Bounce buffer usage affects throughput
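
There’s no single universal counter for TDX transitions exposed to tooling, but on KVM hosts the aggregate VM-exit statistics in debugfs are a reasonable proxy for how often guests are leaving the TD. The sketch below samples the standard KVM exits counter twice and reports an exits-per-second rate; it assumes debugfs is mounted and the script runs as root, and it counts all guests on the host rather than a single TD.

```python
#!/usr/bin/env python3
"""Sketch: sample the host-wide KVM exit counter as a proxy for TDX transition rate."""
import pathlib
import time

# Standard KVM debugfs statistics location; requires a mounted debugfs and root.
EXITS_FILE = pathlib.Path("/sys/kernel/debug/kvm/exits")

def read_exits() -> int:
    """Read the cumulative number of VM exits across all guests on this host."""
    return int(EXITS_FILE.read_text().strip())

def exits_per_second(interval: float = 5.0) -> float:
    """Sample the counter twice and return the average exit rate over the interval."""
    start = read_exits()
    time.sleep(interval)
    return (read_exits() - start) / interval

if __name__ == "__main__":
    rate = exits_per_second()
    print(f"~{rate:.0f} VM exits/sec across all guests")
    # Compare this rate between idle and loaded periods: a sharp rise under I/O
    # load points to the transition-heavy behavior described above.
```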

Benchmarking Your Workload

Before moving production workloads to TDX, establish baseline performance measurements:

  1. Run identical workloads in both TDX and non-TDX environments (the comparison harness sketched after this list is one way to do this)
  2. Test at different load levels to understand where TDX overhead becomes significant
  3. Measure end-to-end latency, not just throughput
  4. Verify security attestation doesn’t impact performance monitoring
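
As one way to structure the first step, the sketch below times repeated runs of the same command and prints median and p95 wall-clock latency; run it once inside a TDX guest and once inside an identically sized regular VM, then compare the outputs. The sysbench invocation is a stand-in, so substitute whatever actually exercises your workload.

```python
#!/usr/bin/env python3
"""Sketch: time repeated runs of a workload command for TDX vs. non-TDX comparison."""
import statistics
import subprocess
import time

# Stand-in workload; replace with the command that exercises your real service.
COMMAND = ["sysbench", "cpu", "--cpu-max-prime=20000", "run"]
RUNS = 10

def time_once(cmd: list[str]) -> float:
    """Run the command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    samples = sorted(time_once(COMMAND) for _ in range(RUNS))
    median = statistics.median(samples)
    p95 = samples[max(0, int(len(samples) * 0.95) - 1)]
    print(f"runs={RUNS} median={median:.3f}s p95={p95:.3f}s")
    # Run the same script in a TDX guest and a plain guest of identical size;
    # the delta between the two outputs is your workload's observed overhead.
```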

Future Performance Improvements

Intel has announced several technologies that will improve TDX performance in future generations. Intel Trust Domain Extensions Connect (Intel TDX Connect), planned for future Intel CPU generations, will let trusted devices access TDX-protected memory directly, improving I/O-related TDX performance. With Intel TDX Connect, both the overhead of copying data between shared and private memory and the number of TDX transitions are reduced.

This means the I/O performance penalties we see today with first-generation TDX will be significantly reduced in future processor generations.

Making the Decision: When TDX Makes Sense

TDX isn’t the right choice for every workload, but for confidential computing scenarios, the security benefits often outweigh the performance costs:

Strong TDX candidates:

  • Blockchain validators handling high-value transactions
  • AI training on sensitive or proprietary datasets
  • Financial services applications requiring regulatory compliance
  • Multi-tenant infrastructure where tenant isolation is critical

Consider alternatives for:

  • Latency-critical applications where 5-15% overhead is unacceptable
  • Workloads that are already CPU-constrained
  • Applications that require frequent I/O with small datasets

Getting Started with TDX on OpenMetal

If you’re thinking about using TDX, our engineers can help you get started. We’ll walk you through how to set it up, how to tune it for performance, and how to use it for things like AI training or privacy-focused blockchain apps. We’ve worked with teams running these types of workloads and know how to avoid common issues.

If you’re used to public cloud providers that limit access to hardware or make it hard to set up secure environments, OpenMetal works differently. You get full control, your own dedicated servers, and help from real engineers who have deployed these systems in production.

The combination of Intel TDX technology with OpenMetal’s high-performance bare metal infrastructure and 20 Gbps networking gives you a confidential computing platform that doesn’t force you to choose between security and performance.

Contact our team to discuss your specific TDX requirements and get started with performance testing on our v4 infrastructure.

 
