Running Confidential AI Inference on OpenMetal Bare Metal


Q: Can I run confidential AI inference on OpenMetal bare metal?

Yes. Confidential AI inference, in which model weights, inputs, and outputs stay encrypted in memory while the workload executes, runs on OpenMetal XL v4 bare metal servers. The servers combine Intel TDX Trust Domains with Intel AMX matrix acceleration, and quantized model workloads require no GPU.


The XL v4’s Intel Xeon Gold 6530 provides AMX (Advanced Matrix Extensions) for BF16 and INT8 matrix operations, the core arithmetic for quantized LLM inference. TDX wraps the inference workload in a hardware-encrypted Trust Domain: the model weights loaded into RAM, the user’s input tokens, and the generated output are all encrypted with a key that is generated and held by the CPU and is not exposed to software. The host operating system and OpenMetal operators cannot observe the inference state. TDX and AMX operate concurrently, and running inside a Trust Domain does not reduce the throughput of the AMX units themselves.
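As a rough illustration of the two hardware features described above, the following Python sketch checks whether an operating system sees the AMX feature flags and whether it is running as a TDX guest. It assumes a Linux system that lists `amx_tile`, `amx_bf16`, `amx_int8`, and `tdx_guest` in `/proc/cpuinfo`; flag names can vary with kernel version, and the function names are hypothetical, not part of any OpenMetal or Intel tooling.

```python
from pathlib import Path

# Flag names as they are assumed to appear in Linux /proc/cpuinfo on recent kernels.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}
TDX_GUEST_FLAG = "tdx_guest"


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def confidential_inference_report(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether all AMX flags and the TDX guest flag are visible to this OS."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {
        "amx": AMX_FLAGS <= flags,          # every AMX flag must be present
        "tdx_guest": TDX_GUEST_FLAG in flags,
    }


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(confidential_inference_report(text))
```

A flag check like this is only a local sanity test run from inside the guest. It does not by itself prove to a remote party that the workload is running inside a Trust Domain.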

The 1 TB DDR5 RAM pool holds large quantized models entirely in memory: at one byte per parameter, INT8-quantized models in the 30B–70B parameter range need roughly 30–70 GB for weights, so they fit within the memory budget without weight swapping. For retrieval-augmented generation pipelines, the four Micron 7500 MAX NVMe drives serve the document store at up to 7,000 MB/s sequential read, keeping retrieval latency low. Intel QAT handles TLS termination for inference API endpoints in hardware, so encryption overhead does not consume AMX compute budget.
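The sizing claim above can be checked with back-of-the-envelope arithmetic. This sketch assumes one byte per parameter for INT8 and two for BF16, ignores quantization scales and runtime metadata, and reserves a hypothetical 25% of RAM as headroom for the KV cache, the guest OS, and the serving stack. The function names and the headroom figure are illustrative assumptions, not OpenMetal guidance.

```python
# Bytes per parameter for the data types named in the text.
BYTES_PER_PARAM = {"int8": 1, "bf16": 2}


def weight_footprint_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight size in GB (10^9 bytes): billions of params x bytes per param."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_in_ram(params_billions: float, dtype: str,
                ram_gb: float = 1024, reserve_fraction: float = 0.25) -> bool:
    """True if the weights fit after reserving an assumed fraction of RAM as headroom."""
    usable_gb = ram_gb * (1 - reserve_fraction)
    return weight_footprint_gb(params_billions, dtype) <= usable_gb


def sequential_read_seconds(size_gb: float, mb_per_s: float = 7000) -> float:
    """Best-case time to stream size_gb from NVMe at the quoted sequential read rate."""
    return size_gb * 1000 / mb_per_s


if __name__ == "__main__":
    for size in (30, 70):
        gb = weight_footprint_gb(size, "int8")
        print(f"{size}B INT8: {gb:.0f} GB, fits={fits_in_ram(size, 'int8')}, "
              f"read time ~{sequential_read_seconds(gb):.1f} s")
```

Under these assumptions a 70B INT8 model needs about 70 GB for weights, well inside the 1024 GB pool, and streaming those weights at 7,000 MB/s takes on the order of ten seconds in the best case.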

Security diagram showing Intel TDX confidential AI inference with AMX acceleration on OpenMetal bare metal.

This architecture is relevant for organizations processing private documents such as contracts, medical records, and financial statements, where the AI provider must demonstrate that inference inputs are not accessible to the infrastructure operator. On OpenMetal bare metal, there is no co-tenant sharing the physical server, which eliminates the cross-tenant risk present in shared cloud instances. TDX adds the operator isolation layer on top of that single-tenant guarantee. For GPU-accelerated confidential training at scale, contact OpenMetal about GPU server configurations.

Some Recommended Configurations from our Catalog

XL v4

CPU: 2x Intel Xeon Gold 6530
RAM: 1024 GB DDR5
Storage: 25.6 TB NVMe SSD
Bandwidth: 6 Gbps
Monthly Price: Contact for pricing
