What FP4 (NVFP4) Is and Why It Matters

Q: What is FP4 (NVFP4) and why does it matter for AI workloads?

FP4 (NVFP4) is a Blackwell-native 4-bit floating-point format that increases low-precision inference throughput beyond the FP8 ceiling of prior GPU generations.
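To make "4-bit floating point" concrete, here is a minimal sketch of block-scaled FP4 quantization in plain Python. It assumes the E2M1 value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) that NVIDIA's public material describes for NVFP4. The function names `quantize_block` and `dequantize_block` are hypothetical. The scale is kept as an ordinary Python float for readability, whereas real NVFP4 reportedly stores an FP8 scale per 16-value block plus a tensor-level scale, so treat this as an illustration of the idea, not the hardware format.

```python
# Assumption: magnitudes representable by a 4-bit E2M1 float (1 sign bit,
# 2 exponent bits, 1 mantissa bit), per NVIDIA's public NVFP4 description.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block):
    """Quantize a small block of floats to signed E2M1 values with one shared scale.

    Returns (scale, codes). The scale maps the largest magnitude in the
    block onto 6.0, the largest E2M1 magnitude.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Pick the nearest representable magnitude (first one wins on ties).
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(target - m))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the shared scale and E2M1 codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    scale, codes = quantize_block([0.12, -0.5, 0.9, 0.03])
    print(scale, codes, dequantize_block(scale, codes))
```

Values that fall between grid points are rounded, which is the source of quantization error; the per-block scale keeps that error proportional to the local range of the data instead of the range of the whole tensor.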

Lower numerical precision means more operations per second and less memory per parameter. On supported software stacks, FP4 can roughly double low-precision inference throughput relative to FP8, and quantization-aware techniques keep accuracy acceptable for many serving workloads. It is most useful for high-throughput LLM inference and other latency- or cost-sensitive serving.
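The memory side of that trade-off is simple arithmetic, sketched below. The 70-billion-parameter model size is purely illustrative, and the FP4 figure assumes one 8-bit scale per 16-value block (4 + 8/16 = 4.5 bits per parameter), following NVIDIA's public description of NVFP4. Activations, KV cache, and runtime overhead are not included.

```python
# Approximate weight storage at different precisions.
# Assumption: the FP4 entry adds one 8-bit scale per 16-value block.
BITS_PER_PARAM = {
    "BF16": 16.0,
    "FP8": 8.0,
    "FP4 (with block scales)": 4.0 + 8.0 / 16.0,
}


def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Return weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8.0 / 1e9


if __name__ == "__main__":
    num_params = 70e9  # illustrative model size, not a specific product
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: {weight_gigabytes(num_params, bits):.1f} GB")
```

Under these assumptions the weights shrink from about 140 GB at BF16 to 70 GB at FP8 and roughly 39 GB at FP4, which is why a lower-precision format can let a larger model fit in a single card's memory.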

On OpenMetal, native FP4 is available on the NVIDIA RTX PRO 6000, which uses the Blackwell architecture. The Hopper-generation H200 tops out at FP8, so for workloads where FP4 throughput is the deciding factor, the RTX PRO 6000 is the relevant card. Both run as single-tenant bare metal servers with full root access, so you control the inference stack (NVIDIA NIM, vLLM, or TensorRT-LLM) end to end.

FP4 is primarily an inference optimization; mixed-precision training still typically uses BF16 and FP8.

“Public cloud GPU access is riddled with limitations – premium pricing, throttled performance, and infrastructure you don’t truly control. We built our GPU Servers and Clusters to provide a different experience: complete control, transparent pricing, and no compromises on performance or privacy.”

Rafael Ramos, Director of Software Engineering — OpenMetal
