Q: What is FP4 (NVFP4) and why does it matter for AI workloads?
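NVFP4 is NVIDIA's 4-bit floating-point (FP4) format, supported natively by the Blackwell architecture. Each value uses 1 sign bit, 2 exponent bits, and 1 mantissa bit (E2M1), and every block of 16 values shares an FP8 scale factor. It raises low-precision inference throughput beyond the FP8 ceiling of prior GPU generations.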
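To make the format concrete, here is a minimal sketch of block-scaled 4-bit rounding in plain Python. It assumes the E2M1 value grid and 16-value blocks of NVFP4, but it is a simplification: the function names are hypothetical, the block scale stays a Python float instead of being stored as FP8, and the second-level per-tensor scale is left out. It is not how any production stack quantizes weights.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale factor across 16 values


def quantize_block(block):
    """Round one block to scaled E2M1 values; return (scale, dequantized values)."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto 6.0, the top of the E2M1 range.
    # Simplification: the scale is kept as a Python float, not rounded to FP8.
    scale = amax / 6.0 if amax > 0 else 1.0
    dequantized = []
    for x in block:
        # Pick the nearest representable magnitude, then restore sign and scale.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        dequantized.append(math.copysign(nearest * scale, x))
    return scale, dequantized


def quantize(values):
    """Quantize a flat list block by block and return the dequantized values."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        _, deq = quantize_block(values[i:i + BLOCK_SIZE])
        out.extend(deq)
    return out
```

Calling `quantize([6.0, 2.4])` rounds 2.4 down to 2.0, the nearest grid point, while 6.0 survives unchanged. The per-block scale is what lets such a coarse grid track weights whose magnitudes vary across a tensor.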
Lower numerical precision means more operations per second and less memory per parameter: a 4-bit weight takes half the space of an 8-bit one. As a result, FP4 can roughly double low-precision inference throughput over FP8 on supported software stacks, while quantization-aware techniques keep accuracy acceptable for many serving workloads. It is most useful for high-throughput LLM inference and other latency- or cost-sensitive serving.
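The memory side of that trade-off is simple arithmetic. The sketch below estimates weight storage at different precisions; the 70-billion-parameter model size and the `weight_memory_gb` helper are illustrative assumptions, and the estimate covers weights only (no KV cache, activations, or per-tensor scale factors).

```python
def weight_memory_gb(num_params, bits_per_value, block_size=None, scale_bits=0):
    """Estimate weight storage in GB (10^9 bytes) for a given precision.

    block_size and scale_bits model a shared per-block scale factor,
    e.g. one 8-bit scale per 16 values for a block-scaled 4-bit format.
    """
    total_bits = num_params * bits_per_value
    if block_size:
        total_bits += (num_params / block_size) * scale_bits
    return total_bits / 8 / 1e9


params = 70e9  # hypothetical 70B-parameter model, for illustration only

bf16_gb = weight_memory_gb(params, 16)
fp8_gb = weight_memory_gb(params, 8)
# 4-bit values plus one 8-bit scale per 16-value block (4.5 bits per weight).
fp4_gb = weight_memory_gb(params, 4, block_size=16, scale_bits=8)

for name, gb in [("BF16", bf16_gb), ("FP8", fp8_gb), ("FP4 + block scales", fp4_gb)]:
    print(f"{name}: {gb:.1f} GB")
```

For this hypothetical model the weights come to 140 GB in BF16, 70 GB in FP8, and about 39.4 GB in block-scaled FP4. The block scales add 0.5 bits per weight, so the saving over FP8 is slightly less than a clean 2x.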
On OpenMetal, native FP4 is available on the NVIDIA RTX Pro 6000 (RP6000), which uses the Blackwell architecture. The Hopper-generation H200 tops out at FP8, so for workloads where FP4 throughput is the deciding factor, the RP6000 is the relevant card. Both run as single-tenant bare metal servers with full root access, so you control the inference stack (NVIDIA NIM, vLLM, or TensorRT-LLM) end to end.
FP4 is primarily a serving and inference optimization; mixed-precision training still typically uses BF16 and FP8.
Related Answers
- NVIDIA RTX Pro 6000 vs H100: Key Differences
- Is the RTX Pro 6000 Better Than the L40S?
- Attaching RP6000 GPU Nodes to an Existing Deployment