With AI in a completely bonkers growth period, it is great to see Nvidia adapting quickly, both to AI and to the improvements needed in data center GPUs for non-AI work. Think graphics and video. I attribute this to Intel's push with their two data center grade GPUs, and in particular to the fact that Intel does not force you into a license agreement to virtualize their GPUs. That has been a top complaint of mine, and the view is shared by many (most?) technical people in the virtualization business: the licensing just makes it harder to use Nvidia. Finally having a viable competitor in Intel is making a real difference in the market.
Nvidia Data Center GPU Comparison Table
| Specifications | A100 | L40S | L40 | L4 |
|---|---|---|---|---|
| Architecture | GA100 | AD102 | AD102 | AD104 |
| GPU Memory | 40GB | 48GB | 48GB | 24GB |
| Memory Bandwidth | 1,555GB/s | 864GB/s | 864GB/s | 300GB/s |
| L2 Cache | ? | 98,304KB | 98,304KB | 49,152KB |
| FP64 (TFLOPS) | 9.7 | 1.43 | 1.41 | ? |
| TF64 (FP64 Tensor Core, TFLOPS) | 19.5 | ? | ? | ? |
| FP32 (TFLOPS) | 19.5 | 91.6 | 90.5 | 30.3 |
| TF32 (TFLOPS) | 156 | 183 | 90.5 | 60 |
| TF16 (TFLOPS) | 312 | 362 | 181 | 121 |
| FP8 (TFLOPS) | ? | 733 | 362 | 242 |
| Int8 (TOPS) | 624 | 733 | 362 | 242 |
| Int4 (TOPS) | 1,248 | 733 | 724 | 484 |
| Sparsity Aware | Yes, 2X | Yes, 2X | Yes, 2X | Yes, 2X |
| RT Core Performance (TFLOPS) | 0 | 209.3 | 209.3 | 73.1 |
| RT Cores | 0 | 142 (3rd Gen) | 142 (3rd Gen) | 58 (3rd Gen) |
| CUDA Cores | 6,912 | 18,176 | 18,176 | 7,424 |
| Tensor Cores | 432 | 568 | 568 | 232 |
| NVENC / NVDEC | ? | 3 / 3 | 3 / 3 | 2 / 4 |
| JPEG Decoders | ? | 4 | 4 | 4 |
| TDP | 250W | 350W | 300W | 72W |
| Cost | $$$$ | $$$$ | $$$ | $$ |
Note: The AD102 GPU also includes 288 FP64 cores (2 per SM), which are not listed above. The FP64 TFLOP rate is 1/64th the TFLOP rate of FP32 operations. This small number of FP64 cores is included to ensure that any programs with FP64 code operate correctly, including FP64 Tensor Core code.
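As a sanity check on the table, the peak FP32 numbers fall out of a simple formula: CUDA cores × 2 FLOPs per cycle (one fused multiply-add) × boost clock. Here is a minimal sketch; the boost clocks are my assumption, taken from Nvidia's public spec sheets, not from this article:

```python
# Sketch: reproduce the table's peak FP32 TFLOPS from CUDA core counts
# and boost clocks. Clock values are assumptions from Nvidia spec sheets.
SPECS = {
    # name: (CUDA cores, assumed boost clock in GHz)
    "A100": (6912, 1.41),
    "L40S": (18176, 2.52),
    "L4":   (7424, 2.04),
}

def peak_fp32_tflops(cores: int, boost_ghz: float) -> float:
    """Peak FP32 = cores x 2 FLOPs/cycle (FMA counts as 2) x clock (GHz) / 1000."""
    return cores * 2 * boost_ghz / 1000

for name, (cores, ghz) in SPECS.items():
    fp32 = peak_fp32_tflops(cores, ghz)
    # On AD102/AD104 the FP64 rate is 1/64th of FP32 (per the note above);
    # this ratio does not apply to the GA100-based A100.
    print(f"{name}: {fp32:.1f} FP32 TFLOPS, FP32/64 = {fp32 / 64:.2f}")
```

This reproduces the table's 19.5 (A100), 91.6 (L40S), and 30.3 (L4) FP32 figures, and 91.6 / 64 ≈ 1.43 matches the L40S FP64 entry, which is a nice confirmation of the 1/64th ratio.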
Nvidia Ada Lovelace AD102, AD103, AD104
With the L4 and L40S, Nvidia has integrated both AI silicon and graphics silicon. As mentioned above, Intel introduced a really solid set of graphics GPUs in early 2023, and it appears Nvidia had to respond by making its GPUs more useful for multiple workloads. The new Ada Lovelace parts are a great step forward, as they are effective for both AI inferencing workloads and graphics workloads. One purchase, multiple use cases!
I haven’t had much time yet to fill this in completely while learning about the new AD102 and its siblings. Part of it is decoding the messy Nvidia docs, which seem made to discourage comparisons… or maybe that is just me. I won’t be including the A16, A30, and A40 for now; I might add them later.
Special thanks to Tim Dettmers for his GPU Guide. I also pulled background from the Nvidia GPU spec details here.
OpenMetal Roadmap
We are excited to be offering a new fleet of AI, graphics, and VDI tuned clusters shortly! I will fill this in more during April, as our formal roadmap goes to customers first.