With AI in a completely bonkers growth phase, it is great to see Nvidia adapting quickly, both to AI and to the improvements needed in data center GPUs for non-AI work like graphics and video.  I attribute this to Intel's push with their two data center grade GPUs and, in particular, to the fact that Intel does not force you into a license agreement to virtualize their GPUs.  That has been a top complaint of mine, and the view is shared by many (most?) technical people in the virtualization business: it just makes Nvidia harder for us to use.  Finally having a viable competitor in Intel is making a real difference in the market.

Nvidia Ada Lovelace AD102, AD103, AD104

With the L4 and L40S, Nvidia has integrated both AI silicon and graphics silicon.  As mentioned above, Intel introduced a really solid set of graphics GPUs in early 2023, and it appears Nvidia had to respond by making their GPUs more useful across multiple workloads.  The new Ada Lovelace cards are a great step forward, effective for both AI inferencing workloads and graphics workloads.  One purchase, multiple use cases!

I haven't had much time yet to fill this in completely while learning about the new AD102 and its siblings.  Part of it is decoding the messy Nvidia docs, which seem designed to discourage comparisons…  Or maybe that is just me.  I won't be including the A16, A30, and A40 below for now; I might add them later.

Special thanks to Tim Dettmers for his GPU Guide; I also pulled background from the Nvidia GPU Spec Details here.

Nvidia Data Center GPU Comparison Table

| Specifications | A2 | A10 | A16 | A100 | RTX A5000 | RTX ADA 6000 | L40S | L40 | L4 |
|---|---|---|---|---|---|---|---|---|---|
| Architecture | | | | | | | AD102 | AD102 | AD104 |
| GPU Memory (GB) | | | | | | | 48 | 48 | 24 |
| Memory Bandwidth (GB/s) | | | | | | | 864 | 864 | 300 |
| L2 Cache (KB) | | | | | | | 98,304 | 98,304 | 49,152 |
| FP64 (TFLOPS) | | | | | | | 1.43 | 1.41 | ? |
| FP32 (TFLOPS) | | | | | | | 91.6 | 90.5 | 30.3 |
| TF32 (TFLOPS) | | | | | | | 183 | 90.5 | 60 |
| FP16 (TFLOPS) | | | | | | | 362 | 181 | 121 |
| FP8 (TFLOPS) | | | | | | | 733 | 362 | 242 |
| Int8 (TOPS) | | | | | | | 733 | 362 | 242 |
| Int4 (TOPS) | | | | | | | 733 | 724 | 484 |
| Sparsity Aware | | | | | | | Yes, 2X | Yes, 2X | Yes, 2X |
| RT Core Performance (TFLOPS) | | | | | | | 209.3 | 209.3 | 73.1 |
| RT Cores | | | | | | | 142 (3rd Gen) | 142 (3rd Gen) | 58 (3rd Gen) |
| CUDA Cores | | | | | | | 18,176 | 18,176 | 7,424 |
| Tensor Cores | | | | | | | 568 | 568 | 232 |
| NVENC / NVDEC | | | | | | | 3 / 3 | 3 / 3 | 2 / 4 |
| JPEG Decoders | | | | | | | 4 | 4 | 4 |
| TDP (W) | | | | | | | 350 | 300 | 72 |
| Cost | $$$$$$$$$$ $$$$$$$$$$$$$ | | | | | | | | |
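The "Yes, 2X" sparsity row means the Ada tensor cores support 2:4 structured sparsity, which doubles peak tensor throughput over the dense rate. As a quick sanity check (a minimal sketch; the 91.6 dense TF32 figure for the L40S comes from Nvidia's published dense rate, which matches its FP32 number):

```python
# 2:4 structured sparsity lets the tensor cores skip zeroed-out weights,
# doubling the peak rate compared to dense math.
def sparse_peak(dense_tflops: float) -> float:
    return dense_tflops * 2

# L40S TF32: 91.6 TFLOPS dense -> 183.2 TFLOPS with sparsity,
# which matches the ~183 figure in the table above.
print(sparse_peak(91.6))  # 183.2
```

This is also why some vendor tables look inflated: the headline numbers are often the sparse peaks, so compare dense-to-dense when you can.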

Note: The AD102 GPU also includes 288 FP64 Cores (2 per SM), which are not shown in the table above. The FP64 TFLOP rate is 1/64th the TFLOP rate of FP32 operations. The small number of FP64 Cores is included to ensure that any programs with FP64 code operate correctly, including FP64 Tensor Core code.
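That 1/64 ratio lines up with the table: dividing each card's FP32 rate by 64 reproduces the FP64 figures (a quick check, using the FP32 numbers from the table above):

```python
# FP64 throughput on Ada is 1/64th of FP32: each SM has 128 FP32 cores
# but only 2 FP64 cores, and 128 / 2 = 64.
def fp64_tflops(fp32_tflops: float) -> float:
    return fp32_tflops / 64

print(round(fp64_tflops(91.6), 2))  # L40S: 1.43
print(round(fp64_tflops(90.5), 2))  # L40:  1.41
```

So these cards will run FP64 code correctly, but anyone with a real double-precision workload should look at an A100/H100-class part instead.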

OpenMetal Roadmap

We are excited to be offering a new fleet of AI, graphics, and VDI tuned clusters shortly!  I will fill this in more during April, as our formal roadmap goes to customers first.
