The NVIDIA L40S GPU Accelerator has third-generation RT Cores and fourth-generation Tensor Cores, and was released in October 2022. The full-height, full-length, double-wide PCIe card is well suited to edge deployments as well as 24/7 enterprise data center operation. It targets demanding graphics and hardware-accelerated ray tracing alongside AI, with use cases including simulation, cloud gaming, batch rendering, and deep learning training. NVIDIA AI Enterprise, a software layer that runs on top of the L40S, is intended to streamline the development of immersive visual content and generative AI. NVIDIA RTX Virtual Workstation (vWS) software can also deliver secure, high-performance virtual workstations from the data center or cloud to any user device.
Built on the Ada Lovelace architecture, the NVIDIA L40S has a base clock of 1110 MHz and a boost clock of 2520 MHz. Its 48GB of GDDR6 memory runs at up to 2250 MHz (18 Gb/s effective) on a 384-bit interface, for a peak memory bandwidth of 864 GB/s. NVIDIA quotes up to 5X higher inference performance than the previous generation, and the large memory capacity makes the card a good fit for image-generation AI applications. The card has four display outputs, all DisplayPort 1.4a; it does not support HDMI or Multi-Instance GPU (MIG). On the CUDA cores (without Tensor Cores), peak FP16 (half) throughput is listed at 91.61 TFLOPS (1:1 with FP32), FP32 (float) at 91.61 TFLOPS, and FP64 (double) at 1,431 GFLOPS (1:64).
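The bandwidth and throughput figures above follow from a few multiplications. The sketch below reproduces them from the listed clocks, the 384-bit bus, and the 18,176 CUDA cores given in the spec table. Two assumptions are ours, not the source's: the 8x multiplier from memory clock to per-pin data rate (inferred from 2250 MHz giving 18 Gb/s), and the usual convention that one fused multiply-add counts as 2 FLOPs.

```python
# Back-of-the-envelope check of the L40S bandwidth and peak-throughput figures.
# ASSUMPTIONS: 8 data transfers per pin per listed memory-clock cycle (implied by
# 2250 MHz -> 18 Gb/s), and 1 FMA = 2 FLOPs per CUDA core per clock.

MEMORY_CLOCK_MHZ = 2250   # listed memory clock
BUS_WIDTH_BITS = 384      # memory interface width
BOOST_CLOCK_MHZ = 2520    # listed boost clock
CUDA_CORES = 18_176       # from the spec table


def effective_data_rate_gbps(memory_clock_mhz: float) -> float:
    """Per-pin data rate in Gb/s, assuming 8 transfers per listed clock cycle."""
    return memory_clock_mhz * 8 / 1000


def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate times pin count, converted bits -> bytes."""
    return data_rate_gbps * bus_width_bits / 8


def peak_fp32_tflops(cores: int, boost_clock_mhz: float) -> float:
    """Every CUDA core retires one FMA (2 FLOPs) per clock at the boost clock."""
    return cores * 2 * boost_clock_mhz * 1e6 / 1e12


rate = effective_data_rate_gbps(MEMORY_CLOCK_MHZ)
bandwidth = memory_bandwidth_gbs(rate, BUS_WIDTH_BITS)
fp32 = peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_MHZ)
fp64_gflops = fp32 * 1000 / 64  # FP64 runs at a 1:64 rate relative to FP32

print(f"effective data rate: {rate:.0f} Gb/s")
print(f"memory bandwidth:    {bandwidth:.0f} GB/s")
print(f"FP32 (FP16 is 1:1):  {fp32:.2f} TFLOPS")
print(f"FP64 (1:64):         {fp64_gflops:,.0f} GFLOPS")
```

The printed values line up with the quoted 18 Gb/s, 864 GB/s, 91.61 TFLOPS, and 1,431 GFLOPS. These are theoretical peaks at the boost clock; sustained clocks under load may be lower.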
With a maximum power consumption of 350W, the L40S is passively cooled and relies on airflow from the host system. It uses a 16-pin power connector and requires a system power supply of at least 700W. The card fits a PCIe 4.0 x16 slot. NVLink is not supported.
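Because NVLink is not available, all traffic between the card and the rest of the system goes over the PCIe 4.0 x16 link, which the spec table lists at 64GB/s bidirectional. The sketch below shows where that number comes from, using the standard PCIe 4.0 parameters (16 GT/s per lane, 128b/130b line encoding); these parameters come from the PCIe specification, not from this document, and protocol overhead beyond line encoding is ignored.

```python
# Rough PCIe 4.0 x16 link bandwidth for the L40S host interface.
# ASSUMPTION: standard PCIe 4.0 signaling (16 GT/s per lane, 128b/130b encoding);
# packet/protocol overhead beyond line encoding is not modeled.

GT_PER_S_PER_LANE = 16          # PCIe 4.0 raw transfer rate per lane
LANES = 16                      # x16 slot
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def per_direction_gbs(gt_per_s: float, lanes: int, efficiency: float = 1.0) -> float:
    """GB/s in one direction: one bit per transfer per lane, converted bits -> bytes."""
    return gt_per_s * lanes * efficiency / 8


raw = per_direction_gbs(GT_PER_S_PER_LANE, LANES)
effective = per_direction_gbs(GT_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)

print(f"raw:             {raw:.1f} GB/s per direction, {2 * raw:.1f} GB/s bidirectional")
print(f"after 128b/130b: {effective:.1f} GB/s per direction, {2 * effective:.1f} GB/s bidirectional")
```

The 64GB/s figure is the raw rate summed over both directions; after encoding, roughly 31.5 GB/s is available in each direction, more than an order of magnitude below the 864 GB/s of on-card memory bandwidth.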
| Specification | NVIDIA L40S |
|---|---|
| GPU Architecture | NVIDIA Ada Lovelace Architecture |
| GPU Memory | 48GB GDDR6 with ECC |
| Memory Bandwidth | 864GB/s |
| Interconnect Interface | PCIe Gen4 x16: 64GB/s bidirectional |
| Ada Lovelace Based CUDA Cores | 18,176 |
| 3rd Gen RT Cores | 142 |
| 4th Gen Tensor Cores | 568 |
| RT Core Performance TFLOPS | 209 |
| FP32 TFLOPS | 91.6 |
| TF32 Tensor Core TFLOPS | 183 (366 with sparsity) |
| BFLOAT16 Tensor Core TFLOPS | 362.05 (733 with sparsity) |
| FP16 Tensor Core TFLOPS | 362.05 (733 with sparsity) |
| FP8 Tensor Core TFLOPS | 733 (1,466 with sparsity) |
| Peak INT8 Tensor TOPS | 733 (1,466 with sparsity) |
| Form Factor | 4.4" (H) x 10.5" (L), dual slot |
| Display Ports | 4x DisplayPort 1.4a |
| Max Power Consumption | 350W |
| Power Connector | 16-pin |
| Thermal | Passive |
| Virtual GPU Support | Yes |
| NVENC / NVDEC | 3x / 3x (includes AV1 encode and decode) |
| Root of Trust | Yes |
| NEBS Ready | Level 3 |
| MIG Support | No |
| NVLink Support | No |
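One way to read the compute and bandwidth rows together is a simple roofline estimate: a kernel whose arithmetic intensity (FLOPs per byte moved to or from GPU memory) falls below the "ridge point" of peak compute divided by peak bandwidth is limited by memory rather than compute. The roofline model is our illustrative choice, not something the source describes; the sketch uses the dense (non-sparse) table values, and the 50 FLOPs/byte kernel is purely hypothetical.

```python
# Roofline-style estimate using figures from the L40S spec table.
# ILLUSTRATIVE ONLY: real kernels rarely reach peak compute or peak bandwidth.

PEAK_BANDWIDTH_GBS = 864  # Memory Bandwidth row

PEAK_TFLOPS = {
    "FP32 (CUDA cores)": 91.6,          # FP32 TFLOPS row
    "FP16 Tensor Core (dense)": 362.05,  # FP16 Tensor Core row, without sparsity
    "FP8 Tensor Core (dense)": 733.0,    # FP8 Tensor Core row, without sparsity
}


def ridge_point(peak_tflops: float, bandwidth_gbs: float) -> float:
    """FLOPs per byte at which peak compute and peak memory bandwidth balance."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of peak compute and intensity * bandwidth."""
    return min(peak_tflops, intensity * bandwidth_gbs * 1e9 / 1e12)


for name, peak in PEAK_TFLOPS.items():
    print(f"{name:26s} ridge point ~ {ridge_point(peak, PEAK_BANDWIDTH_GBS):6.1f} FLOPs/byte")

# Hypothetical kernel performing 50 FLOPs per byte of memory traffic.
bound = attainable_tflops(50, PEAK_TFLOPS["FP32 (CUDA cores)"], PEAK_BANDWIDTH_GBS)
print(f"50 FLOPs/byte kernel, FP32: at most {bound:.1f} TFLOPS (memory-bound)")
```

The higher the precision-specific peak, the higher the ridge point, so low-precision Tensor Core workloads need proportionally more data reuse to avoid being limited by the 864 GB/s memory system.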