Features:
- C/C++ compiler
- Visual Profiler
- GPU-accelerated BLAS library
- GPU-accelerated FFT library
- GPU-accelerated Sparse Matrix library
- GPU-accelerated RNG library
- Additional tools and documentation
Highlights:
- Easier Application Porting
- Share GPUs across multiple threads
- Use all GPUs in the system concurrently from a single host thread
- No-copy pinning of system memory, a faster alternative to cudaMallocHost() (see the sketch after this list)
- C++ new/delete and support for virtual functions
- Support for inline PTX assembly
- Thrust library of templated performance primitives such as sort, reduce, etc.
- Nvidia Performance Primitives (NPP) library for image/video processing
- Layered Textures for working with same size/format textures at larger sizes and higher performance
- Faster Multi-GPU Programming
- Unified Virtual Addressing
- GPUDirect v2.0 support for Peer-to-Peer Communication
- New & Improved Developer Tools
- Automated Performance Analysis in Visual Profiler
- C++ debugging in CUDA-GDB for Linux and MacOS
- GPU binary disassembler for Fermi architecture (cuobjdump)
- Parallel Nsight 2.0 now available for Windows developers with new debugging and profiling features.
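The no-copy pinning highlight above refers to page-locking memory the application already owns, instead of allocating new pinned memory with cudaMallocHost(). A minimal sketch using cudaHostRegister() follows; the buffer size and the Linux-style page-aligned posix_memalign() allocation are illustrative assumptions, not part of the release notes:

    #include <cuda_runtime.h>
    #include <cstdlib>

    int main() {
        const size_t bytes = 1 << 20;              // 1 MiB, illustrative size
        void* hostMem = NULL;
        posix_memalign(&hostMem, 4096, bytes);     // ordinary pageable, page-aligned allocation

        // Pin the existing allocation in place (no copy into a separate pinned
        // buffer), enabling fast asynchronous transfers.
        cudaHostRegister(hostMem, bytes, cudaHostRegisterDefault);

        void* devMem = NULL;
        cudaMalloc(&devMem, bytes);
        cudaMemcpyAsync(devMem, hostMem, bytes, cudaMemcpyHostToDevice, 0);
        cudaDeviceSynchronize();

        cudaHostUnregister(hostMem);               // unpin before freeing
        cudaFree(devMem);
        free(hostMem);
        return 0;
    }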
What's New:
- Added a new API, cudaGraphNodeSetEnabled(), to allow disabling nodes in an instantiated graph. Support is limited to kernel nodes in this release. A corresponding API, cudaGraphNodeGetEnabled(), allows querying the enabled state of a node (see the sketch after this list).
- Full release of the 128-bit integer (__int128) data type, including compiler and developer tools support. The host-side compiler must support the __int128 type to use this feature.
- Added ability to disable NULL kernel graph node launches.
- Added new NVML public APIs for querying functionality under Wayland.
- Added L2 cache control descriptors for atomics.
- Large CPU page support for UVM managed memory.
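A minimal sketch of the graph node enable/disable APIs mentioned above; it assumes CUDA 11.6 or newer, and the no-op kernel and single-node graph are illustrative only:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void work() { }                      // illustrative no-op kernel

    int main() {
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Build a graph containing a single kernel node.
        cudaGraph_t graph;
        cudaGraphCreate(&graph, 0);
        cudaKernelNodeParams params = {};
        params.func = (void*)work;
        params.gridDim = dim3(1);
        params.blockDim = dim3(1);
        cudaGraphNode_t kernelNode;
        cudaGraphAddKernelNode(&kernelNode, graph, NULL, 0, &params);

        cudaGraphExec_t graphExec;
        cudaGraphInstantiate(&graphExec, graph, NULL, NULL, 0);

        // Disable the kernel node in the instantiated graph; the next launch
        // behaves as if the node were empty (only kernel nodes are supported).
        cudaGraphNodeSetEnabled(graphExec, kernelNode, 0);

        unsigned int enabled = 1;
        cudaGraphNodeGetEnabled(graphExec, kernelNode, &enabled);
        printf("node enabled: %u\n", enabled);

        cudaGraphLaunch(graphExec, stream);
        cudaStreamSynchronize(stream);

        cudaGraphNodeSetEnabled(graphExec, kernelNode, 1);   // re-enable

        cudaGraphExecDestroy(graphExec);
        cudaGraphDestroy(graph);
        cudaStreamDestroy(stream);
        return 0;
    }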
1.3. CUDA Compilers
11.6
- VS2022 Support: CUDA 11.6 officially supports the latest VS2022 as host compiler. The Nsight Visual Studio installer 2022.1.1 must be downloaded separately. A future CUDA release will have the Nsight Visual Studio installer with VS2022 support integrated into it.
- New instructions in public PTX: New instructions for bit mask creation (BMSK) and sign extension (SZEXT) have been added to the public PTX ISA. Documentation for these instructions is available in the PTX ISA guide under BMSK and SZEXT (a hedged inline-PTX sketch follows this list).
- Unused Kernel Optimization: In CUDA 11.5, unused kernel pruning was introduced, with the potential benefits of reducing binary size and improving performance through more efficient optimizations. This was an opt-in feature, but in 11.6 it is enabled by default. As described in the CUDA 11.5 blog post, there is an opt-out flag that can be used if pruning needs to be disabled for debugging or other special situations, for example:
- $ nvcc -rdc=true user.cu testlib.a -o user -Xnvlink -ignore-host-info
- In addition to the -arch=all and -arch=all-major options added in CUDA 11.5, NVCC introduced -arch=native in CUDA 11.5 Update 1. The -arch=native option is a convenient way for users to let NVCC determine the right target architecture to compile the CUDA device code for, based on the GPU installed on the system. This can be particularly helpful for testing when applications are run on the same system they are compiled on.
- Generate PTX from nvlink: With the following command line, the device linker, nvlink, will produce PTX as an output in addition to CUBIN:
- nvcc -dlto -dlink -ptx
- Device linking by nvlink is the final stage in the CUDA compilation process. Applications that have multiple source translation units have to be compiled in separate compilation mode. LTO (introduced in CUDA 11.4) allowed nvlink to perform optimizations at device link time instead of at compile time, so that separately compiled applications with several translation units can be optimized to the same level as whole-program compilations with a single translation unit. However, without the option to output PTX, applications that care about forward compatibility of device code could not benefit from Link Time Optimization, or had to constrain the device code to a single source file.
- With this option, nvlink performs LTO and emits PTX as output, so customer applications that require forward compatibility across GPU architectures can span multiple files and still take advantage of Link Time Optimization.
- Bullseye support: NVCC-compiled source code works with the Bullseye code coverage tool. Code coverage is available only for CPU (host) functions; coverage for device functions is not supported through Bullseye.
- INT128 developer tool support: In CUDA 11.5, CUDA C++ support for the 128-bit integer type was added. In this release, developer tools support the data type as well. With the latest version of libcu++, the __int128 data type is supported by math functions.
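To illustrate the __int128 support described above, here is a minimal sketch; it assumes a 64-bit host compiler with __int128 support (for example, a recent gcc or clang) and CUDA 11.5 or newer, and the values are illustrative only:

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void widen_mul(const long long* a, const long long* b, __int128* out) {
        // 64-bit x 64-bit multiplication kept exactly in a 128-bit result.
        *out = static_cast<__int128>(*a) * static_cast<__int128>(*b);
    }

    int main() {
        long long ha = 0x7fffffffffffLL, hb = 1000000LL;
        long long *da, *db;
        __int128 *dout, hout = 0;
        cudaMalloc((void**)&da, sizeof(long long));
        cudaMalloc((void**)&db, sizeof(long long));
        cudaMalloc((void**)&dout, sizeof(__int128));
        cudaMemcpy(da, &ha, sizeof(long long), cudaMemcpyHostToDevice);
        cudaMemcpy(db, &hb, sizeof(long long), cudaMemcpyHostToDevice);

        widen_mul<<<1, 1>>>(da, db, dout);
        cudaMemcpy(&hout, dout, sizeof(__int128), cudaMemcpyDeviceToHost);

        // Print the 128-bit result as two 64-bit halves.
        printf("hi = 0x%llx, lo = 0x%llx\n",
               (unsigned long long)((unsigned __int128)hout >> 64),
               (unsigned long long)hout);

        cudaFree(da); cudaFree(db); cudaFree(dout);
        return 0;
    }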
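For the BMSK and SZEXT instructions noted earlier in this list, a hedged inline-PTX sketch follows. The wrapper functions are made up for illustration, the operand order (position/width for BMSK, value/width for SZEXT) reflects a reading of the PTX ISA guide entries and should be checked against them, and an sm_70-or-newer target built with CUDA 11.6 is assumed:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Illustrative wrapper: build a 32-bit mask of `width` bits starting at bit `pos`.
    __device__ unsigned int bit_mask(unsigned int pos, unsigned int width) {
        unsigned int mask;
        asm("bmsk.clamp.b32 %0, %1, %2;" : "=r"(mask) : "r"(pos), "r"(width));
        return mask;
    }

    // Illustrative wrapper: sign-extend the low `width` bits of `value`
    // (SZEXT with .s32 sign-extends; .u32 zero-extends).
    __device__ int sign_extend(int value, unsigned int width) {
        int out;
        asm("szext.wrap.s32 %0, %1, %2;" : "=r"(out) : "r"(value), "r"(width));
        return out;
    }

    __global__ void demo() {
        printf("mask = 0x%08x, sext = %d\n", bit_mask(4, 8), sign_extend(0xF, 4));
    }

    int main() {
        demo<<<1, 1>>>();          // compile with, e.g., -arch=sm_70 or newer
        cudaDeviceSynchronize();
        return 0;
    }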
cuSOLVER
New Features:
- A new singular value decomposition routine (GESVDR) has been added. GESVDR computes a partial spectrum with random sampling, which is an order of magnitude faster than GESVD.
- libcusolver.so no longer links libcublas_static.a; instead, it depends on libcublas.so. This reduces the binary size of libcusolver.so. However, it breaks backward compatibility. The user has to link libcusolver.so with the correct version of libcublas.so.
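In practice, an application that links libcusolver.so dynamically must now also resolve a matching libcublas.so at link and run time. A typical link line might look like the following (the file names are illustrative):

    $ nvcc app.cu -o app -lcusolver -lcublas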
cuSPARSE
New Features:
- New Tensor Core-accelerated Block Sparse Matrix - Matrix Multiplication (cusparseSpMM) and introduction of the Blocked-Ellpack storage format.
- New algorithms for CSR/COO Sparse Matrix - Vector Multiplication (cusparseSpMV) with better performance.
- Extended functionalities for cusparseSpMV (see the SpMV sketch after this list):
- Support for the CSC format.
- Support for regular/complex bfloat16 data types for both uniform and mixed-precision computation.
- Support for mixed regular-complex data type computation.
- Support for deterministic and non-deterministic computation.
- New algorithm (CUSPARSE_SPMM_CSR_ALG3) for Sparse Matrix - Matrix Multiplication (cusparseSpMM) with better performance especially for small matrices.
- New routine for Sampled Dense Matrix - Dense Matrix Multiplication (cusparseSDDMM), which deprecates cusparseConstrainedGeMM and provides better performance.
- Better accuracy of cusparseAxpby, cusparseRot, cusparseSpVV for bfloat16 and half regular/complex data types.
- All routines support NVTX annotation for enhancing the profiler timeline in complex applications.
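The cusparseSpMV items above use cuSPARSE's generic API, in which matrices and vectors are wrapped in opaque descriptors. A minimal CSR SpMV sketch follows; the tiny 3x3 matrix is illustrative, and the algorithm enum assumes a toolkit recent enough to provide CUSPARSE_SPMV_ALG_DEFAULT (older 11.x releases use CUSPARSE_MV_ALG_DEFAULT):

    #include <cusparse.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // y = alpha * A * x + beta * y for a tiny 3x3 CSR matrix (4 nonzeros).
        const int rows = 3, cols = 3, nnz = 4;
        int   hRowPtr[] = {0, 1, 3, 4};
        int   hColInd[] = {0, 0, 2, 1};
        float hVal[]    = {1.0f, 2.0f, 3.0f, 4.0f};
        float hX[]      = {1.0f, 1.0f, 1.0f};
        float hY[]      = {0.0f, 0.0f, 0.0f};
        float alpha = 1.0f, beta = 0.0f;

        int *dRowPtr, *dColInd; float *dVal, *dX, *dY;
        cudaMalloc((void**)&dRowPtr, sizeof(hRowPtr));
        cudaMalloc((void**)&dColInd, sizeof(hColInd));
        cudaMalloc((void**)&dVal, sizeof(hVal));
        cudaMalloc((void**)&dX, sizeof(hX));
        cudaMalloc((void**)&dY, sizeof(hY));
        cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
        cudaMemcpy(dColInd, hColInd, sizeof(hColInd), cudaMemcpyHostToDevice);
        cudaMemcpy(dVal, hVal, sizeof(hVal), cudaMemcpyHostToDevice);
        cudaMemcpy(dX, hX, sizeof(hX), cudaMemcpyHostToDevice);
        cudaMemcpy(dY, hY, sizeof(hY), cudaMemcpyHostToDevice);

        // Wrap the raw arrays in generic-API descriptors.
        cusparseHandle_t handle;
        cusparseSpMatDescr_t matA;
        cusparseDnVecDescr_t vecX, vecY;
        cusparseCreate(&handle);
        cusparseCreateCsr(&matA, rows, cols, nnz, dRowPtr, dColInd, dVal,
                          CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                          CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
        cusparseCreateDnVec(&vecX, cols, dX, CUDA_R_32F);
        cusparseCreateDnVec(&vecY, rows, dY, CUDA_R_32F);

        // Query the workspace size, allocate it, then run SpMV.
        size_t bufferSize = 0; void* dBuffer = NULL;
        cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                                &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                                CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize);
        cudaMalloc(&dBuffer, bufferSize);
        cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                     &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                     CUSPARSE_SPMV_ALG_DEFAULT, dBuffer);

        cudaMemcpy(hY, dY, sizeof(hY), cudaMemcpyDeviceToHost);
        printf("y = %f %f %f\n", hY[0], hY[1], hY[2]);   // expected: 1 5 4

        cusparseDestroyDnVec(vecX); cusparseDestroyDnVec(vecY);
        cusparseDestroySpMat(matA); cusparseDestroy(handle);
        cudaFree(dRowPtr); cudaFree(dColInd); cudaFree(dVal);
        cudaFree(dX); cudaFree(dY); cudaFree(dBuffer);
        return 0;
    }

Compile with something like: nvcc spmv.cu -o spmv -lcusparse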
Deprecations:
- cusparseConstrainedGeMM has been deprecated in favor of cusparseSDDMM.
- cusparseCsrmvEx has been deprecated in favor of cusparseSpMV.
- COO Array of Structure (CooAoS) format has been deprecated including cusparseCreateCooAoS, cusparseCooAoSGet, and its support for cusparseSpMV.
Known Issues:
- cusparseDestroySpVec, cusparseDestroyDnVec, cusparseDestroySpMat, cusparseDestroyDnMat, cusparseDestroy with NULL argument could cause segmentation fault on Windows.
Resolved Issues:
- cusparseAxpby, cusparseGather, cusparseScatter, cusparseRot, cusparseSpVV, cusparseSpMV now support zero-size matrices.
- cusparseCsr2cscEx2 now correctly handles empty matrices (nnz = 0).
- cusparseXcsr2csr_compress now uses 2-norm for the comparison of complex values instead of only the real part.
NPP
New Features:
- New APIs added to compute Distance Transform using the Parallel Banding Algorithm (PBA):
- nppiDistanceTransformPBA_xxxxx_C1R_Ctx() - where xxxxx specifies the input and output combination: 8u16u, 8s16u, 16u16u, 16s16u, 8u32f, 8s32f, 16u32f, 16s32f
- nppiSignedDistanceTransformPBA_32f_C1R_Ctx()
Resolved Issues:
- Fixed an issue in which Label Markers added the zero pixel as an object region.
nvJPEG
New Features:
- The nvJPEG decoder added new APIs to support region-of-interest (ROI) based decoding for the batched hardware decoder:
- nvjpegDecodeBatchedEx()
- nvjpegDecodeBatchedSupportedEx()
cuFFT
Known Issues:
- cuFFT planning and plan estimation functions may not restore the correct context, affecting CUDA driver API applications.
- Plans with strides, with primes larger than 127 in the FFT size decomposition, and with a total transform size (including strides) larger than 32 GB produce incorrect results.
Resolved Issues:
- Previously, reduced performance of power-of-2 single precision FFTs was observed on GPUs with sm_86 architecture. This issue has been resolved.
- Large prime factors in size decomposition and real to complex or complex to real FFT type no longer cause cuFFT plan functions to fail.
CUPTI
Deprecations (early notice):
The following functions are scheduled to be deprecated in CUDA 11.3 and will be removed in a future release:
- NVPW_MetricsContext_RunScript and NVPW_MetricsContext_ExecScript_Begin from the header nvperf_host.h.
- cuptiDeviceGetTimestamp from the header cupti_events.h
Complete release notes can be found in the online CUDA Toolkit documentation.