--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA
Linux Release Notes
Version 2.3
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

On some Linux releases, a GRUB bug in the handling of upper memory, combined
with a default vmalloc area that is too small on 32-bit systems, may make it
necessary to pass the following settings to the bootloader:

vmalloc=256MB, uppermem=524288

Example of grub conf:

title Red Hat Desktop (2.6.9-42.ELsmp)
root (hd0,0)
uppermem 524288
kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/1 rhgb quiet vmalloc=256MB pci=nommconf
initrd /initrd-2.6.9-42.ELsmp.img

--------------------------------------------------------------------------------
New Features
--------------------------------------------------------------------------------

  Hardware Support
  o See http://www.nvidia.com/object/cuda_learn_products.html

  Platform Support
  o Continued OS support
    - RHEL 4.x, 5.x
    - Fedora 10
    - SLED 10 SP2
    - Ubuntu 8.10
  o Additional OS support
    - Ubuntu 9.04
    - SUSE Linux 11.1
  o Eliminated OS support
    - Fedora 9
    - Ubuntu 8.04
    - OpenSUSE Linux 11.0

  CUFFT Features
  o Performance enhancements
  o Double precision
    - CUFFT now supports double-precision transforms, with types and functions
      analogous to the existing single-precision versions.
      Similarly, the "cufftType" enumeration (used in calls like cufftPlan1d)
      has expanded to include double-precision identifiers:

        Precision:  Single          Double
        Type:       cufftReal       cufftDoubleReal
        Type:       cufftComplex    cufftDoubleComplex
        cufftType:  CUFFT_R2C       CUFFT_D2Z
        cufftType:  CUFFT_C2R       CUFFT_Z2D
        cufftType:  CUFFT_C2C       CUFFT_Z2Z
        Function:   cufftExecC2C    cufftExecZ2Z
        Function:   cufftExecR2C    cufftExecD2Z
        Function:   cufftExecC2R    cufftExecZ2D

    - The double-precision functions are invoked in the same manner as the
      single-precision ones, with the arguments changed from the single- to
      the double-precision types. See "cufft.h" for exact definitions of the
      above.

  CUDA-GDB Features
  o Available now on all supported Linux platforms
  o Included in the toolkit installer

  Cross-Compilation Support
  o Compilation of 32-bit applications on 64-bit hosts is supported.

  Double Handling by the Compiler
  o When a ptx file with an sm version prior to sm_13 contains double-precision
    instructions, ptxas now emits a warning that double-precision instructions
    are demoted to single precision. The new ptxas option
    --suppress-double-demote-warning suppresses this warning.

--------------------------------------------------------------------------------
Major Bug Fixes
--------------------------------------------------------------------------------

  C++ Support for Device Emulation
  o Support is restored for using C++ code in device emulation mode.

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

o GPU enumeration order on multi-GPU systems is non-deterministic and may
  change with this or future releases. Users should make sure to enumerate all
  CUDA-capable GPUs in the system and select the most appropriate one(s) to
  use.

o Individual GPU program launches are limited to a run time of less than 5
  seconds on a GPU with a display attached.
  Exceeding this time limit causes a launch failure reported through the CUDA
  driver or the CUDA runtime. GPUs without a display attached are not subject
  to the 5-second run time restriction. For this reason, running CUDA on a GPU
  that is NOT attached to an X display is recommended.

o In order to run CUDA applications, the CUDA module must be loaded and the
  entries in /dev created. This may be achieved by initializing X Windows, or
  by creating a script to load the kernel module and create the entries.

  An example script (to be run at boot time):

    #!/bin/bash

    # Load the NVIDIA kernel module; exit with an error if that fails.
    if modprobe nvidia; then
      # Count the number of NVIDIA controllers found.
      N3D=$(/sbin/lspci | grep -i NVIDIA | grep -c "3D controller")
      NVGA=$(/sbin/lspci | grep -i NVIDIA | grep -c "VGA compatible controller")

      # Create one device node per controller: /dev/nvidia0 .. /dev/nvidia$N.
      N=$((N3D + NVGA - 1))
      for i in $(seq 0 "$N"); do
        mknod -m 666 "/dev/nvidia$i" c 195 "$i"
      done

      # Create the control device node.
      mknod -m 666 /dev/nvidiactl c 195 255
    else
      exit 1
    fi

o When compiling with GCC, special care must be taken for structs that contain
  64-bit integers. This is because GCC aligns long longs to a 4-byte boundary
  by default, while NVCC aligns long longs to an 8-byte boundary by default.
  Thus, when using GCC to compile a file that contains such a struct or union,
  users must pass the -malign-double option to GCC. When using NVCC, this
  option is automatically passed to GCC.

o It is a known issue that cudaThreadExit() may not be called implicitly on
  host thread exit. Until this issue is resolved, developers should call
  cudaThreadExit() explicitly.

o For maximum performance when using multiple byte sizes to access the same
  data, coalesce adjacent loads and stores when possible rather than using a
  union or individual byte accesses. Accessing the data via a union may result
  in the compiler reserving extra memory for the object, and accessing the
  data as individual bytes may result in non-coalesced accesses. This will be
  improved in a future compiler release.

o OpenGL interoperability
  - OpenGL cannot access a buffer that is currently *mapped*. If the buffer is
    registered but not mapped, OpenGL can perform any requested operations on
    the buffer.
  - Deleting a buffer while it is mapped for CUDA results in undefined
    behavior.
  - Attempting to map or unmap a buffer while the bound context differs from
    the one that was current when the buffer was registered will generally
    result in a program error and should be avoided.
  - Interoperability will use a software path on SLI.
  - Interoperability will use a software path if monitors are attached to
    multiple GPUs and a single desktop spans more than one GPU (i.e. X11
    Xinerama).

o Terminating an application with a signal (for example Ctrl-C, which sends
  SIGINT, or SIGKILL) while it is running a kernel on the GPU may not result
  in a clean shutdown of the process, as the kernel may continue running on
  the GPU for a long time afterwards. In such cases, a system restart may be
  necessary before running further CUDA or graphics applications.

--------------------------------------------------------------------------------
Open64 Sources
--------------------------------------------------------------------------------

The Open64 source files are licensed under the terms of the GPL. Current and
previously released versions are available via anonymous FTP at
download.nvidia.com in the CUDAOpen64 directory.

--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

  07/2009 - Version 2.3
  06/2009 - Version 2.3 Beta
  05/2009 - Version 2.2
  03/2009 - Version 2.2 Beta
  11/2008 - Version 2.1 Beta
  06/2008 - Version 2.0
  11/2007 - Version 1.1
  06/2007 - Version 1.0
  06/2007 - Version 0.9
  02/2007 - Version 0.8 - Initial public Beta

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

  For more information and help with CUDA, please visit
  http://www.nvidia.com/cuda
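
As a supplement to the boot-time script listed under Known Issues, the
following is a minimal dry-run sketch, not part of the toolkit. It reads lspci
output on standard input and prints the mknod commands that the boot-time
script would issue, without creating anything. The helper names
count_nvidia_controllers and print_mknod_commands are hypothetical and chosen
only for illustration; the major number 195 and the minor number 255 for the
control node are taken from the script in these notes.

```shell
#!/bin/bash
# Dry-run sketch: print the mknod commands for a given lspci listing.
# Nothing is created and no kernel module is loaded.

# count_nvidia_controllers (hypothetical helper): count the lines on stdin
# that mention NVIDIA and describe a 3D or VGA compatible controller.
count_nvidia_controllers() {
    grep -i 'NVIDIA' | grep -c -E '3D controller|VGA compatible controller' || true
}

# print_mknod_commands (hypothetical helper): emit one mknod command per
# controller found on stdin, followed by the control device node.
print_mknod_commands() {
    local count i
    count=$(count_nvidia_controllers)
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}
```

A possible use is to source the file and run
"/sbin/lspci | print_mknod_commands" to review the device nodes before running
the real script as root.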