If you are using CUDA for any sort of non-graphics floating-point computation, be aware of the FMAD (floating-point multiply-add) instruction. Because CUDA hardware has to serve not only computation but also graphics and gaming, it has lots of FMAD units. So, by default, the CUDA compiler will try to replace as much of your floating-point computation code as possible with FMAD instructions.
This is fine if you do not rely on the exact precision of your results. However, it can lead to hard-to-find bugs if you do, because an FMAD does not round the intermediate product the way a separate multiply followed by an add does, so the last bits of a result can differ. If you need the CUDA computation to mimic the floating-point computation on the CPU, then you are better off without the FMAD instructions.
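To make the difference concrete, here is a minimal sketch (not taken from my own code) that evaluates the same a*b+c both ways on the GPU. It assumes a device with an IEEE-compliant fused multiply-add (compute capability 2.0 or later). The kernel name `compare` and the input values are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Computes a*b+c twice: once fused (a single rounding at the end) and once
// as a separate multiply and add (two roundings). The _rn intrinsics are
// never merged into FMADs by the compiler, so the comparison does not
// depend on compiler flags.
__global__ void compare(float a, float b, float c, float *out)
{
    out[0] = __fmaf_rn(a, b, c);             // fused multiply-add
    out[1] = __fadd_rn(__fmul_rn(a, b), c);  // separate multiply, then add
}

int main()
{
    // Illustrative inputs: a*a = 1 + 2^-11 + 2^-24, which needs 25
    // significand bits and so is not exactly representable as a float.
    const float a = 1.0f + 1.0f / 4096.0f;     // 1 + 2^-12
    const float c = -(1.0f + 1.0f / 2048.0f);  // -(1 + 2^-11)

    float *d_out = NULL;
    cudaMalloc((void **)&d_out, 2 * sizeof(float));
    compare<<<1, 1>>>(a, a, c, d_out);

    float h_out[2];
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("fused:    %g\n", h_out[0]);
    printf("separate: %g\n", h_out[1]);
    return 0;
}
```

With these inputs I would expect the fused line to print about 5.96e-08 (that is, 2^-24) and the separate line to print 0, because the separate multiply rounds the product to 1 + 2^-11 before the add sees it.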
The CUDA compiler (nvcc) is configured to produce FMAD instructions by default. To make it stop producing FMAD instructions and use the normal floating-point instructions, use the compiler directive
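As far as I know, the nvcc option for this is `--fmad=false` (short form `-fmad=false`). Treat that as an assumption and check it against the nvcc documentation for your CUDA version. The sketch below shows how I would try it out. The file name `fmad_demo.cu`, the kernel name `multiply_add` and the input values are hypothetical.

```cuda
// fmad_demo.cu (hypothetical file name)
//
// Default build, FMAD contraction allowed:
//     nvcc fmad_demo.cu -o fmad_on
// Contraction disabled (assuming --fmad=false is the right option):
//     nvcc --fmad=false fmad_demo.cu -o fmad_off
#include <cstdio>
#include <cuda_runtime.h>

// A plain multiply-add expression. Whether it is compiled to a single FMAD
// instruction is up to the compiler and the fmad setting.
__global__ void multiply_add(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 4;
    float h_x[n], h_y[n];
    for (int i = 0; i < n; ++i) {
        h_x[i] = 1.0f + 1.0f / 4096.0f;     // 1 + 2^-12
        h_y[i] = -(1.0f + 1.0f / 2048.0f);  // -(1 + 2^-11)
    }

    float *d_x = NULL, *d_y = NULL;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMalloc((void **)&d_y, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, n * sizeof(float), cudaMemcpyHostToDevice);

    multiply_add<<<1, n>>>(h_x[0], d_x, d_y, n);

    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);

    printf("y[0] = %g\n", h_y[0]);
    return 0;
}
```

If the two builds print different values for `y[0]`, the default build contracted the expression into an FMAD. If only a few expressions need the non-fused behaviour, the `__fmul_rn` and `__fadd_rn` intrinsics are an alternative to the global flag, since the compiler never merges them into FMADs.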
Note that turning off FMAD can hurt performance quite a bit. I found that the time spent on my computations increased by about 20% with FMAD turned off.
Tried with: CUDA 4.1