NVIDIA module taints Linux kernel


I installed CUDA 7.0 on Ubuntu running on an Aftershock notebook with an NVIDIA graphics card. As part of this, the NVIDIA graphics drivers were upgraded to version 346. To my pleasant surprise, the graphics card was now directly visible to the Linux kernel. There was no longer any need to use Bumblebee.

However, I started noticing that this Ubuntu installation would not always boot into Unity. On many cold starts, Ubuntu would show this error:

After displaying this it would get stuck at the Ubuntu bootup screen.

I also noticed that I could boot up if I first booted into another Ubuntu instance I had on this notebook and later restarted and booted into the current Ubuntu instance.


Update: I no longer have this problem after installing CUDA 7.5 and the NVIDIA 352 drivers that come along with it on a fresh Ubuntu 15.04 system. I still see the syslog errors, but they no longer stop Ubuntu from booting successfully and the GPU/CUDA can be used without problems. Yay! 😄

Old stuff:

To analyse this problem I cropped out the relevant portions of /var/log/syslog for the case when Ubuntu booted correctly and when it threw the above kernel panic error. These syslog entries can be seen here.
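One way to pull such entries out of the log is a simple filter. This is only a sketch: the function name filter_boot_log and the search terms nvidia and taint are assumptions, not the exact filter used for the linked excerpts.

```shell
# Print the lines of a log file that mention the NVIDIA module or kernel tainting.
# Usage: filter_boot_log [FILE]   (FILE defaults to /var/log/syslog)
filter_boot_log() {
    # -i: case-insensitive, -E: extended regex so "a|b" means "a or b"
    grep -iE 'nvidia|taint' "${1:-/var/log/syslog}"
}
```

Running filter_boot_log once after a good boot and once after a failed one, and saving each output to its own file, gives two excerpts that can be compared side by side with diff.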

What I found was that there was some kind of a race condition at boot time. If the nvidia-drm module registered early enough with the kernel, then everything was fine. Otherwise, the kernel would complain that the NVIDIA module was tainting it and then it would throw up the above error.

The problem seems to lie in the Read-Copy-Update mechanism of the kernel. Here, some optimizations seem to have been added in recent versions to improve energy efficiency. RCU wakes up the CPUs only after a period of RCU_IDLE_GP_DELAY jiffies, as explained here. This is set to 4 by default, as seen here.

The solution going around the web for this problem is to decrease this sleep time to 1 jiffy, so that the race condition can be ameliorated. Thankfully, we do not need to edit Linux kernel code and recompile to do this! A sysfs entry rcu_idle_gp_delay was added for runtime manipulation, as explained here. If we set this to 1, then the chance of hitting this error drops considerably.
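To see what the delay is currently set to, the parameter can be read from sysfs. A minimal sketch, assuming the parameter lives under the rcutree module directory and that the kernel was built with CONFIG_RCU_FAST_NO_HZ=y (otherwise the file will not exist). The RCU_PARAM variable and the show_rcu_delay function are hypothetical names used only for this example.

```shell
# Assumed location of the parameter; override RCU_PARAM if your kernel differs.
RCU_PARAM="${RCU_PARAM:-/sys/module/rcutree/parameters/rcu_idle_gp_delay}"

# Print the current delay in jiffies, or a note if the file is not readable.
show_rcu_delay() {
    if [ -r "$RCU_PARAM" ]; then
        echo "rcu_idle_gp_delay = $(cat "$RCU_PARAM") jiffies"
    else
        echo "rcu_idle_gp_delay is not exposed at $RCU_PARAM"
    fi
}
```

Calling show_rcu_delay before and after the change is a quick way to confirm that the new value actually took effect.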

To do this, add the following line to /etc/default/grub:


Then run update-grub. Hopefully, this fixes the race condition so that every boot is successful.
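For reference, here is a sketch of what the edited entry in /etc/default/grub could look like. Two things are assumptions here: that the parameter is spelled with the rcutree. prefix on your kernel (check the kernel parameters documentation for your version), and that quiet splash are the options already present in your file.

```shell
# /etc/default/grub (excerpt)
# Assumption: "quiet splash" stands in for whatever options you already have;
# keep yours and append the RCU parameter to them.
# Assumption: the parameter uses the "rcutree." prefix on your kernel.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rcutree.rcu_idle_gp_delay=1"
```

After saving the file, sudo update-grub regenerates the boot configuration. Following a reboot, cat /proc/cmdline should list the new parameter if it was picked up.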

Tried with: NVIDIA GTX 765M, Linux 3.13.0-44-generic and Ubuntu 14.04

7 thoughts on “NVIDIA module taints Linux kernel”

  1. We encountered the same problem on our system, but the nvidia-drm module never gets loaded early enough, even after setting RCU_IDLE_GP_DELAY to 1. Do you have any other suggestions?


    1. Andreas: To be frank, this delay solution is not working for me on another Optimus notebook 😦
      If you discover any other solution, please let me know.


    1. VaclavC: I no longer get this problem if I use a “fresh” Ubuntu 14.04.3 or 15.04 system and install CUDA-7/NVIDIA-352.


    2. I had this problem with 352 too. But I don’t have a fresh installation (and I don’t want to reinstall right now ;-).


  2. I have this crash, thank you so much for posting this information! I’ll try the solution you posted!

