How to block Windows Update from updating drivers

Controlling what gets updated through Windows Update has become harder in Windows 10. Drivers for your hardware are now installed automatically when Windows updates. In earlier versions of Windows, you could cherry-pick updates and mark some to never be installed. Those choices are gone now! (You can still uninstall an update, but that is not much help when a driver update has left Windows unable to boot!)

Automatic driver updates are a good feature for almost everyone, except those with NVIDIA graphics cards. Graphics drivers are notorious for being buggy. You almost never want the driver from Windows Update for your NVIDIA graphics card; you should always get it from the NVIDIA website itself.

There is a solution to this: block Windows 10 from installing drivers when it does Windows Update.

To do this, open Control Panel and go to System -> Advanced System Settings -> Hardware -> Device Installation Settings. Here, choose the option to never install driver software from Windows Update.

How to monitor GPU


Monitoring the GPU and GPU memory (VRAM) utilization on Windows is easy. There are many general-purpose tools, as well as vendor-specific ones (like MSI’s Afterburner or EVGA’s Precision). Sadly, there is no single tool on Linux that can monitor your GPU and show as many stats in a convenient GUI.

The best solution I have found is to open NVIDIA X Server Settings. In the section about your GPU, you can monitor these values:

  • GPU utilization
  • Memory utilization
  • Temperature

Sadly, there are no graphing tools to view these values over time.
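One workaround is to sample the values yourself with nvidia-smi and post-process the log afterwards. A minimal sketch, assuming your driver's nvidia-smi supports the --query-gpu options (check nvidia-smi --help-query-gpu); the printf below stands in for real samples so the post-processing part is runnable anywhere:

```shell
# On a real system, collect one CSV sample per second:
#   nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory,temperature.gpu \
#              --format=csv,noheader -l 1 >> gpu.log

# For illustration, a small sample log of the same shape:
printf '%s\n' \
  '2015/06/29 10:00:01, 35 %, 20 %, 58' \
  '2015/06/29 10:00:02, 80 %, 45 %, 61' \
  '2015/06/29 10:00:03, 60 %, 40 %, 60' > gpu.log

# Report the peak temperature seen in the log (field 4):
awk -F', ' '{ if ($4 > max) max = $4 } END { print "peak temp: " max "C" }' gpu.log
# prints: peak temp: 61C
```

The same log can be fed to gnuplot or a spreadsheet to get the missing graphs.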

Tried with: NVIDIA driver 346.46 and Ubuntu 14.04

Trackpad cursor freezes on Ubuntu

Problem

When I move the cursor using the trackpad of my Aftershock notebook, it sometimes freezes in place. The keyboard still works, and the problem happens only with the trackpad, not with a mouse. The only way to unfreeze the cursor is to switch to a virtual terminal (Ctrl + Alt + F1) and back to X (Ctrl + Alt + F7).

Solution

Apparently, this is a common bug in Ubuntu, as reported here. It happens only when the NVIDIA drivers are used with an NVIDIA graphics card. A fix has been created for the X Server here.

Until I move to a newer version of Ubuntu with this updated version of the X Server, I will have to survive by switching to a virtual terminal 🙂

Tried with: NVIDIA 765M and Ubuntu 14.04

NVIDIA module taints Linux kernel

Problem

I installed CUDA 7.0 on Ubuntu running on an Aftershock notebook with an NVIDIA graphics card. The NVIDIA graphics drivers were upgraded to version 346. To my pleasant surprise, the graphics card was now directly visible to the Linux kernel; there was no longer any need to use Bumblebee.

However, I started noticing that this Ubuntu would not always boot into Unity. On many cold starts, Ubuntu would show a kernel panic error and get stuck at the bootup screen.

I also noticed that I could boot up if I first booted into another Ubuntu instance I had on this notebook and later restarted and booted into the current Ubuntu instance.

Solution

Update: I no longer have this problem after installing CUDA 7.5 and the NVIDIA 352 drivers that come along with it on a fresh Ubuntu 15.04 system. I still see the syslog errors, but they no longer stop Ubuntu from booting successfully and the GPU/CUDA can be used without problems. Yay! 😄

Old stuff:

To analyse this problem I cropped out the relevant portions of /var/log/syslog for the case when Ubuntu booted correctly and when it threw the above kernel panic error. These syslog entries can be seen here.

What I found was that there was some kind of a race condition at boot time. If the nvidia-drm module registered early enough with the kernel, then everything was fine. Otherwise, the kernel would complain that the NVIDIA module was tainting it and then it would throw up the above error.

The problem seems to lie in the Read-Copy-Update (RCU) mechanism of the kernel, where some optimizations have been added in recent versions to improve energy efficiency. RCU wakes up the CPUs only after a period of RCU_IDLE_GP_DELAY jiffies, as explained here. This is set to 4 by default, as seen here.

The solution going around the web for this problem is to decrease this sleep time to 1 jiffy, which ameliorates the race condition. Thankfully, we do not need to edit Linux kernel code and recompile to do this! A module parameter rcu_idle_gp_delay was added for runtime manipulation, as explained here. If we set this to 1, the chance of this error is greatly reduced.

To do this, add the following line to /etc/default/grub:

GRUB_CMDLINE_LINUX="rcutree.rcu_idle_gp_delay=1"

And run sudo update-grub after this. Hopefully, this fixes the race condition so that every boot is successful.
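After rebooting, you can confirm the parameter actually reached the kernel by looking at /proc/cmdline. A sketch; the sample string below stands in for a real cmdline so it runs anywhere:

```shell
# On a real system: grep -o 'rcutree[^ ]*' /proc/cmdline
# Sample of the shape GRUB produces after update-grub and a reboot:
cmdline='BOOT_IMAGE=/vmlinuz-3.13.0-44-generic root=/dev/sda1 ro quiet rcutree.rcu_idle_gp_delay=1'
echo "$cmdline" | grep -o 'rcutree\.rcu_idle_gp_delay=[0-9]*'
# prints: rcutree.rcu_idle_gp_delay=1
```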

Tried with: NVIDIA GTX 765M, Linux 3.13.0-44-generic and Ubuntu 14.04

NVIDIA 343 driver and GTS 250

Problem

On a computer with an NVIDIA GeForce GTS 250 graphics card running Ubuntu 14.04, I decided to update the installed driver from nvidia-331 to nvidia-343. I did this:

$ sudo apt-get install nvidia-343

It installed the driver, then compiled and installed the kernel module. I rebooted the computer. I was able to log in at the GUI, but got an empty desktop!

Solution

What kind of graphics driver problem allows graphics to work, but hides the desktop elements in Unity? No idea. I dug through /var/log/syslog, searched for nvidia and found this:

NVRM: The NVIDIA GeForce GTS 250 GPU installed in this system is
NVRM:  supported through the NVIDIA 340.xx Legacy drivers. Please
NVRM:  visit http://www.nvidia.com/object/unix.html for more
NVRM:  information.  The 343.22 NVIDIA driver will ignore
NVRM:  this GPU.  Continuing probe...
NVRM: ignoring the legacy GPU 0000:03:00.0

So, it turns out that the GTS 250 is no longer supported in the 343 and later driver versions. I removed this driver, reinstalled the 331 driver and got my desktop back.
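The telltale message can also be pulled out of the log mechanically, which is handy when checking several machines. A sketch using a sample of the message above; on a real system, replace the printf with grep NVRM /var/log/syslog:

```shell
# Extract which legacy driver series the kernel module recommends:
printf '%s\n' \
  'NVRM: The NVIDIA GeForce GTS 250 GPU installed in this system is' \
  'NVRM:  supported through the NVIDIA 340.xx Legacy drivers. Please' \
  | grep -o '[0-9]*\.xx Legacy'
# prints: 340.xx Legacy
```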

Inconsistency error with libGL.so

Problem

I compiled and linked an OpenGL program with -lGL. On executing the program, I got this runtime error:

Inconsistency detected by ld.so: dl-version.c: 224: _dl_check_map_versions: Assertion `needed != ((void *)0)' failed!

Solution

The cryptic error message held some clues. I guessed there was some inconsistency between the various shared library files being linked into the program.

Searching for the OpenGL library gave multiple results:

$ locate libGL.so
/usr/lib/nvidia-331/libGL.so
/usr/lib/nvidia-331/libGL.so.1
/usr/lib/nvidia-331/libGL.so.331.38
/usr/lib/x86_64-linux-gnu/libGL.so
/usr/lib/x86_64-linux-gnu/mesa/libGL.so
/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0
/usr/lib32/nvidia-331/libGL.so
/usr/lib32/nvidia-331/libGL.so.1
/usr/lib32/nvidia-331/libGL.so.331.38

There seemed to be two versions of the library file: one from Mesa located in the standard library directory and another from NVIDIA located in its own directory. I guessed that the Mesa file was getting linked and at runtime could not work with the NVIDIA display driver. A bit of Googling led to this bug report, which painted a similar picture.

This bug was not yet fixed for my Ubuntu system. So, explicitly picking the NVIDIA library directory at link time with -L/usr/lib/nvidia-331 worked around the error. The program compiled and executed without any errors.

Tried with: NVIDIA drivers 331, Ubuntu 14.04 and NVIDIA GeForce 9600 GT

How to get GPU information

To get information about your NVIDIA GPU at the shell, try:

$ nvidia-smi

+------------------------------------------------------+                       
| NVIDIA-SMI 331.38     Driver Version: 331.38         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 9600 GT     Off  | 0000:03:00.0     N/A |                  N/A |
|  0%   58C  N/A     N/A /  N/A |    386MiB /   511MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

This requires that you have the NVIDIA driver package installed.

If you do not have the driver installed, you can try:

$ lspci | grep VGA

03:00.0 VGA compatible controller: NVIDIA Corporation G94 [GeForce 9600 GT] (rev a1)

This lists the actual GPU chip used on the graphics card: G94, in the example above.

To get detailed information about the card, use the bus ID shown at the beginning of the line. For example:

$ lspci -v -s 03:00.0

03:00.0 VGA compatible controller: NVIDIA Corporation G94 [GeForce 9600 GT] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Device 827c
    Flags: bus master, fast devsel, latency 0, IRQ 63
    Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at f4000000 (64-bit, non-prefetchable) [size=32M]
    I/O ports at cf00 [size=128]
    [virtual] Expansion ROM at f7000000 [disabled] [size=512K]
    Capabilities: <access denied>
    Kernel driver in use: nvidia

Notice that the Capabilities section is shown as <access denied>, since reading it requires superuser privileges. To view the capabilities:

$ sudo lspci -v -s 03:00.0

03:00.0 VGA compatible controller: NVIDIA Corporation G94 [GeForce 9600 GT] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Device 827c
    Flags: bus master, fast devsel, latency 0, IRQ 63
    Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at f4000000 (64-bit, non-prefetchable) [size=32M]
    I/O ports at cf00 [size=128]
    [virtual] Expansion ROM at f7000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100] Virtual Channel
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nvidia
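The two lspci steps can be glued together so you do not have to copy the bus ID by hand. A sketch; the echoed line below stands in for real lspci output so the extraction part is runnable anywhere:

```shell
# On a real system:
#   lspci | grep VGA | awk '{print $1}' | xargs -r -n1 sudo lspci -v -s
# With the sample line from above, the bus ID is just the first field:
echo '03:00.0 VGA compatible controller: NVIDIA Corporation G94 [GeForce 9600 GT] (rev a1)' \
  | awk '{print $1}'
# prints: 03:00.0
```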

Tried with: Ubuntu 14.04

How to install CUDA from Ubuntu repositories

The Ubuntu repositories now host CUDA packages. These may not be the latest and greatest, but I find that they work well with the NVIDIA drivers from the same place. They are also much easier to install than the packages from the NVIDIA website.

  • First, install the latest NVIDIA driver available from the repositories. The NVIDIA driver package is named nvidia-xyz, where xyz is the version. Pick the largest version number available in the repositories. For example:
$ sudo apt install nvidia-331

The installation process compiles the driver module for your particular Linux kernel and installs it. Restart the computer once the install is done.

You should see an NVIDIA module when you list the DKMS modules. For example, on my computer:

$ dkms status
nvidia-331, 331.38, 3.13.0-24-generic, x86_64: installed
  • Now you are ready to install CUDA. This is really easy, since installing the package nvidia-cuda-toolkit pulls in the hundred-odd other CUDA packages and tools that are needed:
$ sudo apt install nvidia-cuda-toolkit

That is it, enjoy your CUDA programming! 🙂
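The dkms status line from the first step can also be checked mechanically, for example in a setup script. A sketch using the sample line from above; on a real system, replace the assignment with the output of dkms status piped through grep nvidia:

```shell
# A dkms status line has the shape: name, version, kernel, arch: state
line='nvidia-331, 331.38, 3.13.0-24-generic, x86_64: installed'
version=$(echo "$line" | awk -F', ' '{print $2}')  # second field is the driver version
state=${line##*: }                                 # everything after the last ": "
echo "driver $version is $state"
# prints: driver 331.38 is installed
```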

Tried with: Linux kernel 3.13.0-24-generic, CUDA 5.5, NVIDIA driver 331, NVIDIA GTX Titan and Ubuntu 14.04

How to use discrete graphics on NVIDIA Optimus notebook using Bumblebee

NVIDIA Optimus is a technology used on notebooks that pair a CPU with integrated graphics with discrete NVIDIA mobile graphics hardware. It intelligently switches between the integrated and discrete graphics so that performance is not compromised while battery life is conserved. Notebooks with this technology carry an NVIDIA Optimus sticker. Sadly, Optimus works only on Windows; in Ubuntu, the discrete NVIDIA graphics card is not even detected.

Bumblebee is a solution for using the discrete graphics hardware on Optimus notebooks. It is not intelligent or seamless like Optimus on Windows. Rather, it sets up the Linux system so that the NVIDIA drivers for the discrete hardware are installed and configured, and it enables you to run a specific application fully on the discrete graphics hardware.

These steps can be used to install Bumblebee on Ubuntu 14.04:

  • Install Bumblebee and related NVIDIA packages:
$ sudo apt-get install bumblebee bumblebee-nvidia primus nvidia-331

This also compiles and installs the necessary kernel modules to support NVIDIA hardware.

  • Reboot the system.

  • To run a program named foobar using the discrete NVIDIA graphics hardware:

$ optirun foobar
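A hypothetical convenience wrapper (the function name is made up): route a command through optirun when it exists, and fall back to running it directly, so the same launcher works on machines without Bumblebee:

```shell
# run_gpu: prefer the discrete GPU via optirun, else run normally.
run_gpu() {
  if command -v optirun >/dev/null 2>&1; then
    optirun "$@"   # discrete NVIDIA GPU via Bumblebee
  else
    "$@"           # no Bumblebee installed: run on the default GPU
  fi
}
run_gpu echo "launched"
```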

Tried with: NVIDIA GeForce GTX 765M and Ubuntu 14.04