How to make Intel Wireless-AC 3165 work in Ubuntu


I installed Ubuntu 16.04 on a new notebook that had Intel Wireless-AC 3165 hardware to provide wifi and Bluetooth connectivity. This hardware was working without any problems in Windows 10, without requiring any extra software installation. However, the wireless and Bluetooth features did not work in Ubuntu.


There are many solutions offered on the web to enable wireless connectivity. Copying the ucode files provided by Intel here to /lib/firmware definitely did not work.

I was using Ubuntu 16.04, which was using the Linux 4.4.x kernel. One of the suggestions online is that newer kernels have better support for Intel wireless hardware.

Ubuntu provides newer versions of Linux kernel and X in updated HWE stacks. I installed the latest available HWE stack for Ubuntu 16.04, as described here. This installed a 4.8.x kernel and my wireless and Bluetooth worked like magic without requiring any configuration after a reboot.

I have also tried to manually install a newer Linux kernel. Ubuntu developers provide installers of all newer Linux kernels. So I decided to try my luck with the latest 4.9.x series and installed it as described here. This also enabled me to use this Intel wireless adapter without any problem!

How to update Linux kernel in Ubuntu

Every Ubuntu LTS release sticks to a particular Linux kernel series. For example, Ubuntu 16.04 sticks to the 4.4.x series. Over the months and years that you use and update this release, the kernel only receives updates within that series, to maintain stability.

However, there can be reasons to upgrade to a newer major kernel version, for example to get support for newer hardware and firmware. The first option to try is to install the latest HWE stack for your LTS release, as described here. That should upgrade your Linux kernel and X to recent versions.
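For Ubuntu 16.04, the HWE stack is installed through a pair of metapackages, one for the kernel and one for X. The sketch below only builds their names from a release number. The helper name hwe_packages is made up for illustration, and the naming pattern is the one used for 16.04, so treat it as an assumption for any other release.

```shell
# Hypothetical helper: print the HWE kernel and X metapackage names
# for an LTS release, following the naming used for Ubuntu 16.04.
hwe_packages() {
    printf 'linux-generic-hwe-%s xserver-xorg-hwe-%s\n' "$1" "$1"
}

hwe_packages 16.04

# To install them on a real 16.04 desktop system (needs root):
#   sudo apt install --install-recommends $(hwe_packages 16.04)
```

The install line is left as a comment so that the snippet is safe to paste. Reboot after installing and check the result with uname -r.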

If an HWE stack is not available, or the versions it provides are not recent enough for you, you can use the mainline kernel builds that Ubuntu developers maintain for download and installation.

  • Go to the mainline kernel archive and decide which kernel version you want. For example, I decided to upgrade from 4.4.x to 4.9.x.

  • You need to download 3 deb files for a full kernel installation. These will be named in this format: linux-headers-XXX_amd64.deb, linux-headers-XXX_all.deb and linux-image-XXX-generic_XXX_amd64.deb.

  • Install them:

$ sudo dpkg -i *.deb
  • Restart the computer and check if you are using the new kernel:
$ uname -r
  • Do not delete the kernel provided by Ubuntu even if apt keeps reminding you that you do not need it! If something goes wrong with the new kernel, you might want to keep the older one, so you can boot using that in the GRUB boot screen.
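The uname -r check can be turned into a small test that compares the running kernel against the series you meant to install. This is a minimal sketch. The function names kernel_series and running_series_is are illustrative, and the sample release string is only an example of the format that uname -r prints.

```shell
# Print the major.minor series of a kernel release string.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only if the running kernel belongs to the expected series.
running_series_is() {
    [ "$(kernel_series "$(uname -r)")" = "$1" ]
}

kernel_series "4.9.0-040900-generic"   # prints 4.9
```

After the reboot, something like running_series_is 4.9 && echo "new kernel is running" tells you whether GRUB actually booted the kernel you installed.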

How to remove old kernel packages

Ubuntu updates regularly install a new version of the Linux kernel. The old kernel packages are kept as backup and are not removed, nor are they marked for autoremove. Occasionally, I remove the old kernel packages manually, keeping the current version and one older version as backup.

Thankfully, I discovered that there is a program that does this exact operation now!

Install the bikeshed package:

$ sudo apt install bikeshed

To remove all old kernel packages except the current one and the next older one:

$ sudo purge-old-kernels

Tried with: Ubuntu 15.10

How to use performance analysis tools of Linux kernel

The performance counters available in modern processors can be easily accessed through the Linux kernel to analyze the performance of your programs. For example, you can find out how many branch misses occurred when your program executed.

  • We first install the performance analysis tools for our specific Linux kernel version:
$ sudo apt install linux-tools-$(uname -r) linux-tools-common
  • perf is the main command that we use for analysis.

  • List the performance counters and events that you can measure:

$ perf list

List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  ref-cycles                                         [Hardware event]

  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]
  dummy                                              [Software event]

  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]

  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  cycles-ct OR cpu/cycles-ct/                        [Kernel PMU event]
  cycles-t OR cpu/cycles-t/                          [Kernel PMU event]
  el-abort OR cpu/el-abort/                          [Kernel PMU event]
  el-capacity OR cpu/el-capacity/                    [Kernel PMU event]
  el-commit OR cpu/el-commit/                        [Kernel PMU event]
  el-conflict OR cpu/el-conflict/                    [Kernel PMU event]
  el-start OR cpu/el-start/                          [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]
  mem-loads OR cpu/mem-loads/                        [Kernel PMU event]
  mem-stores OR cpu/mem-stores/                      [Kernel PMU event]
  tx-abort OR cpu/tx-abort/                          [Kernel PMU event]
  tx-capacity OR cpu/tx-capacity/                    [Kernel PMU event]
  tx-commit OR cpu/tx-commit/                        [Kernel PMU event]
  tx-conflict OR cpu/tx-conflict/                    [Kernel PMU event]
  tx-start OR cpu/tx-start/                          [Kernel PMU event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[:access]                                [Hardware breakpoint]

  [ Tracepoints not available: Permission denied ]

We can see that a lot of counters and events are available for measurement. The tracepoints are listed too if the command is run with sudo.

  • We can query any of the above listed events. For example, to find out the number of branch misses in a program foo by repeatedly running the program 10 times:
$ perf stat -r 10 -e branch-misses ./foo

 Performance counter stats for './foo' (10 runs):

            42,683 branch-misses                                                 ( +-  0.55% )

       0.001782436 seconds time elapsed                                          ( +- 12.91% )

All values are the average of 10 executions, since that is what we specified.

  • To measure more than one event, separate the event names with commas. For example, to get both branch misses and memory stores:
$ perf stat -r 10 -e branch-misses,mem-stores ./foo

 Performance counter stats for './foo' (10 runs):

            42,603 cpu/branch-misses/                                            ( +-  0.23% )
           972,221 cpu/mem-stores/                                               ( +-  0.05% )

       0.001748714 seconds time elapsed                                          ( +-  0.39% )
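A raw count like the branch misses above is easier to judge as a fraction of all branches. The sketch below computes that percentage from two counts copied from perf output. The function name miss_rate is illustrative, and the total of 1,000,000 branches is a made-up figure for the example, not a measurement of foo.

```shell
# Print misses as a percentage of the total, given two counts as perf
# prints them (the thousands separators are stripped first).
miss_rate() {
    misses=$(printf '%s' "$1" | tr -d ',')
    total=$(printf '%s' "$2" | tr -d ',')
    awk -v m="$misses" -v t="$total" 'BEGIN { printf "%.2f\n", 100 * m / t }'
}

# 42,683 misses out of a hypothetical 1,000,000 branches.
miss_rate 42,683 1,000,000   # prints 4.27
```

To get the real total, count the branches event alongside the misses, for example with perf stat -r 10 -e branches,branch-misses ./foo. When both events are counted, perf typically prints this ratio next to the miss count as well.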

Tried with: Intel i7-4790, Linux 3.13.0-52-generic and Ubuntu 14.04

NVIDIA module taints Linux kernel


I installed CUDA 7.0 on Ubuntu running on an Aftershock notebook with NVIDIA graphics card. The NVIDIA graphics drivers were upgraded to version 346. To my pleasant surprise, the graphics card was now directly visible to the Linux kernel. There was no longer any need to use Bumblebee.

However, I started noticing that this Ubuntu would not always boot into Unity. On many cold starts, I saw that Ubuntu would show this error:

After displaying this it would get stuck at the Ubuntu bootup screen.

I also noticed that I could boot up if I first booted into another Ubuntu instance I had on this notebook and later restarted and booted into the current Ubuntu instance.


Update: I no longer have this problem after installing CUDA 7.5 and the NVIDIA 352 drivers that come along with it on a fresh Ubuntu 15.04 system. I still see the syslog errors, but they no longer stop Ubuntu from booting successfully and the GPU/CUDA can be used without problems. Yay! 😄

Old stuff:

To analyse this problem I cropped out the relevant portions of /var/log/syslog for the case when Ubuntu booted correctly and when it threw the above kernel panic error. These syslog entries can be seen here.

What I found was that there was some kind of a race condition at boot time. If the nvidia-drm module registered early enough with the kernel, then everything was fine. Otherwise, the kernel would complain that the NVIDIA module was tainting it and then it would throw up the above error.

The problem seems to lie in the Read-Copy-Update mechanism of the kernel. Here, some optimizations seem to have been added in recent versions to improve energy efficiency. RCU wakes up the CPUs only after a period of RCU_IDLE_GP_DELAY jiffies, as explained here. This is set to 4 by default, as seen here.

The solution going around the web for this problem is to decrease this delay to 1 jiffy, which makes the race condition less likely. Thankfully, we do not need to edit the Linux kernel code and recompile to do this! A kernel parameter rcu_idle_gp_delay was added for runtime manipulation, as explained here. If we set this to 1, then the chance of this error reduces a lot.

To do this, add the following line to /etc/default/grub:


And run update-grub after this. Hopefully, this should fix the race condition so that every boot is successful.

Tried with: NVIDIA GTX 765M, Linux 3.13.0-44-generic and Ubuntu 14.04

How to remove old Linux kernels


Ubuntu updates Linux kernels almost every month. If you regularly update Ubuntu, you will end up with a lot of old Linux kernels.

A Linux kernel in Ubuntu is installed as four packages. They are listed here for kernel 3.13.0-43: linux-headers-3.13.0-43, linux-headers-3.13.0-43-generic, linux-image-3.13.0-43-generic and linux-image-extra-3.13.0-43-generic.

You can of course look up the running kernel version using uname -r and then remove the packages of all the other kernel versions manually.
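For the manual route, it helps to first list which kernel image packages do not belong to the running kernel. This is a minimal sketch that only prints names and removes nothing. The function name old_kernel_images is illustrative, and it covers only the linux-image and linux-image-extra packages, so the matching headers packages still need to be picked by hand.

```shell
# Read package names on stdin and print the versioned kernel image
# packages that do not belong to the kernel release given as $1.
old_kernel_images() {
    grep -E '^linux-image-(extra-)?[0-9]' | grep -v -F -- "$1" || true
}

# Example with the package names from this post, pretending that
# 3.13.0-44-generic is the running kernel.
printf '%s\n' \
    linux-image-3.13.0-43-generic \
    linux-image-extra-3.13.0-43-generic \
    linux-image-3.13.0-44-generic \
    linux-image-extra-3.13.0-44-generic \
    linux-image-generic |
    old_kernel_images 3.13.0-44-generic
# prints the two 3.13.0-43 packages

# On a real system, feed it the installed packages instead:
#   dpkg -l 'linux-image-*' | awk '/^ii/ { print $2 }' | old_kernel_images "$(uname -r)"
```

Review the printed list before passing any of it to apt, and keep at least one older kernel as a fallback.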

A great alternative is to use the purge-old-kernels tool. This removes all but the latest 2 kernels. Strangely, it is shipped along with the Byobu package, so you will need to install that to use it:

$ sudo apt install byobu
$ sudo purge-old-kernels

Another alternative that I like is to use Ubuntu Tweak tool. It can be installed easily, as described here.

In Ubuntu Tweak, go to Janitor -> System -> Old Kernel and you will be presented with all the Linux kernel packages on your system. The current kernel will not be included here, for obvious reasons. You can now pick and choose and select what you want to remove easily from here.

Tried with: Ubuntu Tweak 0.8.7 and Ubuntu 14.04

How to pass Thrust device vector to CUDA kernel

Thrust makes it convenient to handle data with its device_vector. But things get messy when the device_vector needs to be passed to your own kernel. A CUDA kernel does not understand Thrust data types, so the device_vector needs to be converted to its underlying raw pointer. This is done using thrust::raw_pointer_cast:

#include <thrust/device_vector.h>

thrust::device_vector<int> iVec;

int* iArray = thrust::raw_pointer_cast(&iVec[0]);

fooKernel<<<x, y>>>(iArray);

Inside a kernel, I typically need not only a pointer to the array, but also the length of the array. Thus, I find it is useful to convert device_vector to a structure that holds both a pointer to the array and its length. (Very much like a vector itself.) Creating such a structure and its conversion function is easy thanks to templates:

// Template structure to pass to kernel
template <typename T>
struct KernelArray
{
    T*  _array;
    int _size;
};

// Function to convert device_vector to structure
template <typename T>
KernelArray<T> convertToKernel(thrust::device_vector<T>& dVec)
{
    KernelArray<T> kArray;
    kArray._array = thrust::raw_pointer_cast(&dVec[0]);
    kArray._size  = (int) dVec.size();

    return kArray;
}

Passing device_vector to kernels and accessing its array inside the kernel is easy thanks to this infrastructure:

thrust::device_vector<int> iVec;

fooKernel<<<x, y>>>(convertToKernel(iVec)); // Explicit conversion from iVec to KernelArray<int>

__global__ void fooKernel(KernelArray<int> inArray)
{
    for (int i = 0; i < inArray._size; ++i)
        int something = inArray._array[i];
    // ...
}

You can take it a notch higher and make the conversion from device_vector to KernelArray to be implicit. This can be done by adding a constructor to KernelArray that takes one input parameter of type device_vector. (See Stephen’s comment below the post.)

With such a constructor, you can now pass a device_vector seamlessly to the kernel:

thrust::device_vector<int> iVec;

fooKernel<<<x, y>>>(iVec); // Implicit conversion from iVec to KernelArray<int>

__global__ void fooKernel(KernelArray<int> inArray)
{
    // ...
}

Tried with: Thrust 1.3 and CUDA 3.2