Out of memory error with py-faster-rcnn


When training a Faster-RCNN model using py-faster-rcnn, I noticed that it would randomly crash with this error:

out of memory
invalid argument
an illegal memory access was encountered
F0919 18:01:51.657281 21310 math_functions.cu:81] Check failed: error == cudaSuccess (77 vs. 0)  an illegal memory access was encountered
*** Check failure stack trace: ***


On closer investigation, I found that a training process would crash when I started a second or third training on the same GPU. How can starting a second training kill the first one? The only scenario I could think of, based on the above error message, was that a cudaMalloc to get more memory was failing. But why was Caffe trying to get more memory in the middle of training? Should not all the reshaping be done and finished at the beginning of training?

Anyway, the first problem was to reliably reproduce the error, since it did not always crash on running a second training. It crashed only once in every few attempts. Since I suspected cudaMalloc, I wrote a small adversary CUDA program that would try to grab as much GPU memory as possible. I ran this program a while after training had started and it reliably crashed the training every time!

A core dump file was being generated on crash, but it was useless at first since I was running Caffe compiled in release mode with no debugging symbols. I recompiled Caffe in debug mode with debugging symbols and opened the resulting core dump in GDB:

$ gdb /usr/bin/python core

After the core dump was loaded in GDB, I got its backtrace using bt. It was interesting, but did not point to anything suspicious.

I next monitored the GPU memory occupied by the training continuously, using watch and nvidia-smi:

$ watch -n 0.1 nvidia-smi

I noticed that the GPU memory used by the training kept going up and down by around 18MB all the time. If my adversary CUDA program went and grabbed the 18MB that was released, then the training would crash the next time it tried to allocate that memory.

So, who is allocating and releasing memory all the time in py-faster-rcnn? Since I had ported the proposal layer recently from Python to C++, I remembered NMS. There is both a CPU and GPU version of NMS in py-faster-rcnn. The GPU NMS is used by default, though this can be changed in config.py. By switching it to CPU, I found that the crash no longer happened.

But the problem is that CPU NMS at 0.2s was 10 times slower than the GPU NMS at 0.02s for my setup.


Once I saw that the GPU NMS code in lib/nms/nms_kernel.cu was calling cudaMalloc and cudaFree continuously, fixing it was easy. I changed the allocated memory pointers to static and changed the code to hold on to the memory allocated last time. Only if more memory is required is the old allocation freed and a new, larger one allocated. I basically used the same strategy used by std::vector, that is, doubling of memory. A better solution would be to allocate the maximum required memory once, based on the box numbers set in config.py, and use it throughout training.
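
The hold-on-and-double policy described above can be sketched in a few lines. This is only an illustration in Python, not the actual CUDA change: the class name GrowOnlyBuffer is hypothetical, a bytearray stands in for the device memory behind the static pointer, and the request sizes are made-up numbers scaled down from the 18MB seen in nvidia-smi.

```python
class GrowOnlyBuffer:
    """Keeps its allocation between calls and reallocates only to grow."""

    def __init__(self):
        self._storage = bytearray()  # stands in for the static device pointer
        self.allocations = 0         # counts how often we really allocate

    def reserve(self, needed_bytes):
        """Return storage of at least needed_bytes, reusing the old one if it fits."""
        capacity = len(self._storage)
        if needed_bytes > capacity:
            # Grow by doubling (the policy the post describes), but never
            # allocate less than what was asked for.
            new_capacity = max(needed_bytes, 2 * capacity)
            self._storage = bytearray(new_capacity)  # old storage is dropped here
            self.allocations += 1
        return self._storage


buf = GrowOnlyBuffer()
# Hypothetical per-iteration requests, in bytes.
for request in [18000, 17500, 18000, 20000, 19000]:
    buf.reserve(request)

print(buf.allocations)      # real allocations made across the five requests
print(len(buf.reserve(1)))  # capacity currently held
```

With this policy the five requests trigger only two real allocations, so there is no window in every iteration where another process can grab the freed memory.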

Tried with: CUDA 7.5 and Ubuntu 14.04

Find Cheatsheet

The find command in Linux is immensely useful. Here are some invocations of this command I find useful:

  • List all files and directories under a specified directory:
$ find /usr/include
$ find ../foobar/lib
$ find .

Note that unlike commands such as ls, the find command is recursive by default. It will list everything under the specified directory.

  • List only normal files (symbolic links and directories will not be shown):
$ find /usr/include -type f
  • List only symbolic links:
$ find /usr/foobar -type l
  • List only directories:
$ find /usr/foobar -type d
  • Two or more types can be combined. For example, to list normal files and symbolic links:
$ find /usr/foobar -type f -or -type l
  • List file or directory paths with a wildcard pattern:
$ find /usr/foobar -name "exactly_match_this"
$ find /usr/foobar -name "*.this_extension"
$ find /usr/foobar -name "*some_substring*"
  • List matching a wildcard pattern, but case insensitive:
$ find /usr/foobar -iname "*someblah*"
  • List files whose size is exactly, or greater than, a specific size:
$ find /usr/foobar -size 10k
$ find /usr/foobar -size +10k
$ find /usr/foobar -size +10M
$ find /usr/foobar -size +10G
  • List files and directories newer than the specified file or directory:
$ find /usr/foobar -newer stdio.cpp
  • Do an ls -l style listing of the matching files and directories:
$ find /usr/foobar -size +20k -ls
  • Delete files and directories that match:
$ find /usr/foobar -size +20k -delete
  • Note that many of the above operations can be performed by piping the results of find to the immensely useful xargs command. For example, to ls on the results of find:
$ find /usr/foobar -size +20k | xargs ls -l
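
If you need the same kind of search from inside a script, the combination of -type f, -name and -size can be approximated with the Python standard library. This is a rough sketch, not a replacement for find: the function name find_files and its parameters are my own, and the size test is a plain byte comparison.

```python
import fnmatch
import os


def find_files(root, pattern="*", min_size=0):
    """Yield regular files under root whose name matches pattern and whose
    size is strictly greater than min_size bytes."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Report only regular files, similar to -type f.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Case-sensitive wildcard match on the file name, similar to -name.
            if not fnmatch.fnmatchcase(name, pattern):
                continue
            # Similar to -size +N, with N given in bytes.
            if os.path.getsize(path) > min_size:
                yield path


if __name__ == "__main__":
    # Roughly: find /usr/include -type f -name "*.h" -size +10k
    for path in find_files("/usr/include", "*.h", 10 * 1024):
        print(path)
```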


  • man find
  • The Linux Command Line: A Complete Introduction by William Shotts

Tried with: Ubuntu 16.04

Audio format not supported by MX Player


I tried to play a video file using MX Player on my Android device. It played the video, but there was no audio. This error message is shown: This audio format (AC3) is not supported


  • Download codec: The required codec files are available here. You will find the codec files for the different versions of MX Player and for each you see the codec compiled for different chip architectures. I recommend downloading the AIO pack that matches your MX Player. If you download the wrong version MX Player will complain later when you try to use the codec.

  • Load codec: Open MX Player and go to Settings → Decoder → Custom Codec. Navigate to the directory containing the codec zip file and choose it. MX Player will restart after loading the codec. You can now play the video and should be able to hear the audio stream.

Review of Dell S2415H

It has been 6 months since I started using the Dell S2415H display at both home and work. The S in Dell displays is the budget series compared to the premium U series. I have been very happy with what I got for the price and I can no longer imagine working on my tiny notebook display!

  • Size: As you might guess from the model number, this is a 24-inch display. It looks smaller than one though because it has almost zero bezel. Once you use such an edge-to-edge display, you cannot go back to thick bezels. An added advantage of zero bezel is that it is great for dual and multi-display setups because it gives you the feel of one huge display.

  • Resolution: 1920×1080. Not as high as I would like, but fine for my coding and browsing.

  • Inputs: HDMI and VGA. This is where Dell really cut down on the features. No DVI or DisplayPort! I do not play games and my notebook and graphics cards have HDMI and VGA outputs, so I am fine.

  • Color: This is where IPS really shines through. Everything looks rich and vivid. The color gamut is great and the display is very bright even at the lowest brightness setting. I actually wish Dell had provided lower brightness levels because using this display in a dark room at night is hard on the eyes.

  • Glossy: This is one of my biggest problems with this display. It is very, very glossy. It looks great to visitors and in photos, but it is a pain while working, especially if you are working with black terminals or themes. Note that the U series displays are sensibly matte and do not have this problem.

  • Stand: The stand is dead simple and the only rotational adjustment possible is about the horizontal axis. Not a problem for me because I just place it on thick books to increase its height.

  • Price: It is very affordable. I doubt you can get a better deal on an IPS display with such a color gamut, good color calibration and such brightness levels. If you can afford it, you should definitely go for the U series displays, because they are worth the extra cash.

  • Linux: I use the display with Kubuntu and it was detected correctly in its settings and works fine.

In conclusion, this is a good display from Dell and I am loving it. If you can afford it, I highly recommend the U series, which I also use regularly.

4 million views


I am happy to note that this blog has now crossed 4 million views! It crossed 3 million just over a year ago. Almost all the traffic comes from Google. There are close to 2000 posts and the traffic is spread over a long tail of posts.


When the blog crossed 3 million views, I also renamed it from ChooruCode.com to CodeYarns.com. Though it seemed sensible at that time, in hindsight it turned out to be a bad idea. Google immediately cut traffic in half after the rename and it never recovered from that. Other than that halving of traffic, month-on-month traffic is pretty stable as seen in the above plot.


These days I mostly use my Twitter account @daariga to both consume and share. I have to admit that after Twitter there is very little energy left to write blog posts! Also, family and work are increasingly bigger consumers of my time, not that I am complaining. I next look forward to seeing how long it takes to cross the big 5 million 🙂

How to cast to Raspberry Pi using Raspicast

Fret not that your friends can cast YouTube videos from their Android device to their TV using Chromecast! You can cast anything from your Android device to your TV if you have a Raspberry Pi connected to it, using the awesome Raspicast app!

  • Set up your Pi: You will need a Raspberry Pi with Raspbian installed on it, connected to your TV using an HDMI cable, and you should be able to SSH to it. Optionally, install OMXPlayer on your Pi and check that you can play video files using it.
  • Install app: Install the Raspicast app from here.
  • Configure: Provide the IP address of your Pi, its port (usually 22), login (usually pi) and password.
  • Cast away: Play any video in the YouTube app on your Android device. Click the Share option and choose Cast (Raspicast). Raspicast opens and, after a second or so, you will see the video playing on your TV. If you want to keep adding YouTube videos to a queue, choose Queue (Raspicast) while sharing. You can view the queue in the Raspicast app.
  • Use the Files section of the app to browse and play video files that are on your Pi. The UI for this is pretty basic. You are probably better off using OMX Remote for this.
  • Use the Cast section of the app to browse and play video files that are on your Android device itself.

Tried with: Raspicast 1.3.6

OMX Remote

Video files that are on partitions accessible by Raspbian can be played using OMXPlayer. However, it is quite a hassle to SSH into your Raspberry Pi and play a movie using OMXPlayer from the shell. The ideal solution would be to browse the video files from the comfort of your Android device and play them from there. And that is exactly what OMX Remote does!

  • HDMI: Connect your Pi to your TV using an HDMI cable.
  • SSH: Make sure you can SSH into your Raspberry Pi from a computer on your home network.
  • OMXPlayer: After you SSH into the Pi, make sure you can play videos from the shell using OMXPlayer as described here.
  • Install OMX Remote: It can be installed from the Play Store here.
  • Configure OMX Remote: Provide the hostname (or IP address) of your Pi, port (usually 22), username (usually pi) and password. I also like to set the root directory of my media.
  • Play using remote: Browse the video files from the app and click on a video file to play it on your TV. Various remote control UIs of varying complexity are available by swiping left and right. Some of the useful buttons on the remote control are the ones to switch audio channels (speaker with arrow buttons) and to switch subtitles (speech bubble with arrow buttons).

Tried with: OMX Remote 1.9

How to install and use the DLink DCS-930L camera

The DLink DCS 930L is a simple camera for home surveillance. It can be connected to your home router using an Ethernet cable or wirelessly. You can watch and listen live from its VGA camera and microphone on your smartphone and other smart devices. Installing and using it turned out to be very easy.

Here is what worked for me:

  • Prepare the camera: Connect the camera using the provided power cable and power it up. Plug the provided Ethernet cable into the socket at the back and plug the other end to your home router. If the camera is given an IP address by the router and everything is fine, the light at the back of the camera should turn a steady green.

  • Create mydlink login: Create a login on the mydlink website if you do not already have one. You will need this login to add and use your camera.

  • Add camera using Android app: Install the mydlink Lite app on your phone and use your mydlink login. According to DLink documentation, you can add your camera using this mydlink Lite app if the camera and your phone are both connected to the same home network (or router). This did not work for me!

  • Download Windows wizard: Since I could not add the camera using the Android app, I had to power up my old Windows notebook. Connect Windows wirelessly to the same home network as your camera. I visited the DLink Support website to download an installation wizard. It will ask for the hardware revision (A or B). You can find this printed at the back of your camera. Mine turned out to be a revision B. I was able to download the wizard for rev B from here.

  • Add camera using Windows wizard: On running the wizard in Windows, it discovered my camera. I was asked to provide a password for the admin login on the camera. Since I chose to have the camera work wirelessly, the wizard then asked me to unplug the Ethernet cable and helped me pick the home wireless network to connect to. After this comes the mydlink login and once that is done, you can see the live video feed from your camera!

  • Watch video from Android devices: This is a cloud camera whose live feed you can check on your smartphone. Install the mydlink Lite app and provide your mydlink login. In the Remote section, you should find your camera. Clicking it shows the live video feed and you can hear audio too. You can also change your camera settings from this app.

Tried with: DLink DCS-930L (Revision B) camera and Windows 10

attrs package in Python

It is very rare that you learn something that completely changes how you program. Reading this post about the attrs package in Python was a revelation to me.

Coming from C++, I am not too big a fan of returning everything as lists and tuples. In many cases, you want to have structure and attributes, and the class in Python is a good fit for this. However, creating a proper class with attributes that has all the necessary basic methods is a pain.
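
To make that pain concrete, here is a sketch of what writing such a class by hand looks like, using only plain Python. The Creature class with eyes and legs is just an illustration, and only the constructor, __repr__ and __eq__ are written out. Ordering methods and validation would add even more boilerplate.

```python
class Creature:
    """A hand-written class with the basic methods spelled out manually."""

    def __init__(self, eyes, legs):
        self.eyes = eyes
        self.legs = legs

    def __repr__(self):
        # Readable representation, used by print() and the interpreter prompt.
        return "Creature(eyes={!r}, legs={!r})".format(self.eyes, self.legs)

    def __eq__(self, other):
        # Two creatures are equal if they are of the same class and
        # all their attributes are equal.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.eyes, self.legs) == (other.eyes, other.legs)


c = Creature(2, 4)
print(c)
print(c == Creature(2, 4))
```

Every new attribute has to be added in three places here, which is exactly the kind of repetition that is easy to get wrong.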

This is where attrs comes in. Add its decorator to the class and designate the attributes of the class using its methods and it will generate all the necessary dunder methods for you. You can also get some nice type checking and default values for the attributes too.

  • First, let us get the biggest confusion about this package out of the way! It is called attrs when you install it because there is already another existing package called attr (the singular). But when you import and use it, it is called attr. I know it is irritating, but this is the way it is.

  • To install it:

$ sudo pip3 install attrs
  • To decorate the class use attr.s. I read it as the plural attrs. And to declare the class attributes, use the attr.ib method. I read it as attribute.
import attr

@attr.s
class Creature:
    eyes = attr.ib()
    legs = attr.ib()
  • Once declared like this, the attributes can be provided while constructing an object of the class:
c = Creature(2, 4)
  • Object of this class can be constructed using keywords too:
c = Creature(legs=6, eyes=1000)
  • Notice that we have not specified any default value for the attributes. So, it will rightfully complain when constructing without values:
c = Creature()

TypeError: __init__() missing 2 required positional arguments: 'eyes' and 'legs'
  • Default values can be specified for attributes:
@attr.s
class Creature:
    eyes = attr.ib(default=2)
    legs = attr.ib(default=6)

c = Creature()

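Note that there are some rules you run up against if you provide default values for some attributes and not for others. In particular, an attribute without a default value cannot be declared after an attribute that has one.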

  • A beautiful __repr__ dunder method is automatically generated for your class. So, you can print any object:
c = Creature(3, 6)
print(c)

Creature(eyes=3, legs=6)

This is for me the killer feature! This is far more informative than just looking at a bunch of list or dict values.

  • Attributes can be read or set just like normal class attributes:
c = Creature(2, 4)
c.eyes = 10
  • Comparison methods are already generated for you, so you can go ahead and compare objects:
c1 = Creature(2, 4)
c2 = Creature(3, 9)
c1 == c2
  • You can add some semblance of type checking to attributes by using the instance_of validators provided by the package:
@attr.s
class Creature:
    eyes = attr.ib(validator=attr.validators.instance_of(int))
    legs = attr.ib()

c = Creature(3.14, 6)

TypeError: ("'eyes' must be <class 'int'> (got 3.14 that is a <class 'float'>)."
  • By default, class attributes are stored in a dictionary. You can switch this to use slots by changing the decorator:
@attr.s(slots=True)
class Creature:
    eyes = attr.ib()
    legs = attr.ib()
  • Are you curious to see the definition of the dunder methods it generates? You can do that using the inspect package:
import inspect
print(inspect.getsource(Creature.__init__))
  • Want to see all the attributes the package has defined for a class?
attr.fields(Creature)

(Attribute(name='eyes', default=NOTHING, validator=<instance_of validator for type <class 'int'>>, repr=True, cmp=True, hash=True, init=True, convert=None), Attribute(name='legs', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))

There is a lot more stuff in this awesome must-use package that can be read here

Tried with: attrs 16.1.0, Python 3.5.2 and Ubuntu 16.04

How to debug running Python program using PyCharm debugger

PDB is a fantastic debugger for Python, but it cannot be easily attached to an already running Python program. The recommended method to attach to a running Python program for debugging is GDB as described here. But examining the stack trace and the Python objects of a Python program in a C++ debugger like GDB is not straightforward.

I recently discovered that the GUI debugger in PyCharm IDE can be used to attach to a running Python program and debug it. It is easy to do this:

  • An already running program: Let us assume that I already have a running Python program whose source files are all inside a /home/joe/foobar directory. It has been running an important task for hours now and I have discovered a tiny bug that can be fixed in the running program by changing the value of a global variable.
  • Enable ptrace of any process: For this type of live debugging, we need any process to be able to ptrace any other process. However, the kernel in your distribution may be set up to only allow ptrace of a child process by a parent process. Check that the value of /proc/sys/kernel/yama/ptrace_scope is 0. If not, set it temporarily to 0:
$ echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
  • Install PyCharm: Download PyCharm and unzip the downloaded file. I use the Community Edition which is free.
  • Run PyCharm: Run bin/pycharm.sh and open the directory containing the source files of the running program.
  • If necessary, set the Python interpreter for this project to be the same as that of the running program. That is, we make sure they both use the same version of Python.
  • In the source files, set one or more breakpoints where you would like to stop, inspect or change the running program.
  • Attach: Now we are ready to attach to our running program! Choose Run → Attach to local process and choose the PID of our already running program from the list.
  • Debug: Once attached, the program should stop at our breakpoints. We can now step through the program and change the value of variables to effect some live bug fixes! Once done, we can disable the breakpoints and allow the program to continue by itself.
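
The ptrace_scope check from the steps above can also be done from Python before you try to attach. This is a small sketch: the helper name read_ptrace_scope is my own, and it returns None on systems where the Yama file does not exist.

```python
def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the Yama ptrace_scope value as an int, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


scope = read_ptrace_scope()
if scope is None:
    print("ptrace_scope is not available on this system")
elif scope == 0:
    print("ptrace_scope is 0: attaching to a running process should work")
else:
    print("ptrace_scope is {}: attaching is restricted".format(scope))
```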

Tried with: PyCharm 2016.2, Python 2.7.11 and Ubuntu 16.04