How to debug a running Python program using the PyCharm debugger

PDB is a fantastic debugger for Python, but it cannot be easily attached to an already running Python program. The recommended method to attach to a running Python program for debugging is GDB, as described here. However, examining the stack trace and the objects of a Python program in a C++ debugger like GDB is not straightforward.

I recently discovered that the GUI debugger in PyCharm IDE can be used to attach to a running Python program and debug it. It is easy to do this:

  • An already running program: Let us assume that I already have a running Python program whose source files are all inside a /home/joe/foobar directory. It has been running an important task for hours now and I have discovered a tiny bug that can be fixed in the running program by changing the value of a global variable.
  • Enable ptrace of any process: For this type of live debugging, we need any process to be able to ptrace any other process. However, the kernel in your distribution may be set up to only allow a parent process to ptrace its child processes. Check that the value of /proc/sys/kernel/yama/ptrace_scope is 0. If not, set it temporarily to 0:
$ echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
  • Install PyCharm: Download PyCharm and unzip the downloaded file. I use the Community Edition which is free.
  • Run PyCharm: Run bin/pycharm.sh and open the directory containing the source files of the running program.
  • If necessary, set the Python interpreter for this project to be the same as that of the running program. That is, we make sure they both use the same version of Python.
  • In the source files, set one or more breakpoints where you would like to stop, inspect or change the running program.
  • Attach: Now we are ready to attach to our running program! Choose Run → Attach to local process and choose the PID of our already running program from the list.
  • Debug: Once attached, the program should stop at our breakpoints. We can now step through the program and change the value of variables to effect some live bug fixes! Once done, we can disable the breakpoints and allow the program to continue by itself.
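
Before attaching, it can help to check the ptrace setting from a script. The following is a minimal sketch that reads the file mentioned above and explains its value. The helper names are my own, and the meanings of the values are taken from the kernel's Yama documentation as I understand it, so treat them as an assumption to verify on your system.

```python
from pathlib import Path

# Location of the Yama setting discussed in the steps above.
PTRACE_SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings of the values (assumption: as described in the Yama documentation).
SCOPE_MEANINGS = {
    0: "classic: a process can attach to any other process of the same user",
    1: "restricted: a process can only attach to its own descendants",
    2: "admin-only: only processes with CAP_SYS_PTRACE can attach",
    3: "no attach: attaching with ptrace is disabled",
}


def describe_ptrace_scope(raw_value):
    """Return a readable description for the contents of ptrace_scope."""
    value = int(raw_value.strip())
    return SCOPE_MEANINGS.get(value, "unknown value {}".format(value))


def can_attach_to_unrelated_process(raw_value):
    """True if a normal user may attach to a process that is not its child."""
    return int(raw_value.strip()) == 0


def read_ptrace_scope():
    """Return the raw contents of the setting, or None if it cannot be read."""
    try:
        return PTRACE_SCOPE_FILE.read_text()
    except OSError:
        return None


current = read_ptrace_scope()
if current is None:
    print("Yama ptrace_scope is not available on this system")
else:
    print("ptrace_scope =", current.strip(), "->", describe_ptrace_scope(current))
```

If this reports anything other than 0, the attach step in PyCharm will most likely fail until the value is changed with the tee command shown above.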

Tried with: PyCharm 2016.2, Python 2.7.11 and Ubuntu 16.04

How to install Visual Studio Code

Visual Studio Code (VSCode) is easy to install and configure. You just need to remember a few keyboard shortcuts.

  • Download the latest .deb file of VSCode from here.

  • VSCode is being developed rapidly and new releases appear frequently. If you have an older version of VSCode, uninstall it first:

$ sudo apt remove code
  • Install the downloaded VSCode:
$ sudo dpkg -i vscode-you-downloaded.deb
  • Start in a root directory of your code:
$ code .
  • VSCode will start up and treat all the files and directories inside the current directory as belonging to a project. While it is indexing the files, you will see a red flame icon in the bottom-right corner. Once the flame goes away, you are ready to edit files!

  • Extensions: VSCode is no fun without extensions for your favorite languages (say C++ or Python) or your favorite features (say Vim). To install extensions, use the keyboard shortcut Ctrl+P and type ext install. You can search and install extensions from the VSCode Marketplace. Many extensions need VSCode to be restarted before they are enabled.

  • Settings: VSCode can handle both user settings (which apply across all projects) and workspace settings (which apply only to a particular project). All settings are stored in JSON files. You can choose File → Preferences to open these. To change a setting, copy its line from the default settings to settings.json and edit the value there.
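
To make the layering of settings concrete, here is a small sketch in Python that merges three JSON layers the way I understand VSCode to do it: defaults first, then user settings, then workspace settings. The setting names are real VSCode settings, but the values and the function name are made up for illustration. Real settings files may also contain comments, which json.loads does not accept.

```python
import json

# Illustrative values only; not copied from any real installation.
default_settings = {
    "editor.fontSize": 14,
    "editor.tabSize": 4,
    "files.autoSave": "off",
}
user_settings_json = '{"editor.fontSize": 16}'
workspace_settings_json = '{"editor.tabSize": 2}'


def effective_settings(defaults, user_json, workspace_json):
    """Merge the layers; later layers override earlier ones."""
    merged = dict(defaults)
    merged.update(json.loads(user_json))       # applies to all projects
    merged.update(json.loads(workspace_json))  # applies to this project only
    return merged


result = effective_settings(default_settings, user_settings_json, workspace_settings_json)
for name in sorted(result):
    print(name, "=", result[name])
```

Only the lines you copy into a settings.json file change anything; every other setting keeps its default value.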

Tried with: Visual Studio Code 1.4 and Ubuntu 16.04

How to install Go

The Go programming language has been getting popular for certain types of server-side programming. Installing the compiler and its tools and using them is not as straightforward as you might assume. Here are the steps I use:

  • Go to the Go language website and download the package.

  • Unzip the downloaded file and move it to /usr/local/go:

$ tar xvf go-package-you-downloaded.tar.gz
$ sudo mv go /usr/local
  • Add /usr/local/go/bin to the PATH variable of your shell.

  • Your Go source code and compiled binaries will reside inside a specific directory. Create it and set an environment variable named GOPATH to point to it.

  • Your source code has to be put under GOPATH in this directory hierarchy: $GOPATH/src/github.com/someuser/someproject. For example, add the hello.go from this page to $GOPATH/src/github.com/joe/hello/main.go. You will need to create that directory hierarchy. If you are getting some Go code from GitHub, remember to clone it within that $GOPATH/src/github.com directory.

  • The above source code can be compiled from anywhere using this command:

$ go install github.com/joe/hello

Note how the Go compiler will always look under the $GOPATH/src for the above compilation.

  • After compilation, the binary will be placed in $GOPATH/bin. This directory also needs to be added to the PATH variable. Remember to do this, because it becomes essential once you start using Go code that calls other Go libraries.

  • To run the program you compiled:

$ $GOPATH/bin/hello
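
The directory convention above is easy to get wrong, so here is a small Python sketch that computes where the source and the binary end up for a given import path. The GOPATH value /home/joe/go and the function name are hypothetical. The rule that the binary is named after the last element of the import path is my understanding of how go install behaves with GOPATH.

```python
from pathlib import PurePosixPath


def gopath_layout(gopath, import_path):
    """Return where the source and binary of a package live under GOPATH."""
    root = PurePosixPath(gopath)
    return {
        # Source code goes under $GOPATH/src/<import path>
        "source_dir": root / "src" / import_path,
        # The binary goes to $GOPATH/bin, named after the last path element
        "binary": root / "bin" / PurePosixPath(import_path).name,
    }


layout = gopath_layout("/home/joe/go", "github.com/joe/hello")
print(layout["source_dir"])
print(layout["binary"])
```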

Tried with: Go 1.7 and Ubuntu 16.04

Visual Studio Code extensions that I use

  • CPP Tools: The official extension for working with C++ code. Automatically indexes all code in the currently open directory, offers auto-completion and syntax highlighting.

  • Python by Don Jayamanne: There are many Python extensions, but this seems to be the most popular one. Syntax highlighting, indexing and code completion.

  • Vim: There are many Vim extensions, but this seems to be the most popular one. It has entire universes to traverse before it can be as good as Vrapper, the Vim extension for Eclipse. This VSCode extension offers very basic navigation and editing commands.

  • Git Blame: This extension does one little thing that I need every day to work with code from other people: know who modified a line of code. This extension shows that for the current line in the status bar.

  • Matlab: I need to regularly browse through some MATLAB files. This extension offers syntax highlighting of Matlab files.

Tried with: Visual Studio Code 1.4 and Ubuntu 16.04

How to use the PDB Python debugger

When the Python interpreter encounters an error or exceptional condition while running your code, it quits the program and prints a stack trace. I typically would like to inspect the stack at the point of error to figure out the cause of it. It would be nice to print out some of the local variables you suspect at that point, walk up the stack frames to the calling functions and inspect some of their variables too. All of this is easily possible in Python by using the Python debugger module!

  • After you see a Python script fail, run it again under the pdb module using any one of these commands:
$ pdb my_script.py

$ python -m pdb my_script.py

$ pdb3 my_script.py

$ python3 -m pdb my_script.py
  • If you do not want to run the PDB module like this, you can also add a hard breakpoint at a line in your code by adding this before that line:
import pdb; pdb.set_trace()
  • When run explicitly, PDB will stop at the first line of the script. This is a good point to set breakpoints using the b command.
  • Press c to continue execution. It will run the script and should fail at the place where it had failed earlier.
  • Press w to inspect the stack trace. Your current frame will be indicated by an arrow. To move up and down the stack, use u and d.
  • To print out the values of variables in the current stack frame, use the p command, type their name and press Enter.
  • For the entire list of commands available at the debugger, see PDB documentation.
  • If you need tab completion, colorful syntax highlighting and other interactive features, then install the IPDB module which builds upon IPython to offer these features. Using that is similar to PDB:
$ ipdb my_script.py

$ python -m ipdb my_script.py

$ ipdb3 my_script.py

$ python3 -m ipdb my_script.py
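
If you want something small to practice these commands on, here is a sketch of a script that fails in a nested call. The file name my_script.py matches the commands above; the function names are made up for illustration.

```python
import sys


def ratio(numerator, denominator):
    """Divide two numbers. Raises ZeroDivisionError if denominator is 0."""
    return numerator / denominator


def parse_and_divide(first, second):
    """Convert two strings to floats and divide the first by the second."""
    return ratio(float(first), float(second))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(parse_and_divide(sys.argv[1], sys.argv[2]))
    else:
        print("usage: my_script.py NUMERATOR DENOMINATOR")
```

Running python3 -m pdb my_script.py 3 0 and pressing c should drop you into post-mortem debugging at the division. From there, w shows the frames of ratio and parse_and_divide, p denominator prints the offending value, and u moves up to the caller, where p second shows the original string.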

Happy debugging! 🙂

Tried with: Python 3.5.1 and Ubuntu 16.04

How to get tab completion in PDB

A common technique to debug in Python is to add this line at a place which you want to observe:

import pdb; pdb.set_trace()

When you run the Python code and the interpreter hits this line, it drops you into a Python debugger prompt. You can inspect local variables and step through code from here.

An irritating problem here is that this PDB prompt does not offer tab completion. So, you have to type out entire variable, class and method names by yourself without any aid. Thankfully, here is a method to add tab completion feature to this PDB prompt.

Create a .pdbrc in your home directory and add these lines to it:

# Enable tab completion
import rlcompleter
import pdb
pdb.Pdb.complete = rlcompleter.Completer(locals()).complete
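
To see what the completer assigned in .pdbrc does, here is a small standalone sketch that calls rlcompleter directly, outside PDB. The namespace contents and the helper function are made up for illustration. The complete method is called with a prefix and an increasing state number, and returns None when there are no more matches.

```python
import rlcompleter

# A hypothetical namespace standing in for the locals() seen at the PDB prompt.
namespace = {"my_variable": 1, "my_other_variable": 2}
completer = rlcompleter.Completer(namespace)


def all_completions(completer, prefix):
    """Collect every completion for prefix by stepping through the states."""
    matches = []
    state = 0
    while True:
        match = completer.complete(prefix, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches


print(all_completions(completer, "my_"))
print(all_completions(completer, "my_v"))
```

This state-by-state protocol is the one readline uses when you press Tab, which is why replacing the complete method of pdb.Pdb is enough to get completion at the prompt.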

Tried with: Python 3.5.1 and Ubuntu 16.04

How to edit MKV headers using MKVToolNix

MKV is a popular container format used for video files. Sometimes, I need to change the headers of a file. For example, to set a different audio channel as default or to disable it and so on. All such header operations are easy to perform using the GUI tool of MKVToolNix.

To install the GUI tool:

$ sudo apt install mkvtoolnix-gui
  • Open mkvtoolnix-gui and click the Edit Headers section.
  • Open the MKV video file whose headers you want to edit. This section will be populated with all the video, audio, subtitle and other such header information. It can be expanded by clicking on the plus symbols.
  • To edit a particular header field, drill down to it and click on it. The right side of the GUI will show its current value. You can add, remove or edit the default values.
  • After you are done, we need to make sure that the resulting header is valid. To check this, choose Header editor → Validate values from the menu above.
  • If validation passes, choose Header editor → Save from the menu above.

Tried with: MKVToolNix 8.8.0 and Ubuntu 16.04

How Ubuntu dumps core

When your program tries to do anything nasty, like access memory it is not supposed to, you end up seeing a core dump.

If you ran the program under the Bash shell, you might see this error:

$ ./my_program
Segmentation fault (core dumped)

Under the Fish shell, I see this error:

$ ./my_program
fish: “./my_program” terminated by signal SIGSEGV (Address boundary error)

If I check the directory I ran the program from, I see that there is a core file that has the entire dump of the process memory from when the program did its nasty act.

How did a SIGSEGV end up as a core dump file? This is what I found in Ubuntu:

  • On SIGSEGV, Linux kernel writes an entry in /var/log/syslog with some information about what happened. Here is the line on my computer:
Aug 13 17:56:09 my_machine kernel: [442853.259571] my_program[18432]: segfault at 4700000046 ip 00007f7688f6980d sp 00007ffd46b5e3e0 error 4 in libc-2.23.so[7f7688f30000+1c0000]
  • The Linux kernel checks what settings are present for writing a core dump. There are 3 settings, which are in the files core_pattern, core_pipe_limit and core_uses_pid in the /proc/sys/kernel directory. These settings are documented in kernel.txt.

  • core_pattern says what the filename of the core dump file should be. On my Ubuntu, this setting had this value:

$ cat /proc/sys/kernel/core_pattern 
|/usr/share/apport/apport %p %s %c %P

From the kernel.txt documentation, we can see that this means that the kernel has to call the /usr/share/apport/apport executable with four input arguments. The first, second and fourth arguments are documented as the PID of the offending process, the signal number and the global PID. We do not yet know what %c does, but we will find out below. The kernel will pipe the process memory to the standard input of this apport program.

  • What is this apport program? It is the crash handler of Ubuntu. We can see that it is a Python script:
$ file /usr/share/apport/apport
/usr/share/apport/apport: Python script, ASCII text executable

That is great, because we can open it and read its code to see what it does!

  • Opening up this file in Vim, we see that %c input argument is the ulimit value set in our shell:
Line 393: (pid, signum, core_ulimit) = sys.argv[1:4]
  • We can also see what exactly it does to write the core dump in its function write_user_coredump. In short, this function checks the ulimit value of the shell and then decides how many bytes to write to the core file and writes it to the same directory as the program.

  • We also see that it called error_log to write some log messages. These can be found listed in the /var/log/apport.log file:

ERROR: apport (pid 18475) Sat Aug 13 17:56:18 2016: called for pid 18474, signal 11, core limit 102400000
ERROR: apport (pid 18475) Sat Aug 13 17:56:18 2016: executable: /home/joe/my_program (command line "./my_program")
ERROR: apport (pid 18475) Sat Aug 13 17:56:18 2016: executable does not belong to a package, ignoring
ERROR: apport (pid 18475) Sat Aug 13 17:56:18 2016: writing core dump to /home/joe/core (limit: 102400000)

This is how Ubuntu writes the core dump file. You can now open this up in GDB for debugging! 🙂
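
The way the kernel reads core_pattern can be sketched in a few lines of Python. This is only an illustration of the rule described above (a leading pipe means "run this program and feed it the dump"), not the kernel's actual logic. The function names are my own, and the specifier meanings are the ones found in this post.

```python
# Specifiers seen in the core_pattern above. The meaning of %c is the one
# found by reading the apport source, as described in this post.
SPECIFIERS = {
    "%p": "PID of the crashing process",
    "%s": "signal number",
    "%c": "core size limit (ulimit) of the crashing process",
    "%P": "global PID of the crashing process",
}


def parse_core_pattern(pattern):
    """Split a core_pattern value into a piped program or a filename template."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        parts = pattern[1:].split()
        program = parts[0] if parts else ""
        return {"piped": True, "program": program, "args": parts[1:]}
    return {"piped": False, "filename_template": pattern}


def explain(args):
    """Describe each specifier passed to the crash handler."""
    return [SPECIFIERS.get(arg, "unknown specifier") for arg in args]


parsed = parse_core_pattern("|/usr/share/apport/apport %p %s %c %P")
print(parsed["program"])
for arg, meaning in zip(parsed["args"], explain(parsed["args"])):
    print(arg, "->", meaning)
```

On a system where core_pattern is simply core, the same function reports a plain filename template, and the kernel writes the file itself instead of calling a handler.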

Tried with: Ubuntu 16.04

How to block domains using hosts file

A popular method to block ads or domains in browsers is to use adblocking addons. An alternative is to use the hosts file of your operating system, which maps hostnames to IP addresses. When a domain name is pointed at the 0.0.0.0 address, it resolves to an address that cannot be reached, so connections to that domain fail.

  • A popular source for hosts file is provided by Steven Black. Clone that repository to your computer:
$ git clone https://github.com/StevenBlack/hosts.git
$ cd hosts
  • If you have any entries in your local hosts file, then add them to the myhosts file in this repository.

  • To update the adblocking domain files to their latest versions and to add the myhosts entries to create a final hosts file run this command:

$ ./makeHosts
  • Replace your hosts file with this one:
$ sudo cp /etc/hosts /etc/hosts.orig
$ sudo cp hosts /etc/hosts
  • Optionally, you can also restart your name service cache daemon to flush its cache:
$ sudo service nscd restart

That is it! You can use any browser or application and the domains in the hosts file will not be accessible from them.
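
To check which domains a hosts file blocks, a short Python sketch like the following can help. The sample entries are hypothetical, and the function is my own. It treats any name mapped to 0.0.0.0 as blocked and ignores comments.

```python
def blocked_domains(hosts_text):
    """Return the set of hostnames that are mapped to 0.0.0.0."""
    blocked = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address == "0.0.0.0":
            blocked.update(names)
    return blocked


# Hypothetical sample; a real file would be read from /etc/hosts.
sample = """\
127.0.0.1 localhost
# Blocked entries
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net  # inline comment
"""

print(sorted(blocked_domains(sample)))
```

To run it against the real file, pass in the contents of /etc/hosts instead of the sample string.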

Tried with: Ubuntu 16.04

Merge with squash in Git

Merging is a common operation in Git that brings the changes from another branch into the current branch.

Here is an example, where we check out the master branch and merge feature_branch into it:

$ git checkout master
$ git merge feature_branch

[Figure: Before merge of feature_branch to master]

[Figure: After merge of feature_branch to master]

In the pictorial depictions above we can see that the two branches are actually merged together in the directed acyclic graph (DAG) with an edge. After this, the git log for master will show all the commits that are in feature_branch too in addition to the commits in master.

There might be cases where you do not want to actually merge the two branches. Maybe you just want a single commit on master that has all the changes that would have been merged from feature_branch. Note that this is different from git rebase, which replays each of the commits of feature_branch on master.

The answer to this is git merge --squash. This command changes the files to what they would be after a git merge, but it does not create a merge commit. When you then commit these changes, the new commit has no link between master and feature_branch.

To merge with squash in the above scenario:

$ git checkout master
$ git merge --squash feature_branch
$ git commit

When you commit you will see that Git inserts a default commit message that says Squashed commit of the following and lists all the commits from feature_branch that are squashed into this commit. You can delete this commit message and create your own of course.

[Figure: After merge --squash of feature_branch to master]

The pictorial depiction of this operation above shows that there is no link between the two branches after this merge. When you do git log no one can see the multiple commits of feature_branch. You can even safely delete feature_branch if you want to after this operation.
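
The difference between the two kinds of merge can be shown with a toy model of the commit graph in Python. This is not how Git is implemented. It only tracks which commits are parents of which, and all class, method and commit names are made up for illustration.

```python
class ToyRepo:
    """A toy commit graph: it tracks parent links only, not file contents."""

    def __init__(self):
        self.parents = {}   # commit name -> list of parent commit names
        self.branches = {}  # branch name -> name of its latest commit

    def commit(self, branch, name, extra_parents=()):
        tip = self.branches.get(branch)
        self.parents[name] = ([tip] if tip else []) + list(extra_parents)
        self.branches[branch] = name

    def branch(self, new_branch, from_branch):
        self.branches[new_branch] = self.branches[from_branch]

    def merge(self, into, other, name):
        # A real merge commit has two parents, which links the branches.
        self.commit(into, name, extra_parents=[self.branches[other]])

    def merge_squash(self, into, other, name):
        # A squash commit carries the changes but has a single parent.
        self.commit(into, name)

    def log(self, branch):
        """Return every commit reachable from the tip of the branch."""
        seen, stack = set(), [self.branches[branch]]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


def build():
    repo = ToyRepo()
    repo.commit("master", "m1")
    repo.branch("feature_branch", "master")
    repo.commit("feature_branch", "f1")
    repo.commit("feature_branch", "f2")
    repo.commit("master", "m2")
    return repo


merged = build()
merged.merge("master", "feature_branch", "merge")
squashed = build()
squashed.merge_squash("master", "feature_branch", "squash")

print(sorted(merged.log("master")))
print(sorted(squashed.log("master")))
```

In the merged graph, the log of master reaches f1 and f2 through the second parent of the merge commit. In the squashed graph it does not, which is why feature_branch can be deleted without affecting the history of master.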

Tried with: Git 2.9.0 and Ubuntu 16.04