How to deal with YAML in Python

YAML (originally "Yet Another Markup Language", now officially "YAML Ain't Markup Language") is a human-readable format, similar to JSON, for reading and writing configuration information in files. As of version 1.2 of its specification, YAML is a superset of JSON. Its block style uses indentation instead of the braces and brackets used by JSON.

  • To be able to deal with YAML in Python, install the PyYAML package:
$ sudo pip install PyYAML
$ sudo pip3 install PyYAML
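  • Similar to JSON, a YAML file can be directly loaded into a Python list or dict, depending on whether the root structure of the file is a list or a dict. Use safe_load, since plain yaml.load can construct arbitrary Python objects from untrusted input:
import yaml
with open("foobar.yaml") as f:
    y = yaml.safe_load(f)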
  • Writing a Python structure back to a YAML file is similarly straightforward:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f)
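  • Note that with PyYAML 3.11 the file is not written fully in block style by default: collections that contain only scalars are written in flow style, which makes them look a bit like JSON. For human readability, it might be better to dump everything in block style, like this (PyYAML 5.1 and later do this by default):
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f, default_flow_style=False)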

Tried with: PyYAML 3.11, Python 3.5.2 and Ubuntu 16.04

Size of argument and environment lists exceeds limit

Problem

If you try to cp or mv a large number of files or directories, you will encounter this error:

$ cd big_dir
$ cp * ../../some_other_dir
Failed to execute process '/bin/cp'. Reason:
The total size of the argument and environment lists 3.2MB exceeds the operating system limit of 2MB.
Try running the command again with fewer arguments.

Note that you can copy or move one directory that contains a million files or any number of files for that matter. This error happens only if the shell has to pass a large number of input arguments to the cp or mv programs. So, if you run the above command from inside a directory containing 100K files, you will surely get this error.

This is because there is a limit to the size of the arguments and environment strings that can be passed to a program. That limit can be queried:

$ getconf ARG_MAX
2097152

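The result will vary on different computers, but there is always a limit and it is enforced by the Linux kernel. According to the execve(2) manpage, on Linux 2.6.23 and later the limit is one quarter of the stack size limit, so raising the stack limit (for example with ulimit -s) should raise this limit too.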
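
The same limit can be read from Python, and the space a command would need can be roughly estimated before running it. This is only a minimal sketch: the helper names arg_max and estimated_size are made up for illustration, and the estimate counts just the bytes of the strings plus their terminating NUL bytes, which approximates what the kernel counts.

```python
import os


def arg_max():
    """Return the limit that `getconf ARG_MAX` reports, in bytes."""
    return os.sysconf("SC_ARG_MAX")


def estimated_size(args, env=None):
    """Roughly estimate the bytes needed to pass args and env to a program.

    Every string is counted with one terminating NUL byte. This is an
    approximation, not the kernel's exact accounting.
    """
    if env is None:
        env = os.environ
    total = sum(len(os.fsencode(a)) + 1 for a in args)
    # Each environment entry is passed as a single "KEY=value" string.
    total += sum(len(os.fsencode(k)) + 1 + len(os.fsencode(v)) + 1
                 for k, v in env.items())
    return total
```

For example, estimated_size(["cp"] + os.listdir(".") + ["../../some_other_dir"]) can be compared with arg_max() to guess whether the cp command above would fail.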

Solution

Instead of looking to increase the ARG_MAX size, examine the real problem. Why make the shell expand all the filenames and pass them as one gigantic list of strings to the program? These alternatives can be tried instead:

  • See if you can instead cp or mv a parent directory, instead of a million files.

  • Move the files one by one by writing a loop in shell script or Python.

  • Use other programs like rsync to copy a directory to the destination.
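
The loop suggested above could look something like this in Python. It is a sketch under some assumptions: the function name copy_files_one_by_one is hypothetical, only regular files are copied (subdirectories are skipped), and hidden files are included, which the shell's * would not match.

```python
import os
import shutil


def copy_files_one_by_one(src_dir, dst_dir):
    """Copy every regular file in src_dir to dst_dir, one file at a time.

    No long argument list is ever built, so ARG_MAX does not matter.
    Returns the number of files copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    # scandir yields the directory entries lazily instead of building a list.
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if entry.is_file():
                shutil.copy2(entry.path, os.path.join(dst_dir, entry.name))
                copied += 1
    return copied
```

Calling copy_files_one_by_one("big_dir", "some_other_dir") would then play the role of the failing cp command.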

Reference: Search for ARG_MAX in the execve(2) manpage

Tried with: Ubuntu 16.04

How to use Bluetooth headphones with Linux

Configuring Bluetooth headphones

One of the irritating problems with Linux in general is Bluetooth. Below are the steps I had to follow to pair and use Creative WP-300 Bluetooth headphones with Kubuntu 16.04. The procedure should be similar for any Bluetooth headphones or speaker and with any other variant of Ubuntu.

  • Make sure that your computer has a Bluetooth adapter and that it is working. Do not just assume there is an adapter and that it works in Linux! You should be able to see a Bluetooth icon in the system panel or system tray. In the Bluetooth settings you should be able to see the adapter.

  • Power on your Bluetooth headphones or speaker and put it into pairing mode. Refer to its documentation if you do not know how to do this.

  • Go to the Bluetooth settings in Linux and search for the device. Once the device is listed, pair with it and connect to it.

  • Go to the audio settings and you should be able to see the Bluetooth device listed there. If it is not there, you might need to restart Linux. I know it is crazy, but I had to do this, since no other solution offered online worked for me!

  • If your device is listed in the audio settings, then try to play some music or video and see if it plays on the device. Most probably, it will still play on the default speaker of your computer! Go to System Settings → Multimedia → Audio and Video → Device Preference → Audio Playback. Your Bluetooth device must be listed here. Grab it and move it above the built-in audio device. This tells Linux to use it as the default audio output.

Now you should be able to play back audio on your Bluetooth device 👍

Aligned memory allocation

In some scenarios, you want memory that is aligned at an address that is a multiple of a certain power of 2. Certain CPU architectures and certain operations require their operands to be located at such an address, or are faster when they are. For these reasons, many multi-platform libraries use an aligned memory allocator instead of malloc in their code. For example, OpenCV uses methods named fastMalloc and fastFree inside its code that do this type of allocation and freeing.

Most of these methods work like this:

  • They internally get memory from malloc. However, if you request N bytes, the wrapper requests N+P+A bytes from malloc. Here, P is the size of a pointer on that CPU architecture and A is the required alignment in bytes, which is a power of 2. For example, if I request 100 bytes on a 64-bit CPU and require the memory to be aligned to a multiple of 32, then the wrapper requests 100+8+32 = 140 bytes.

  • After getting the memory from malloc, it aligns the pointer forward so that (1) the pointer is at an address that is aligned as per requirement and (2) there is space behind the pointer to store a memory address.

  • Then we sneak and store the address actually returned by malloc behind the pointer address and return the pointer to the user.

  • The user has to use our free wrapper to free this pointer. When she does that we sneak back to reveal the actual address returned by malloc and free using that.

Here is some example code that illustrates aligned memory allocation:
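
The sketch below is not a real allocator and not the code of any particular library. It only simulates the address arithmetic of the steps above in Python: a bytearray stands in for memory, the address 1003 is a made-up value playing the role of what malloc returned, and all function names are hypothetical.

```python
import struct

POINTER_SIZE = 8  # P: assumes a 64-bit CPU


def padded_size(n, alignment):
    """Bytes the wrapper requests from malloc: N + P + A."""
    return n + POINTER_SIZE + alignment


def align_forward(raw_addr, alignment):
    """Return the first multiple of alignment at or after raw_addr + P."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2")
    return (raw_addr + POINTER_SIZE + alignment - 1) & ~(alignment - 1)


def store_raw_addr(memory, aligned_addr, raw_addr):
    """Hide the address returned by malloc in the P bytes behind aligned_addr."""
    struct.pack_into("<Q", memory, aligned_addr - POINTER_SIZE, raw_addr)


def load_raw_addr(memory, aligned_addr):
    """Recover the hidden address so that it can be passed to free."""
    return struct.unpack_from("<Q", memory, aligned_addr - POINTER_SIZE)[0]


memory = bytearray(4096)           # stand-in for the process address space
raw = 1003                         # hypothetical address returned by malloc
user_ptr = align_forward(raw, 32)  # address handed to the user
store_raw_addr(memory, user_ptr, raw)

print(padded_size(100, 32))             # 140
print(user_ptr, user_ptr % 32)          # 1024 0
print(load_raw_addr(memory, user_ptr))  # 1003
```

Since the aligned address is at most P+A-1 bytes past the raw address, the N bytes requested by the user always fit inside the N+P+A bytes obtained from malloc.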

isinstance and issubclass in Python

isinstance

Use this built-in function to find out if a given object is an instance of a certain class or any of its subclasses. You can even pass a tuple of classes to be checked for the object.

The only gotcha you will discover is this: in Python a bool object is an instance of int! Yes, not kidding!
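
A small illustration of the behaviour described above. The classes Animal and Dog and the helper is_strict_int are hypothetical names used only for this example; the helper shows one possible way to sidestep the bool gotcha.

```python
class Animal:
    pass


class Dog(Animal):
    pass


d = Dog()
print(isinstance(d, Dog))            # True
print(isinstance(d, Animal))         # True: Dog is a subclass of Animal
print(isinstance(d, (int, str)))     # False: matches no class in the tuple
print(isinstance(d, (int, Animal)))  # True: matches one class in the tuple
print(isinstance(True, int))         # True: the bool gotcha
print(isinstance(1, bool))           # False: an int is not a bool


def is_strict_int(x):
    """Return True for int objects, but not for True or False."""
    return isinstance(x, int) and not isinstance(x, bool)


print(is_strict_int(5), is_strict_int(True))  # True False
```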

issubclass

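This built-in function is similar to isinstance, but it checks whether a class is a subclass of another class. As with isinstance, you can pass a tuple of classes to check against, and a class counts as a subclass of itself.

Again, the only gotcha is that the bool type is a subclass of the int type.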
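
A short illustration, using the hypothetical classes Animal and Dog:

```python
class Animal:
    pass


class Dog(Animal):
    pass


print(issubclass(Dog, Animal))         # True
print(issubclass(Animal, Dog))         # False: the relation is one-way
print(issubclass(Dog, Dog))            # True: a class is a subclass of itself
print(issubclass(Dog, (int, Animal)))  # True: matches one class in the tuple
print(issubclass(bool, int))           # True: the bool gotcha
```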

How to convert Python dict to class object with fields

A Python dict is extremely versatile. However, there might be situations where you want to access the dict as if it were a class object and the keys were its fields. This can be easily done by using a namedtuple. Just give it a name and then use the keys to populate its named fields. Set the values for those fields by passing the values from the dict. It all boils down to a single line.

This example code demonstrates the above:
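
A minimal sketch of the idea, assuming a made-up dict and the made-up type name Record. It only works when every key is a valid Python identifier, and the resulting object is immutable like any tuple.

```python
from collections import namedtuple

d = {"name": "joe", "age": 42}  # hypothetical example data

# The single line: the keys become field names, the values fill the fields.
obj = namedtuple("Record", d.keys())(**d)

print(obj.name)  # joe
print(obj.age)   # 42
print(obj)       # Record(name='joe', age=42)
```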

pgrep and pkill

pgrep and pkill are two useful commands that go well together. You can list processes using ps and kill them using kill. However, I find it easier to use pgrep and pkill when I want to find and kill a process.

pgrep

  • To list all the PIDs of a user (-u is the short form of --euid and matches the effective user ID; --uid is the long form of -U and matches the real user ID):
$ pgrep --euid joe
$ pgrep -u joe
  • To list the PID and the corresponding process name for a user:
$ pgrep --euid joe --list-name
$ pgrep -u joe -l
  • To list the PID and the corresponding full command line for a user:
$ pgrep --euid joe --list-full
$ pgrep -u joe -a

This is extremely useful for finding Python scripts, or the command-line arguments of a program that is running as a process.

  • To list the PID whose process name matches input pattern:
$ pgrep foobar
  • To list the PID and process names that match input pattern:
$ pgrep -l foobar
  • To list the PID and command line of processes that match input pattern:
$ pgrep -a foobar

pkill

pkill is used to send a signal to or kill processes by using a pattern that matches a process name or its command line. pkill takes many arguments that are similar to pgrep.

  • To kill all processes of the current user whose name matches the input pattern:
$ pkill foobar
  • To kill all processes of the current user whose full command line matches the input pattern:
$ pkill -f foobar

Tried with: Ubuntu 16.04

Read and write same file using sponge

A common operation I end up doing is reading a text file, processing it using Unix tools and writing the result back to the same filename. However, if you do this you will find that you end up with an empty file! This is because the shell opens the output file for writing, and thus clears its contents, before the commands get a chance to read it.

sponge is a tiny tool created specially for this common operation. It can be installed easily:

$ sudo apt install moreutils

Here is an example that is wrong and ends up creating an empty file:

$ cat foobar.txt | uniq > foobar.txt

To fix this, add sponge to soak up all the output first and only write to the file at the end:

$ cat foobar.txt | uniq | sponge foobar.txt
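
The idea behind sponge can be sketched in Python: read all of the input before the file is opened for writing. This is only an illustration of the principle, not how sponge itself is implemented, and the function name uniq_in_place is hypothetical.

```python
import itertools


def uniq_in_place(path):
    """Collapse runs of identical adjacent lines, like uniq, and write the
    result back to the same file."""
    # Soak up all the input first, while the file is only open for reading.
    with open(path) as f:
        lines = f.readlines()
    # groupby yields one key for each run of equal adjacent lines.
    result = [line for line, _ in itertools.groupby(lines)]
    # Only now open the file for writing, which truncates it.
    with open(path, "w") as f:
        f.writelines(result)
```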

Tried with: moreutils 0.57 and Ubuntu 16.04

Process JSON at the shell using jq

jq is an awesome tool to play with JSON data at the commandline.

  • Installing it is easy:
$ sudo apt install jq
  • To format a JSON file so that it looks pretty:
$ cat ugly.json | jq '.'

I have found this extremely useful to format huge JSON files. jq is much faster at this job compared to the json.tool in Python.

  • You can access and print a specific section of the JSON just like in Python. For example:
$ cat foobar.json | jq '.["records"][6]["name"]'

Note the use of single quotes to encapsulate the expression and use of double quotes inside to specify keys in dictionaries.
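
For comparison, this is roughly what the same lookup looks like in Python with the standard json module. The data here is a made-up stand-in for the contents of foobar.json.

```python
import json

# Hypothetical stand-in for the text stored in foobar.json.
text = json.dumps({"records": [{"name": "item%d" % i} for i in range(10)]})

data = json.loads(text)
name = data["records"][6]["name"]  # same path as in the jq expression
print(name)  # item6

# Pretty-printing, similar in spirit to jq '.'
pretty = json.dumps(data, indent=2)
print(pretty)
```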

Tried with: jq 1.5 and Ubuntu 16.04

Compare files at shell using comm

comm

comm is a useful Linux tool to use at the shell. It takes two files with sorted lines as input and displays which lines are unique to each file and which lines are common (intersection) to both. For example, this might be useful when comparing which files are common among two directories and so on. This tool is a part of coreutils package, so it should be available everywhere.

  • To compare two files:
$ comm first.txt second.txt

You will see 3 columns in the output corresponding to lines unique to first file, lines unique to second file and lines common to both.

  • To suppress columns 1 and 2, and thus display only the lines common to both files:
$ comm -1 -2 first.txt second.txt
  • There are a few other options to this tool which can be read about in the manpage: man comm
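
The three columns can be mimicked in Python with set operations. This is a rough sketch, not a reimplementation of comm: the function name comm_columns is hypothetical, and because it uses sets it ignores repeated lines, which the real tool does not.

```python
def comm_columns(first_lines, second_lines):
    """Return (only_in_first, only_in_second, in_both) as sorted lists."""
    first, second = set(first_lines), set(second_lines)
    return (sorted(first - second),   # column 1: unique to the first file
            sorted(second - first),   # column 2: unique to the second file
            sorted(first & second))   # column 3: common to both files


only_first, only_second, both = comm_columns(["a", "b", "c"], ["b", "c", "d"])
print(only_first)   # ['a']
print(only_second)  # ['d']
print(both)         # ['b', 'c'], what comm -1 -2 would display
```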

Tried with: Ubuntu 16.04