How to deal with YAML in Python

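YAML (a recursive acronym for "YAML Ain't Markup Language"; it originally stood for "Yet Another Markup Language") is a human-readable data serialization language, similar to JSON, that is commonly used for configuration files. YAML 1.2 is a superset of JSON. In its usual block style it expresses nesting through indentation instead of the braces and brackets used by JSON.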

  • To work with YAML in Python, install the PyYAML package (pip for Python 2, pip3 for Python 3):
$ sudo pip install PyYAML
$ sudo pip3 install PyYAML
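  • Similar to JSON, a YAML file can be loaded directly into a Python list or dict, depending on whether the root structure of the file is a list or a dict. Prefer safe_load for this: plain yaml.load can construct arbitrary Python objects from untrusted input, and newer PyYAML releases require an explicit Loader argument for it:
import yaml
with open("foobar.yaml") as f:
    y = yaml.safe_load(f)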
  • Writing a Python structure back to a YAML file is similarly straightforward:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f)
  • Note that with the PyYAML version used here, leaf collections are written in flow style by default. This makes the output look a bit like JSON. (PyYAML 5.1 and later default to block style.) For human readability, it is better to ask for block style explicitly, like this:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f, default_flow_style=False)
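
The difference between the two styles, and between a list root and a dict root, is easier to see on a small in-memory example. This is only a minimal sketch: the sample data is made up for illustration, and it assumes PyYAML is installed.

```python
import yaml  # PyYAML, a third-party package

# Hypothetical sample data; the root structure here is a dict.
config = {"name": "demo", "ports": [80, 443]}

# Block style: nesting is expressed through indentation.
block_text = yaml.safe_dump(config, default_flow_style=False)

# Flow style: JSON-like braces and brackets.
flow_text = yaml.safe_dump(config, default_flow_style=True)

print(block_text)
print(flow_text)

# Both styles load back to the same Python structure.
assert yaml.safe_load(block_text) == config
assert yaml.safe_load(flow_text) == config

# A document whose root is a sequence loads as a list instead of a dict.
items = yaml.safe_load("- apple\n- banana\n")
print(type(items).__name__, items)
```

safe_dump is used here as the counterpart of safe_load; it only emits plain YAML tags, which is usually what you want for configuration files.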

Tried with: PyYAML 3.11, Python 3.5.2 and Ubuntu 16.04

Process JSON at the shell using jq

jq is an awesome tool to play with JSON data at the command line.

  • Installing it is easy:
$ sudo apt install jq
  • To format a JSON file so that it looks pretty:
$ jq '.' ugly.json

I have found this extremely useful for formatting huge JSON files. jq is much faster at this job than Python's json.tool module.

  • You can access and print a specific section of the JSON just like in Python. For example:
$ jq '.["records"][6]["name"]' foobar.json

Note the use of single quotes to encapsulate the expression and use of double quotes inside to specify keys in dictionaries.
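
For comparison, the same kind of lookup in Python might look like the sketch below. The JSON text and its field names are made up for illustration, and the index is 2 only because the sample is short.

```python
import json

# Hypothetical document with a top-level "records" list.
text = '{"records": [{"name": "alpha"}, {"name": "beta"}, {"name": "gamma"}]}'
data = json.loads(text)

# Same shape as the jq filter .["records"][2]["name"]
name = data["records"][2]["name"]
print(name)
```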

Tried with: jq 1.5 and Ubuntu 16.04

Python JSON dump misses last newline

Problem

The dump function from the Python json module can be used to write a suitable Python object, usually a dictionary or list, to a JSON file. However, I discovered that some Unix shell programs have problems working with such a JSON file. This turned out to be because json.dump does not end the last line with a newline character. According to the POSIX definition of a line in a text file, every line, including the last one, must end with a newline character.

Solution

I replaced this:

json.dump(json_data, open("foobar.json", "w"), indent=4)

with this:

with open("foobar.json", "w") as json_file:
    json_text = json.dumps(json_data, indent=4)
    json_file.write("{}\n".format(json_text))  # json.dumps does not add a trailing newline
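
If this is needed in more than one place, the fix can be wrapped in a small helper. The sketch below is one possible way to do it; the function name dump_json_with_newline and the temporary file path are hypothetical, chosen only for this example.

```python
import json
import os
import tempfile


def dump_json_with_newline(obj, path, indent=4):
    """Write obj to path as JSON and end the file with a newline."""
    with open(path, "w") as json_file:
        json.dump(obj, json_file, indent=indent)
        json_file.write("\n")  # json.dump itself does not add this


# Usage: write to a temporary directory and read the text back.
path = os.path.join(tempfile.mkdtemp(), "foobar.json")
dump_json_with_newline({"a": 1}, path)
with open(path) as f:
    written = f.read()
print(repr(written[-2:]))
```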

IPython Notebook error: Unsupported JSON nbformat

Problem

I ran the IPython Notebook server in a directory containing a .ipynb notebook file:

$ ipython notebook

It opened the URL http://127.0.0.1:8888 in my browser. The notebook file was listed in the webpage. I clicked it to load it and got this error:

Error loading notebook

Unreadable Notebook: Unsupported JSON
nbformat version 4 (supported version: 3)

Solution

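This error is caused by opening a notebook created with a newer IPython on a computer with an older IPython: the file uses notebook format (nbformat) version 4, but the installed IPython only understands version 3. On my computer, I was using the IPython installed from the Ubuntu repositories, which is ancient.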
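
Since a .ipynb file is itself JSON, the format version it declares can be checked without starting the notebook server. The following is a rough sketch; the helper name notebook_format_version and the sample file are hypothetical, and the sample contains only the fields needed for the demonstration.

```python
import json
import os
import tempfile


def notebook_format_version(path):
    """Return the (nbformat, nbformat_minor) pair recorded in a notebook file."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    return notebook.get("nbformat"), notebook.get("nbformat_minor")


# Usage: build a tiny stand-in notebook file and inspect it.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.ipynb")
with open(sample_path, "w", encoding="utf-8") as f:
    json.dump({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 0}, f)

version = notebook_format_version(sample_path)
print(version)
```

A first element of 4 on a machine whose IPython only supports version 3 would explain the error above.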

I first uninstalled it:

$ sudo apt remove ipython

I installed the latest IPython from the Python Package Index:

$ sudo pip install ipython

The installation was successful, but on running IPython it complained about several missing libraries. I installed the ones it complained about:

$ sudo apt install python-zmq python-jinja2 python-jsonschema

IPython Notebook ran correctly and I was able to open the notebook after this! 🙂

Tried with: IPython 3.2 and Ubuntu 14.04