How to set encoder format for Python JSON

Python’s JSON module makes it very easy to dump data into a JSON file. However, I have found that float values are encoded with a lot of decimal places or in scientific notation. There is no elegant way to set the float format used by the encoder of the JSON module. The best solution seems to be to monkey patch its float formatting function:

# Make json.dump() encode floats with 5 places of precision
import json
json.encoder.FLOAT_REPR = lambda x: format(x, '.5f')

Reference: https://stackoverflow.com/questions/1447287
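
The monkey patch relies on an internal of the json module, and it may have no effect on newer Python versions. A more portable alternative, as a sketch and not the method used above, is to round the floats yourself before dumping. The helper name round_floats is hypothetical. Unlike the '.5f' format, rounding does not pad with trailing zeros:

```python
import json

def round_floats(obj, ndigits=5):
    """Return a copy of obj with every float rounded to ndigits places."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, dict):
        return {k: round_floats(v, ndigits) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [round_floats(v, ndigits) for v in obj]
    return obj  # ints, strings, bools and None pass through unchanged

data = {"pi": 3.14159265358979, "values": [0.1 + 0.2, 1.0]}
text = json.dumps(round_floats(data))
print(text)  # {"pi": 3.14159, "values": [0.3, 1.0]}
```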


Python dict get method

The Python dictionary provides an associative array interface to get the value associated with a key:

>>> d = { 1:"cat", 2:"rat" }
>>> d[1]
'cat'

However, this interface is not very friendly if you look up a key that does not exist. In such a case, it raises a KeyError exception:

>>> d[3]
KeyError: 3

The dictionary also provides a safer get method, which returns None if the key is not present:

>>> print(d.get(3))
None

This method is actually cooler than it looks because you can make it return any default value you want when the key is not present in the dictionary. You do this by passing the default value as the second argument:

>>> print(d.get(1, "elephant"))
cat
>>> print(d.get(3, "elephant"))
elephant

Bonus trick

In many cases, the key is present in the dictionary, but its value is a placeholder such as None, an empty string, an empty list or an empty dict. Suppose that, at the point where we read values, we want such keys to return a different default value too. Since all of these placeholder values evaluate to False in Python, we can use that to our advantage.

For example, say the dictionary is already created and not under our control. Whenever I read a value from it, I want elephant if the key does not exist or if its value evaluates to False. This gives rise to an elegant Python idiom using the get method and the or operator:

>>> d = { 1:"cat", 2:"rat", 3:None, 4:"" }
>>> v = d.get(1) or "elephant" ; print(v)
cat
>>> v = d.get(3) or "elephant" ; print(v)
elephant
>>> v = d.get(4) or "elephant" ; print(v)
elephant
>>> v = d.get(99) or "elephant" ; print(v)
elephant
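
One caveat worth keeping in mind: the or idiom replaces every falsy value, including a legitimate 0 or False. If only missing keys and None should fall back to the default, an explicit test is safer. A minimal sketch, where get_or_default is a hypothetical helper name:

```python
d = {"count": 0, "name": "", "tags": None}

# The `or` idiom treats the valid value 0 as if it were missing.
with_or = d.get("count") or 10

def get_or_default(mapping, key, default):
    """Return default only when the key is missing or its value is None."""
    value = mapping.get(key)
    return default if value is None else value

print(with_or)                              # 10
print(get_or_default(d, "count", 10))       # 0
print(get_or_default(d, "tags", []))        # []
print(get_or_default(d, "missing", "n/a"))  # n/a
```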

dlopen: cannot load any more object with static TLS

Problem

I had a Python script that used Caffe2. It worked fine on one computer. On another computer with the same setup, it would fail at the import caffe2.python line with this error:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: dlopen: cannot load any more object with static TLS
CRITICAL:root:Cannot load caffe2.python. Error: dlopen: cannot load any more object with static TLS

The GPU support warning is a red herring because this Caffe2 Python was built with GPU support. The real error is the dlopen failure.

Solution

The only solution from Googling that gave a clue was this. As suggested there, I placed the import caffe2.python line at the top, above all other imports, and the error disappeared.

Tried with: Ubuntu 14.04

How to deal with YAML in Python

YAML (originally Yet Another Markup Language, now the recursive acronym YAML Ain't Markup Language) is a language similar to JSON for reading and writing configuration files that are human readable. YAML 1.2 is a superset of JSON. It typically uses indentation instead of the braces used by JSON.

  • To be able to deal with YAML in Python, install the PyYAML package:
$ sudo pip install PyYAML
$ sudo pip3 install PyYAML
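  • Similar to JSON, a YAML file can be directly loaded into a Python list or dict, depending on whether the root structure of the file is a list or a dict. Prefer safe_load over load, since load can construct arbitrary Python objects from an untrusted file:
import yaml
with open("foobar.yaml") as f:
    y = yaml.safe_load(f)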
  • Writing a Python structure back to a YAML file is similarly straightforward:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f)
  • Note that with this version of PyYAML, leaf collections are written in flow style by default (PyYAML 5.1 and later default to block style). Flow style makes the file look a bit like JSON. For human readability, it might be better to dump in block style, like this:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f, default_flow_style=False)

Tried with: PyYAML 3.11, Python 3.5.2 and Ubuntu 16.04

How to convert Python dict to class object with fields

A Python dict is extremely versatile. However, there might be situations where you want to access the dict as if it were a class object and the keys were its fields. This can be easily done by using a namedtuple. Just give it a name and then use the keys to populate its named fields. Set the values for those fields by passing the values from the dict. It all boils down to a single line.

This example code demonstrates the above:
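
A minimal sketch of that one-liner, assuming every key of the dict is a string that is a valid Python identifier. The names Animal and d are placeholders:

```python
from collections import namedtuple

d = {"name": "cat", "legs": 4}

# Name the type, use the keys as field names, and pass the values in.
a = namedtuple("Animal", d.keys())(**d)

print(a.name)  # cat
print(a.legs)  # 4
```

Keep in mind that a namedtuple is immutable, so the fields can be read but not assigned to.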

Python JSON dump misses last newline

Problem

The dump method from the Python json package can be used to write a suitable Python object, usually a dictionary or list, to a JSON file. However, I discovered that Unix shell programs have problems working with such a JSON file. This turned out to be because this dump method does not end the last line with a newline character! According to the POSIX definition of a line in a text file, it needs to end with a newline character. (See here).

Solution

I replaced this:

json.dump(json_data, open("foobar.json", "w"), indent=4)

with this:

with open("foobar.json", "w") as json_file:
    json_text = json.dumps(json_data, indent=4)
    json_file.write("{}\n".format(json_text))  # Add the newline because json.dumps does not
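
A slightly shorter variant, as a sketch: let json.dump write straight to the file and then write the newline yourself. The data and the temporary file path below are placeholders so that the example is self-contained:

```python
import json
import os
import tempfile

json_data = {"name": "cat", "legs": 4}  # placeholder data
path = os.path.join(tempfile.mkdtemp(), "foobar.json")  # placeholder path

with open(path, "w") as json_file:
    json.dump(json_data, json_file, indent=4)
    json_file.write("\n")  # json.dump does not emit a trailing newline

# Read the file back to confirm that it now ends with a newline.
with open(path) as json_file:
    text = json_file.read()
print(text.endswith("\n"))  # True
```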

Invalid version number error with Python

Problem

I tried to import a Python package that I had installed from source. The import failed with this error:

File "/usr/lib/python2.7/distutils/version.py", line 40, in __init__
  self.parse(vstring)
File "/usr/lib/python2.7/distutils/version.py", line 107, in parse
  raise ValueError, "invalid version number '%s'" % vstring
ValueError: invalid version number '2.7.0rc3'

Solution

It turns out that the version check in distutils/version.py expects the package version number in the x.y or x.y.z format, optionally followed by an a or b pre-release tag such as 2.7.0b3. A suffix like rc3 does not fit that format, so Python raises this error.

Since I had the source code of this package, I found all instances of 2.7.0rc3 and changed them to 2.7.0. Typically, these will be in the setup.py and version.py files. I removed the previously installed package and reinstalled from the changed source code. After this, the import succeeded.
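
To check a version string before reinstalling, the strict format can be mimicked with a regular expression. This is only a rough sketch: the pattern and the helper name is_strict_version are my approximation of the check, not code copied from distutils:

```python
import re

# Approximation of the strict format: x.y or x.y.z, with an optional
# pre-release tag made of 'a' or 'b' followed by a number.
STRICT_VERSION = re.compile(r"^(\d+)\.(\d+)(\.(\d+))?([ab](\d+))?$")

def is_strict_version(vstring):
    """Return True if vstring fits the strict x.y[.z][aN|bN] format."""
    return STRICT_VERSION.match(vstring) is not None

print(is_strict_version("2.7.0rc3"))  # False
print(is_strict_version("2.7.0"))     # True
print(is_strict_version("2.7.0b3"))   # True
```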

Tried with: Ubuntu 14.04

CMake error building with Python libraries

Problem

I got this error from CMake when building a project that needs to link with Python 3.4 libraries:

-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.4.3", minimum required is "3.0")
-- Could NOT find PythonLibs (missing:  PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS) (Required is at least version "3.0")

Solution

It turns out that the CMake available on my system only supported finding Python 3 packages up to version 3.3. I could make it support Python 3.4 by editing two files:

  • In file /usr/share/cmake-3.4/Modules/FindPythonInterp.cmake find the line containing _PYTHON3_VERSIONS and prepend 3.4 to the versions already listed there.

  • In file /usr/share/cmake-3.4/Modules/FindPythonLibs.cmake find the line containing _PYTHON3_VERSIONS and prepend 3.4 to the versions already listed there.

I was able to build with Python 3.x libraries after that.

Error building Caffe with Python 3 support

Caffe can be built with support for Python 2.x. This allows you to invoke Caffe easily from Python code. However, I wanted to call Caffe from Python 3.x code.

  • I built Boost with Python 3.x support. I could see that libboost_python3 library files were generated.

  • I added this to the normal CMake command that I use to build Caffe: -Dpython_version=3

Sadly, this popped up errors of this type:

libboost_python.so: undefined reference to `PyClass_Type'

This type of error indicates that the Boost library built for Python 2.x (libboost_python.so) was being linked with the Python 3.x libraries, instead of the libboost_python3 library.

attrs package in Python

It is very rare that you learn something that completely changes how you program. Reading this post about the attrs package in Python was a revelation to me.

Coming from C++, I am not too big a fan of returning everything as lists and tuples. In many cases, you want to have structure and attributes, and the class in Python is a good fit for this. However, creating a proper class with attributes that has all the necessary basic methods is a pain.

This is where attrs comes in. Add its decorator to the class and designate the attributes of the class using its methods and it will generate all the necessary dunder methods for you. You can also get some nice type checking and default values for the attributes too.

  • First, let us get the biggest confusion about this package out of the way! It is called attrs when you install it because there is already another existing package called attr (the singular). But when you import and use it, it is called attr. I know it is irritating, but this is the way it is.

  • To install it:

$ sudo pip3 install attrs
  • To decorate the class, use attr.s. I read it as the plural attrs. To declare the class attributes, use the attr.ib method. I read it as attribute.
import attr

@attr.s
class Creature:
    eyes = attr.ib()
    legs = attr.ib()
  • Once declared like this, the attributes can be provided while constructing an object of the class:
c = Creature(2, 4)
  • Object of this class can be constructed using keywords too:
c = Creature(legs=6, eyes=1000)
  • Notice that we have not specified any default value for the attributes. So, it will rightfully complain when constructing without values:
c = Creature()

TypeError: __init__() missing 2 required positional arguments: 'eyes' and 'legs'
  • Default values can be specified for attributes:
@attr.s
class Creature:
    eyes = attr.ib(default=2)
    legs = attr.ib(default=6)

c = Creature()

Note that there are some rules you run up against if you provide default values for some attributes and not for others. In particular, an attribute without a default cannot be declared after an attribute that has one.

  • A beautiful __repr__ dunder method is automatically generated for your class. So, you can print any object:
c = Creature(3, 6)
print(c)

Creature(eyes=3, legs=6)

This is for me the killer feature! This is far more informative than just looking at a bunch of list or dict values.

  • Attributes can be read or set just like normal class attributes:
c = Creature(2, 4)
c.eyes = 10
print(c.legs)
  • Comparison methods are already generated for you, so you can go ahead and compare objects:
c1 = Creature(2, 4)
c2 = Creature(3, 9)
c1 == c2
  • You can add some semblance of type checking to attributes by using the instance_of validators provided by the package:
@attr.s
class Creature:
    eyes = attr.ib(validator=attr.validators.instance_of(int))
    legs = attr.ib()

c = Creature(3.14, 6)

TypeError: ("'eyes' must be <class 'int'> (got 3.14 that is a <class 'float'>)."
  • By default, the attribute values of each object are stored in a dictionary. You can switch this to use slots by changing the decorator:
@attr.s(slots=True)
class Creature:
    eyes = attr.ib()
    legs = attr.ib()
  • Are you curious to see the definition of the dunder methods it generates? You can do that using the inspect package:
import inspect
print(inspect.getsource(Creature.__init__))
print(inspect.getsource(Creature.__eq__))
print(inspect.getsource(Creature.__gt__))
  • Want to see all the attributes the package has recorded for a class, along with their settings?
print(attr.fields(Creature))

(Attribute(name='eyes', default=NOTHING, validator=<instance_of validator for type <class 'int'>>, repr=True, cmp=True, hash=True, init=True, convert=None), Attribute(name='legs', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))

There is a lot more stuff in this awesome must-use package that can be read here.

Tried with: attrs 16.1.0, Python 3.5.2 and Ubuntu 16.04