How to discover type hierarchy in Python

Given any type in Python, you can easily discover its ancestor and descendant types. This ease of discovery of the internals of the language is one of my favorite features of Python.

  • Remember that all types are descended from the object type.

  • Even type is a type and it is a child of the object type.

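  • The __base__ attribute of any type holds the parent type itself (a type object, not a string). Use the __name__ attribute of that parent to get its name as a string. For a type with more than one parent, the __bases__ attribute holds a tuple of all of them.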

  • The __subclasses__ method of any type returns a list of its immediate child types.

  • To determine which are the standard types (or builtin types or builtins as they are called in Python), check the __module__ attribute of the type. If it is builtins in Python 3 or __builtin__ in Python 2, then that is a standard type.

  • If you start from object, you can actually list the entire type hierarchy tree. A script that does just that can be found here.

  • In Python 3.5.2, I found that there are 143 builtin types (many of them are exception types) in the tree:

object
+-- type
+-- dict_values
    +-- odict_values
+-- tuple_iterator
+-- set
+-- fieldnameiterator
+-- frame
+-- dict_keyiterator
+-- PyCapsule
+-- coroutine
+-- bytearray
+-- NoneType
+-- list
+-- dict
+-- getset_descriptor
+-- method-wrapper
+-- method
+-- str_iterator
+-- formatteriterator
+-- str
+-- set_iterator
+-- range_iterator
+-- memoryview
+-- cell
+-- generator
+-- map
+-- list_iterator
+-- stderrprinter
+-- reversed
+-- method_descriptor
+-- code
+-- weakproxy
+-- int
    +-- bool
+-- ellipsis
+-- module
+-- dict_items
    +-- odict_items
+-- bytearray_iterator
+-- Struct
+-- moduledef
+-- filter
+-- staticmethod
+-- tuple
+-- frozenset
+-- managedbuffer
+-- coroutine_wrapper
+-- function
+-- builtin_function_or_method
+-- odict_iterator
+-- float
+-- range
+-- super
+-- dict_keys
    +-- odict_keys
+-- list_reverseiterator
+-- bytes_iterator
+-- member_descriptor
+-- wrapper_descriptor
+-- property
+-- instancemethod
+-- zip
+-- weakref
+-- slice
+-- longrange_iterator
+-- dict_valueiterator
+-- EncodingMap
+-- callable_iterator
+-- mappingproxy
+-- BaseException
    +-- Exception
        +-- TypeError
        +-- StopAsyncIteration
        +-- SyntaxError
            +-- IndentationError
                +-- TabError
        +-- AttributeError
        +-- AssertionError
        +-- StopIteration
        +-- MemoryError
        +-- BufferError
        +-- NameError
            +-- UnboundLocalError
        +-- LookupError
            +-- IndexError
            +-- KeyError
        +-- EOFError
        +-- ImportError
        +-- ValueError
            +-- UnicodeError
                +-- UnicodeEncodeError
                +-- UnicodeDecodeError
                +-- UnicodeTranslateError
        +-- RuntimeError
            +-- RecursionError
            +-- NotImplementedError
        +-- SystemError
        +-- Warning
            +-- UserWarning
            +-- DeprecationWarning
            +-- BytesWarning
            +-- SyntaxWarning
            +-- PendingDeprecationWarning
            +-- FutureWarning
            +-- ResourceWarning
            +-- ImportWarning
            +-- RuntimeWarning
            +-- UnicodeWarning
        +-- ReferenceError
        +-- OSError
            +-- ConnectionError
                +-- BrokenPipeError
                +-- ConnectionAbortedError
                +-- ConnectionRefusedError
                +-- ConnectionResetError
            +-- BlockingIOError
            +-- NotADirectoryError
            +-- PermissionError
            +-- FileExistsError
            +-- TimeoutError
            +-- IsADirectoryError
            +-- InterruptedError
            +-- ProcessLookupError
            +-- FileNotFoundError
            +-- ChildProcessError
        +-- ArithmeticError
            +-- FloatingPointError
            +-- OverflowError
            +-- ZeroDivisionError
    +-- GeneratorExit
    +-- KeyboardInterrupt
    +-- SystemExit
+-- dict_itemiterator
+-- classmethod
+-- NotImplementedType
+-- iterator
+-- bytes
+-- enumerate
+-- classmethod_descriptor
+-- complex
+-- traceback
+-- weakcallableproxy
  • Note how bool is a child type of the int type.

  • In Python 2.7.12, I found that there are 60 builtin types in the tree:

object
+-- type
+-- weakref
+-- weakcallableproxy
+-- weakproxy
+-- int
    +-- bool
+-- basestring
    +-- str
    +-- unicode
+-- bytearray
+-- list
+-- NoneType
+-- NotImplementedType
+-- traceback
+-- super
+-- xrange
+-- dict
+-- set
+-- slice
+-- staticmethod
+-- complex
+-- float
+-- buffer
+-- long
+-- frozenset
+-- property
+-- memoryview
+-- tuple
+-- enumerate
+-- reversed
+-- code
+-- frame
+-- builtin_function_or_method
+-- instancemethod
+-- function
+-- classobj
+-- dictproxy
+-- generator
+-- getset_descriptor
+-- wrapper_descriptor
+-- instance
+-- ellipsis
+-- member_descriptor
+-- file
+-- PyCapsule
+-- cell
+-- callable-iterator
+-- iterator
+-- EncodingMap
+-- fieldnameiterator
+-- formatteriterator
+-- module
+-- classmethod
+-- dict_keys
+-- dict_items
+-- dict_values
+-- deque_iterator
+-- deque_reverse_iterator
+-- Struct
  • Note how str and unicode are child types of the basestring type. Also observe how this differs from Python 3 builtin types.

  • Also notice how in Python 2 the exception types are not builtin types.
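
The attributes described above can be combined into a short script that prints such a tree. This is only a minimal sketch and not the script linked earlier; the function names is_builtin and walk are made up for illustration, and the exact set of types printed depends on the Python version and on which modules are already imported.

```python
def is_builtin(t):
    # Builtin types report "builtins" (Python 3) or "__builtin__" (Python 2).
    return t.__module__ in ("builtins", "__builtin__")


def walk(t, depth=0, lines=None):
    """Collect one indented line per builtin type, starting from t."""
    if lines is None:
        lines = []
    prefix = ("    " * (depth - 1) + "+-- ") if depth else ""
    lines.append(prefix + t.__name__)
    # Call __subclasses__ through type so that it also works when t is type.
    for child in type.__subclasses__(t):
        if is_builtin(child):
            walk(child, depth + 1, lines)
    return lines


tree_lines = walk(object)
print("\n".join(tree_lines))

# __base__ is the parent type object itself, not a string.
print(bool.__base__ is int)  # True
```

Non-builtin types are skipped together with everything below them, which is a simplification that keeps the sketch short.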

How to read YAML file in Python with ordered keys

It is very easy to read a YAML file in Python as a combination of dicts and lists using PyYAML. However, the YAML format treats mappings as unordered, so PyYAML is not required to return the keys of a mapping in the order they appear in the file. In addition, a plain Python dict did not guarantee any key order before Python 3.7. In certain situations, though, it might be necessary to read the keys in the order they appear in the file. This can be done by using yamlordereddictloader.

  • Installing this Python package is easy using pip:
$ sudo pip install yamlordereddictloader
  • Read YAML files by providing the loader from this package to PyYAML:
import yaml
import yamlordereddictloader

with open("foobar.yaml") as f:
    yaml_data = yaml.load(f, Loader=yamlordereddictloader.Loader)

This returns the data in the YAML file as a combination of lists and OrderedDict (instead of dict). So, almost all of the rest of your code should work the same as before after this change.

Tried with: yamlordereddictloader 0.4 and Ubuntu 16.04

How to convert datetime to and from ISO 8601 string

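ISO 8601 is a popular standardized format for representing date and time. Python can convert to and from this format. But confusingly, the two directions live in two different modules! The datetime module from the standard library produces the string, while parsing it back is done here with the third-party dateutil package (installed as python-dateutil).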

  • Convert a datetime object to string in ISO 8601 format:
import datetime
datetime_str = some_datetime_obj.isoformat()
  • Convert an ISO 8601 format string to a datetime object:
import dateutil.parser
some_datetime_obj = dateutil.parser.parse(datetime_str)
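
On Python 3.7 or newer, the round trip can also be done with the standard library alone, since datetime.fromisoformat() parses the strings that isoformat() produces. A small sketch, where the date value is just an example:

```python
import datetime

# An arbitrary example value.
some_datetime_obj = datetime.datetime(2017, 5, 14, 9, 30, 15)

# datetime -> ISO 8601 string
datetime_str = some_datetime_obj.isoformat()
print(datetime_str)  # 2017-05-14T09:30:15

# ISO 8601 string -> datetime (Python 3.7+)
parsed_obj = datetime.datetime.fromisoformat(datetime_str)
print(parsed_obj == some_datetime_obj)  # True
```

dateutil.parser.parse() remains the more forgiving choice for strings that come from other sources.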

How to set encoder format for Python JSON

Python’s json module makes it very easy to dump data into a JSON file. However, I have found that the float values are encoded with a lot of decimal places or in scientific notation. There is no elegant method to set the formatting in the float encoder of the json module. The best solution seems to be to monkey patch its formatting option. Note that FLOAT_REPR is an undocumented internal name, so whether this patch has any effect depends on the Python version:

# Make json.dump() encode floats with 5 places of precision
import json
json.encoder.FLOAT_REPR = lambda x: format(x, '.5f')

Reference: https://stackoverflow.com/questions/1447287
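
A version-independent alternative, swapped in here for the monkey patch and not taken from the reference above, is to round the floats before handing the data to json. The helper name round_floats and the sample data are hypothetical.

```python
import json


def round_floats(obj, places=5):
    """Return a copy of obj with every float rounded to the given places."""
    if isinstance(obj, float):
        return round(obj, places)
    if isinstance(obj, dict):
        return {k: round_floats(v, places) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [round_floats(v, places) for v in obj]
    return obj


data = {"pi": 3.14159265358979, "values": [0.1234567, 2.0]}
text = json.dumps(round_floats(data))
print(text)
```

Unlike format(x, '.5f'), rounding does not pad with trailing zeros, so 2.0 stays 2.0 instead of becoming 2.00000.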

Python dict get method

The Python dictionary provides an associative array interface to get the value associated with a key:

>>> d = { 1:"cat", 2:"rat" }
>>> d[1]
'cat'

However, this interface is not very friendly if you look up a key that does not exist. In such a case, it raises a KeyError exception:

>>> d[3]
KeyError: 3

The Python dictionary also provides a get method that is safer. It returns None if the key is not present:

>>> print(d.get(3))
None

This method is actually cooler than it looks because you can make it return any default value you want when the key is not present in the dictionary. You do this by passing the default value as the second argument:

>>> print(d.get(1, "elephant"))
cat
>>> print(d.get(3, "elephant"))
elephant

Bonus trick

In many cases, the key might be present in the dictionary, but with its value set to some empty value such as None, an empty string, an empty list or an empty dict. Suppose that, at the point where we read values, we want such keys to return a different default value as well. The trick is that all of these empty values evaluate to False in Python, and we can use that to our advantage.

For example, say the dictionary is already created and not under our control. But whenever I read values from it, I want elephant if the key does not exist or if the value evaluates to False. This gives rise to an elegant Python idiom using the get method and the or operator:

>>> d = { 1:"cat", 2:"rat", 3:None, 4:"" }
>>> v = d.get(1) or "elephant" ; print(v)
cat
>>> v = d.get(3) or "elephant" ; print(v)
elephant
>>> v = d.get(4) or "elephant" ; print(v)
elephant
>>> v = d.get(99) or "elephant" ; print(v)
elephant
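
One caveat with this idiom is that 0 and False also evaluate to False, so they get replaced too. The sketch below shows the difference, using a made-up dictionary; when only a missing key or None should fall back to the default, an explicit None check is safer.

```python
d = {"count": 0, "name": "", "owner": None}

# The or idiom replaces every value that evaluates to False, including 0.
with_or = d.get("count") or 10
print(with_or)  # 10

# An explicit check replaces only a missing key or a None value.
value = d.get("count")
with_check = 10 if value is None else value
print(with_check)  # 0

value = d.get("owner")
owner = "nobody" if value is None else value
print(owner)  # nobody
```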

dlopen: cannot load any more object with static TLS

Problem

I had a Python script that used Caffe2. It worked fine on one computer. On another computer with the same setup, it would fail at the import caffe2.python line with this error:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: dlopen: cannot load any more object with static TLS
CRITICAL:root:Cannot load caffe2.python. Error: dlopen: cannot load any more object with static TLS

The GPU support warning is a red herring because this Caffe2 Python was built with GPU support. The real error is the dlopen one.

Solution

The only solution from Googling that gave a clue was this. As suggested there, I placed the import caffe2.python line at the top above all other imports. The error disappeared.

Tried with: Ubuntu 14.04

How to deal with YAML in Python

YAML (YAML Ain't Markup Language, originally Yet Another Markup Language) is a language similar to JSON for reading and writing configuration information to files that are human readable. YAML 1.2 is a superset of JSON. It typically uses indentation instead of the braces used by JSON.

  • To be able to deal with YAML in Python, install the PyYAML package:
$ sudo pip install PyYAML
$ sudo pip3 install PyYAML
  • Similar to JSON, a YAML file can be directly loaded into a Python list or dict, depending on whether the root structure of the file is a list or a dict. Using safe_load avoids constructing arbitrary Python objects from the file:
import yaml
with open("foobar.yaml") as f:
    y = yaml.safe_load(f)
  • Writing a Python structure back to a YAML file is similarly straightforward:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f)
  • Note that with PyYAML 3.11 the innermost lists and dicts are written in flow style by default (PyYAML 5.1 and later default to block style). Flow style makes the file look a bit like JSON. For human readability, it might be better to dump in block style, like this:
with open("foobar.yaml", "w") as f:
    yaml.dump(y, f, default_flow_style=False)

Tried with: PyYAML 3.11, Python 3.5.2 and Ubuntu 16.04

How to convert Python dict to class object with fields

A Python dict is extremely versatile. However, there might be situations where you want to access the dict as if it were a class object and the keys were its fields. This can be easily done by using a namedtuple. Just give it a name and then use the keys to populate its named fields. Set the values for those fields by passing the values from the dict. It all boils down to a single line.

This example code demonstrates the above:
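
A minimal sketch of the namedtuple approach, assuming a dict whose keys are all valid Python identifiers. The names d, Animal and animal are illustrative only.

```python
from collections import namedtuple

d = {"name": "cat", "legs": 4}

# The single line: name the type, use the keys as fields, pass the values.
animal = namedtuple("Animal", d.keys())(**d)

print(animal.name)  # cat
print(animal.legs)  # 4
```

Keep in mind that a namedtuple is immutable, so this gives read-only access to the fields, and keys that are not valid identifiers (such as "my-key" or 1) are rejected by namedtuple.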

Python JSON dump misses last newline

Problem

The dump method from the Python json package can be used to write a suitable Python object, usually a dictionary or list, to a JSON file. However, I discovered that Unix shell programs have problems working with such a JSON file. This turned out to be because this dump method does not end the last line with a newline character! According to the POSIX definition of a line in a text file, it needs to end with a newline character. (See here).

Solution

I replaced this:

json.dump(json_data, open("foobar.json", "w"), indent=4)

with this:

with open("foobar.json", "w") as json_file:
    json.dump(json_data, json_file, indent=4)
    json_file.write("\n")  # Add a newline because json.dump() does not
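
The behaviour described above can be checked without touching the filesystem. The sketch below uses an in-memory io.StringIO buffer in place of the file; the buffer and the sample data are illustrative.

```python
import io
import json

json_data = {"animal": "cat", "legs": 4}

buf = io.StringIO()
json.dump(json_data, buf, indent=4)

# json.dump() stops right after the closing brace.
ends_with_newline_before = buf.getvalue().endswith("\n")
print(ends_with_newline_before)  # False

buf.write("\n")  # Same fix as above
ends_with_newline_after = buf.getvalue().endswith("\n")
print(ends_with_newline_after)  # True
```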