Compare files at the shell using comm


comm is a useful Linux tool for the shell. It takes two files with sorted lines as input and shows which lines are unique to each file and which are common to both (the intersection). For example, given sorted listings of two directories, it shows which filenames appear in both. It is part of the coreutils package, so it should be available on practically any Linux system.

  • To compare two files:
$ comm first.txt second.txt

You will see three columns in the output: lines unique to the first file, lines unique to the second file, and lines common to both.

  • To suppress columns 1 and 2, and thus display only the lines common to both files:
$ comm -1 -2 first.txt second.txt
  • A few other options are described in the manpage: man comm

Tried with: Ubuntu 16.04


C++ STL: Find common elements of 2 sorted vectors

Finding the elements common to two sorted vectors and storing the result in a third vector can be done using the std::set_intersection algorithm as follows:

std::set_intersection( vec0.begin(), vec0.end(),
                       vec1.begin(), vec1.end(),
                       vec2.begin() );

Note that std::set_intersection expects the result vector to already be large enough, i.e. of size at least min( vec0.size(), vec1.size() ), since the intersection cannot contain more elements than the smaller input. I find this ugly since the result vector has to be pre-filled with junk elements, and part of that space is wasted depending on how small the intersection turns out to be.
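For reference, here is a minimal sketch of the pre-sized approach. The helper name intersect_presized is made up for illustration, and int elements are assumed. It sizes the result to the smaller input (the intersection can never be larger than that) and uses the iterator returned by std::set_intersection, which points one past the last element written, to trim the leftover junk:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original post): intersects two sorted
// vectors into a result vector that is sized up front.
std::vector<int> intersect_presized(const std::vector<int>& vec0,
                                    const std::vector<int>& vec1)
{
    // std::set_intersection requires both inputs to be sorted.
    assert(std::is_sorted(vec0.begin(), vec0.end()));
    assert(std::is_sorted(vec1.begin(), vec1.end()));

    // The intersection cannot hold more elements than the smaller input.
    std::vector<int> vec2(std::min(vec0.size(), vec1.size()));

    // The returned iterator points one past the last element written.
    auto last = std::set_intersection(vec0.begin(), vec0.end(),
                                      vec1.begin(), vec1.end(),
                                      vec2.begin());

    // Drop the unused, value-initialized slots at the tail.
    vec2.erase(last, vec2.end());
    return vec2;
}
```

This works, but the extra erase step is easy to forget, which is part of why the approach feels clumsy.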

Thankfully, the STL has std::back_inserter, which can handle this situation:

std::set_intersection( vec0.begin(), vec0.end(),
                       vec1.begin(), vec1.end(),
                       std::back_inserter( vec2 ) );

std::back_inserter returns an output iterator that std::set_intersection writes to; each write calls push_back() on the underlying std::vector to append the new element. So the result vector does not need to be sized in advance. Assuming it starts out empty, after std::set_intersection it holds only the result elements and has exactly that size.
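Put together, the std::back_inserter version might look like the sketch below. The wrapper function intersect_sorted is a hypothetical name chosen for this example, and int elements are assumed; the only STL pieces used are std::set_intersection, std::back_inserter and std::is_sorted:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper (not from the original post): returns the elements
// common to two sorted vectors.
std::vector<int> intersect_sorted(const std::vector<int>& vec0,
                                  const std::vector<int>& vec1)
{
    // std::set_intersection requires both inputs to be sorted.
    assert(std::is_sorted(vec0.begin(), vec0.end()));
    assert(std::is_sorted(vec1.begin(), vec1.end()));

    // Starts empty; back_inserter grows it via push_back() as needed.
    std::vector<int> vec2;
    std::set_intersection(vec0.begin(), vec0.end(),
                          vec1.begin(), vec1.end(),
                          std::back_inserter(vec2));
    return vec2;
}
```

Calling it with the sorted vectors { 1, 2, 4, 7 } and { 2, 3, 4, 8 } returns a vector holding 2 and 4, with size exactly 2.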