Thrust: Remove Duplicates in Multiple Vectors

With the magical thrust::zip_iterator duplicates in multiple vectors can be easily removed and the vectors can be trimmed in Thrust.

Consider two vectors, one of key values and the other holding their values. There can be many values associated with each key. The keys are sorted and the values associated with each key are also sorted. Finding duplicates in these vectors boils down to finding duplicate pairs and removing them. Here is how to achieve this easily using thrust::unique and thrust::zip_iterator:

typedef thrust::device_vector< int >                IntVector;
typedef IntVector::iterator                         IntIterator;
typedef thrust::tuple< IntIterator, IntIterator >   IntIteratorTuple;
typedef thrust::zip_iterator< IntIteratorTuple >    ZipIterator;

IntVector keyVector;
IntVector valVector;

// Remove duplicate pairs
ZipIterator newEnd = thrust::unique( thrust::make_zip_iterator( thrust::make_tuple( keyVector.begin(), valVector.begin() ) ),
                                     thrust::make_zip_iterator( thrust::make_tuple( keyVector.end(), valVector.end() ) ) );

IntIteratorTuple endTuple = newEnd.get_iterator_tuple();

// Trim the vectors
keyVector.erase( thrust::get<0>( endTuple ), keyVector.end() );
valVector.erase( thrust::get<1>( endTuple ), valVector.end() );

Tried with: Thrust 1.3 and CUDA 3.2

C++ STL Vector Growth Factor

The STL vector is a dynamic array. When it grows its internal memory is reallocated and the old content is copied over to the new space. Because of this, when the vector capacity increases the iterators, references and pointers to its elements become invalid.

The capacity increase factor of a vector depends on the STL implementation. The current capacity of a vector can be found by calling its capacity() method. By running a simple program (like below), one can figure out the factor. I found that this factor is 1.5 for Visual C++ 2010, which uses the STL implementation from Dinkumware and it is 2.0 on G++ 3.4.4 running under Cygwin.

#include 
#include 
using namespace std;

int main()
{
	vector ivec;
	for ( int i = 0; i < 1000; ++i )
	{
		const int before = ivec.capacity();
		ivec.push_back( i );
		const int after = ivec.capacity();

		if ( before != after )
			cout << "Capacity: " << after << endl;
	}

	return 0;
}

C++ STL: Find common elements of 2 sorted vectors

Finding the elements common among two sorted vectors and storing the (common) result in a third vector can be done using the std::set_intersection algorithm as follows:

std::set_intersection(	vec0.begin(), vec0.end(),
						vec1.begin(), vec1.end(),
						vec2.begin()	);

Note that std::set_intersection expects the result vector to be of enough size, i.e. of size max( vec0.size(), vec1.size() ). I find this ugly since the result vector needs to be filled with junk elements and its space is wasted depending on the result of the intersection.

Thankfully, STL has std::back_inserter which can handle this situation:

std::set_intersection(	vec0.begin(), vec0.end(),
						vec1.begin(), vec1.end(),
						std::back_inserter( vec2 )	);

The std::back_inserter acts as an iterator to std::set_intersection while it uses std::vector.push_back() to insert each new element given by it. So, the resulting vector does not need to be initialized to an appropriate size and after std::set_intersection it has only the result elements and has exactly that size.

C++ STL: Copy vector to array

The array behaves like a vector and so can be used almost everywhere a vector is used. So, a vector can be copied into an array using std::copy algorithm. But, make sure that the array is big enough to hold the elements of the vector when you do this:

#include <vector>

std::vector<Foo> fooVec;
Foo fooArr[FOO_MAX];

std::copy( fooVec.begin(), fooVec.end(), fooArr );

C++ STL: Output Container to Stream

We can output the elements of any container to any ostream using std::copy like this. But, it would be really nice if we could output any vector like this:

std::vector<MyClass> myVec;
std::cout << myVec;

To achieve that, we just need one more wrapper around std::copy, a function template that accepts vectors of any type:

template <typename T>
std::ostream& operator << (std::ostream& os, const std::vector<T>& vec)
{
    std::copy( vec.begin(), vec.end(), std::ostream_iterator<T>( os ) );
    return os;
}

Similar functions can be written for other STL containers.

C++: Initialize STL Vector with Array

A STL vector does not have any constructor that looks like it accepts an array. So, the straightforward way to initialize a vector with an array looks like:

#include <vector>

int arr[ARR_LEN] = { /* Array elements */ };
std::vector iVec;
for (int i = 0; i < ARR_LEN; ++i)
    iVec.push_back( arr[i] );

That is too much code for a simple initialization! Thankfully, there is another way do it with less code.

The constructor vector(InputIterator begin, InputIterator end) which initializes a vector with the elements of the range [begin, end) can be used with arrays too. This is because array pointers have all the properties required of an (input) iterator:

#include <vector>

int arr[ARR_LEN] = { /* Array elements */ };
std::vector<int> iVec(arr, arr + ARR_LEN);

Note that the second argument arr + ARR_LEN is beyond the last element of the array and thus behaves as the correct end().