How to find indices of smallest K items


Recently I actually faced a problem, the likes of which you see being asked in interviews. Input is a array of N values that is not sorted. We want this array to be untouched, since it is required in the given order for later use. What we want to find out is the indices of the smallest K values in this unsorted array. The K in this case is very small compared to N.

Solution 1

The straightforward solution is:

  • Make a copy of the array (or vector)
  • Create another vector of indices
  • Apply a std::partial_sort on this pair of vectors with a custom comparison method that only compares the values vector
  • The smallest K values and their indices in the original order will be output in the first K locations of the vectors

The computational complexity of this solution can probably not be beaten. However, the problems with this solution for my particular application were:

  • A copy of the input vector needs to be made since our input vector needs to be unmodified. The input vector can be very large, so this copy is wasteful. Especially since K is known to be very small (assume 3 or 5).
  • Since we require the indices of the smallest K values, an additional vector of same length of input vector needs to be created.

Solution 2

As you can imagine, the ideal solution would have complexity similar to the above solution. But practically it should not require N new storage, but K size storage. In fact, this can be solved using a std::priority_queue. Here is the pseudocode of my solution:

GetSmallestKIndices(input_array, K):
    Create priority queue pq

    Insert first K pairs of items and indices into pq
    (pq will be maintained at size K)

    Iterate over each of the N-K items in input_array
        If item is bigger than top item in pq then

        We have now found item smaller than items in pq
        Pop off the top item from pq
        Insert pair of current item and its index

    What we have left in pq is smallest K values and their indices

It goes without saying that this solution is only ideal for this particular setup where K is very small compared to N.