📅 2010-Oct-21 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ cpp, performance ⬩ 📚 Archive
The C++ programmer has to be constantly aware of the performance of the code he is writing. One conundrum that arises often is with the output of a function. If this output is an object of substantial size, is it better to return it as the return value or as an output parameter?
Question: Which of these functions is faster?
Foo Function1();
void Function2( Foo& );
Answer: It depends. Typically it is Function2, though sometimes the two are comparable.
Function1 looks elegant, both for a programmer and for a mathematician. However, it breaks down as soon as one wants to return multiple values from a function. One could use modern C++ constructs like pair or tuple, but that is extra machinery.
Function2 does not look as elegant as Function1. One needs to be careful here to pass the output parameter as a reference (or pointer) parameter. However, this format can handle multiple output parameters easily.
Returning built-in types (like int) or tiny objects is easy for the C++ compiler to optimize. For these, Function1 and Function2 should be equally fast.
Function1 typically needs to construct the object it is returning. Depending on the class of the object, this might take a substantial amount of time. After construction, it may need to modify or fill the object with data.
Foo Function1()
{
    Foo foo; // Construction cost
    foo.fillSomething();
    return foo;
}
Function2 assumes that the parameter object is already created. It only modifies or fills the object with data.
void Function2( Foo& foo )
{
    foo.fillSomething(); // No construction, only filling
    return;
}
The performance also depends on what is happening in the caller. The caller might use the return value to initialize a new object, or it might assign the return value to an already existing object.
int main()
{
    // New object
    Foo f1 = Function1();
    Foo f2;
    Function2( f2 );
    // Existing object
    Foo f;
    while ( x )
    {
        // Do something
        f = Function1();
        // Do something
        f.clear();
        Function2( f );
    }
    return 0;
}
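Function1 involves one object creation inside the function and, naively, one copy of this object to the destination object. That looks very expensive! However, most modern C++ compilers perform Return Value Optimization (RVO), which constructs the returned object directly in the caller's destination and so eliminates the expensive copy when the return value initializes a new object. When the return value is assigned to an existing object, the assignment itself still has to happen.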
Note that RVO cannot always be applied by the compiler, for example when a function returns different named objects from different branches. When this happens, Function1 becomes all the more expensive.
I conducted a simple test using the STL vector as the return object. The code used for this is below:
#include <cstdlib>  // rand
#include <ctime>    // clock, clock_t, CLOCKS_PER_SEC
#include <iostream>
#include <vector>
using namespace std;
typedef vector<int> IntVec;
const int MAX = 10000;
IntVec Function1( const int size )
{
    IntVec ivec;
    for ( int i = 0; i < size; ++i )
        ivec.push_back( i );
    return ivec;
}
void Function2( const int size, IntVec& ivec )
{
    for ( int i = 0; i < size; ++i )
        ivec.push_back( i );
    return;
}
int main()
{
    // *** New Object
    double time1 = 0.0;
    double time2 = 0.0;
    for ( int i = 0; i < MAX; ++i )
    {
        const int randVal = rand();
        // *** Function1
        clock_t begin1 = clock();
        IntVec ivec1 = Function1( randVal );
        clock_t end1 = clock();
        double timeSec1 = (end1 - begin1) / static_cast<double>( CLOCKS_PER_SEC );
        time1 += timeSec1;
        // *** Function2
        clock_t begin2 = clock();
        IntVec ivec2;
        Function2( randVal, ivec2 );
        clock_t end2 = clock();
        double timeSec2 = (end2 - begin2) / static_cast<double>( CLOCKS_PER_SEC );
        time2 += timeSec2;
    }
    cout << "New object ..." << endl;
    cout << "Function1: " << time1 << endl;
    cout << "Function2: " << time2 << endl;
    // *** Existing Object
    time1 = 0.0;
    time2 = 0.0;
    IntVec ivec3;
    for ( int i = 0; i < MAX; ++i )
    {
        const int randVal = rand();
        // *** Function1
        ivec3.clear();
        clock_t begin1 = clock();
        ivec3 = Function1( randVal );
        clock_t end1 = clock();
        double timeSec1 = (end1 - begin1) / static_cast<double>( CLOCKS_PER_SEC );
        time1 += timeSec1;
        // *** Function2
        ivec3.clear();
        clock_t begin2 = clock();
        Function2( randVal, ivec3 );
        clock_t end2 = clock();
        double timeSec2 = (end2 - begin2) / static_cast<double>( CLOCKS_PER_SEC );
        time2 += timeSec2;
    }
    cout << "Existing object ..." << endl;
    cout << "Function1: " << time1 << endl;
    cout << "Function2: " << time2 << endl;
    return 0;
}
The code was compiled using the Visual C++ 2010 compiler (16.00.30319.01) in Release mode. The output below gives the total time in seconds over all 10000 iterations:
New object ...
Function1: 1.753
Function2: 1.615
Existing object ...
Function1: 1.671
Function2: 1.035
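In this test, the output parameter was faster in both cases. When initializing a new object, the difference is small (about 8%). However, when reusing an existing object, the difference is substantial: Function2 takes roughly 40% less time! 😊 A likely reason is that a vector keeps its allocated buffer after clear(), so Function2 can refill it with few or no reallocations, whereas Function1 has to grow a fresh vector from empty on every call.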