Caffe CUDA_cublas_device_LIBRARY error

Problem

I was trying to build BVLC Caffe from source as described here on Ubuntu 18.04 with CUDA 10.0.

CMake ended with this error:

Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)

Solution

This problem seems to occur with a combination of CUDA 10.0 or later and a CMake version that is older than 3.13. I upgraded to CMake 3.14.0 as described here and the problem was solved.

How to install CMake

CMake is easy to install in Ubuntu using apt:

$ sudo apt install cmake

However, depending on your version of Ubuntu, the CMake version that is installed might be very old. So, you might run into problems when you build projects that use features from more recent versions of CMake.

CMake provides binary versions for Linux x86_64. Installing the latest version of CMake from these packages is easy:

  • Remove the Ubuntu version of CMake:
$ sudo apt remove cmake cmake-data
  • Download the .sh binary package of the CMake version you want from here. When I downloaded, I got cmake-3.14.0-Linux-x86_64.sh

  • Move the downloaded package to /opt and execute it:

$ sudo mv cmake-3.14.0-Linux-x86_64.sh /opt
$ cd /opt
$ sudo chmod +x cmake-3.14.0-Linux-x86_64.sh
$ sudo bash ./cmake-3.14.0-Linux-x86_64.sh

This installs CMake in the directory /opt/cmake-3.14.0-Linux-x86_64.

  • Create symbolic links for the CMake binaries:
$ sudo ln -s /opt/cmake-3.14.0-Linux-x86_64/bin/* /usr/local/bin

  • Test if CMake is working:

$ cmake --version
cmake version 3.14.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Tried with: Ubuntu 18.04

Vector growth factor in languages

Growth factor is the multiplier by which the new capacity of a C++ STL vector is computed when we need to insert beyond its current capacity. There are containers similar to the STL vector in other languages and it is quite interesting to see which factors were chosen in those implementations.

  • The last time I looked into this (see this post), VC++ vector had a growth factor of 1.5 and GCC vector had a growth factor of 2.

  • Java ArrayList seems to have a growth factor of 1.5. See line 240 in ArrayList.java.

  • Julia also seems to have a growth factor of 1.5.

Integer division in C

Integer division in modern C always rounds towards zero.
For example:

9/2 = 4
-9/2 = -4 (not -5)

This was not always so. In early C versions and in the C89 standard, positive integer division rounded towards zero, but the result of negative integer division was implementation-defined!

From §3.3.5 Multiplicative operators of the C89 standard:

   When integers are divided and the division is inexact, if both
operands are positive the result of the / operator is the largest
integer less than the algebraic quotient and the result of the %
operator is positive.  If either operand is negative, whether the
result of the / operator is the largest integer less than the
algebraic quotient or the smallest integer greater than the algebraic
quotient is implementation-defined, as is the sign of the result of
the % operator.  If the quotient a/b is representable, the expression
(a/b)*b + a%b shall equal a .

The same appears in the first edition of The C Programming Language book by Kernighan and Ritchie.

This implementation-defined behavior was fixed by the C99 standard, which defined that integer division always rounds towards zero.

From §6.5.5 Multiplicative operators of the C99 standard:

When integers are divided, the result of the / operator is the algebraic
quotient with any fractional part discarded.

Thanks to Arch for pointing this out.

Multiple definitions of inline function

In C++, a function is annotated with inline to suggest to the compiler that it inline the function’s code into its callers. The C++ standard says that providing multiple differing definitions of an inline function in a program is illegal because it violates the One Definition Rule (ODR).

For example, §7.1.2 from the C++11 standard says:

An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly the same definition in every case.

So, having multiple differing definitions is illegal, but the compiler cannot really stop it. This is because two differing definitions of the same inline function might be inlined into their callers in two different translation units that are only later linked together into a program.

Thus, if we use multiple different inline definitions, all bets are off and it is undefined behavior.

This example illustrates that the final behavior differs based on just the compilation flags:

// x.cpp
#include <iostream>

inline void f()
{
    std::cout << __FILE__ << std::endl;
}

extern void g();

int main()
{
    f();
    g();

    return 0;
}

// y.cpp
#include <iostream>

inline void f()
{
    std::cout << __FILE__ << std::endl;
}

void g()
{
    f();
}

$ g++ x.cpp y.cpp
$ ./a.out
x.cpp
x.cpp

$ g++ -O2 x.cpp y.cpp
$ ./a.out
x.cpp
y.cpp

If we want each caller to use only its local inline definition, we can do that by placing the inline function inside an anonymous namespace. Each translation unit then gets its own distinct function with internal linkage, so the ODR is no longer violated and the behavior is well-defined. This example illustrates this:

// x.cpp
#include <iostream>

namespace
{
inline void f()
{
    std::cout << __FILE__ << std::endl;
}
}

extern void g();

int main()
{
    f();
    g();

    return 0;
}
// y.cpp
#include <iostream>

namespace
{
inline void f()
{
    std::cout << __FILE__ << std::endl;
}
}

void g()
{
    f();
}

$ g++ x.cpp y.cpp
$ ./a.out
x.cpp
y.cpp

$ g++ -O2 x.cpp y.cpp
$ ./a.out
x.cpp
y.cpp

Thanks to Arch for the elegant examples.

Programming the 8086/8088

Recently I was looking at PCJs, which does a classic IBM PC simulation in your browser. In its Programming Guides list I discovered this book about the classic 8086: Programming the 8086/8088 by James W. Coffron. The first assembly language I learned, almost two decades ago, was that of the 8086. The x86 and x86-64 assembly languages became so bloated and complicated after that. I picked up this book to relive the feel of a simple processor whose entire architecture and details you could actually comprehend.

This book seems to be meant for anyone with a bit of programming experience. The 8086 and 8088 architecture is introduced first, followed by details of its registers, instructions, memory model, I/O and how to program for it. There is a complete reference of its instructions, which are shockingly few in number for a CISC processor. It was quaint to see a listing of the number of cycles each instruction would take. (The fastest instructions took ~3 cycles and the slowest, like integer division, took hundreds of cycles.) Notably missing was floating point, which this early processor did not support; you had to use a math co-processor for that.

Studying the book, I was reminded of how these Intel processors differed from others. The 8088 was introduced as a cheaper 8086. Both were 16-bit processors with 16-bit registers and a 20-bit address bus. The only difference was that the 8088 had an 8-bit data bus while the 8086 had a 16-bit data bus. These processors supported another bygone relic: segmented memory. One of the segment registers holding a segment base address was left-shifted by 4 bits and added to the offset address from another register to generate the 20-bit address. A final oddity was that the 8086/8088 did not do memory-mapped I/O; instead you used the IN and OUT instructions to access I/O ports.

The book was pretty easy to follow along and covered both 8086 and 8088 well. Almost all aspects of the processors seem to be covered. What nagged me were several small mistakes that I could notice in the book. Maybe another round of proofreading would have helped. This book was easy, but I am still looking for the perfect 8086 reference.

Rating: 3/4 (★★★☆)

How to install PDFTK

PDFTK is a tool that can be used to split and merge PDF files. It is available for both Linux and Windows.

Windows

The PDFTK installer for Windows can be downloaded here. After PDFTK is installed, ensure that the directory of pdftk.exe is in the PATH environment variable.

Ubuntu 18.04 and newer

For Ubuntu 18.04 and newer versions, use Snap to install PDFTK:

$ sudo snap install pdftk

Ubuntu 16.04 and older

For Ubuntu 16.04 and older versions, use Apt to install PDFTK:

$ sudo apt install pdftk

If your installation is successful, you should be able to run pdftk from the shell.
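As a quick check of the split and merge operations, the commands look like this (the file names are hypothetical):

```shell
# Merge two PDF files into one
pdftk first.pdf second.pdf cat output merged.pdf

# Extract pages 1 to 3 of a PDF into a new file
pdftk input.pdf cat 1-3 output pages1to3.pdf
```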

How to use Vim as plain text editor

Vim is useful not only for code, but also as a general plain text editor. This is handy for writing README files, Markdown files or any content you publish as plain text.

In plain text files, a desirable property is to have lines wrapped at a certain width. The general wrap feature of Vim is not enough for this: it only wraps long lines visually at the screen width, without inserting actual line breaks.

To use Vim as a plain text editor, you want it to automatically break lines at a certain width as you type, and to do that only at word boundaries. The option that enables this mode is textwidth. It is set to 0 by default, which disables this feature and is suitable for source code.

To enable Vim to set a maximum width for text that is inserted, set this option: :set textwidth=80

If you have existing text that does not fit this width, or you have shortened some lines, all of that can be fixed too: mark the lines visually and press gq.

A final feature needed from a plain text editor is the ability to center a line. This is useful, for example, for titles and section headings. To do that, mark the line visually and type :center. This only works if the textwidth option is set.
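The settings above can be made permanent in your vimrc; a minimal sketch (80 is just an example width):

```vim
" Automatically break lines at 80 columns, at word boundaries
set textwidth=80

" Tip: gq also works with motions, e.g. gqip reformats the current paragraph
```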

Tried with: Vim 8.0 and Ubuntu 18.04

How to use clang-tidy

clang-tidy is an LLVM tool that can be used as a static checker on your C++ codebase and to fix the errors it finds. A full list of current checks and their descriptions can be found here. The number of checks available to you will depend on the clang-tidy version you are using.

  • Installing this tool is easy:
$ sudo apt install clang-tidy

This typically installs a stable but older version of the tool with fewer checks.

  • To list the checks it performs by default:
$ clang-tidy -list-checks

On my computer I found it had 80 checks enabled by default.

  • To list all the checks that it can perform:
$ clang-tidy -list-checks -checks="*"

On my computer I found that it had a total of 292 checks.

  • To check a file using all checks enabled by default:
$ clang-tidy foobar.cpp
  • To check a file using all checks enabled by default and apply the suggested fixes, use -fix. To apply fixes even when compilation errors are found, use -fix-errors:
$ clang-tidy -fix foobar.cpp
$ clang-tidy -fix-errors foobar.cpp
  • To use a specific check, say modernize-use-nullptr:
$ clang-tidy -checks="-*,modernize-use-nullptr" foobar.cpp
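Instead of passing -checks on every invocation, the checks can also be listed in a .clang-tidy file at the root of the codebase; a minimal sketch (the check names here are just examples):

```yaml
# .clang-tidy
Checks: '-*,modernize-use-nullptr,readability-*'
```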

Tried with: clang-tidy 6.0.0 and Ubuntu 18.04