I installed PyTorch on my system and the very first import failed with this error:
>>> import torch
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    import torch
  File "/usr/lib/python3/dist-packages/bpython/curtsiesfrontend/repl.py", line 251, in load_module
    module = self.loader.load_module(name)
  File "/home/joe/.local/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
  File "/usr/lib/python3/dist-packages/bpython/curtsiesfrontend/repl.py", line 251, in load_module
    module = self.loader.load_module(name)
ImportError: /home/joe/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZN3c1031FreeCudaMemoryCallbacksRegistryEv
The undefined symbol demangles to c10::FreeCudaMemoryCallbacksRegistry(), so it was obvious that the problem was in CUDA or in PyTorch's interface to CUDA.
I checked that the /usr/local/cuda symlink was pointing to my CUDA 10.1 installation. I also made sure that /usr/local/cuda/lib64 was in my LD_LIBRARY_PATH. Both were fine, so neither check fixed the problem.
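For anyone repeating these two checks, here is a minimal Python sketch. The helper name in_search_path is my own invention for illustration, and the paths are the ones from my machine, so adjust them to yours.

```python
import os


def in_search_path(directory, search_path):
    """Return True if `directory` is one of the entries in a colon-separated path list."""
    wanted = os.path.normpath(directory)
    entries = [e for e in search_path.split(os.pathsep) if e]
    return any(os.path.normpath(e) == wanted for e in entries)


# Where does the CUDA symlink really point? (Path taken from this post.)
print(os.path.realpath("/usr/local/cuda"))

# Is the CUDA library directory on the dynamic loader's search path?
print(in_search_path("/usr/local/cuda/lib64", os.environ.get("LD_LIBRARY_PATH", "")))
```

Passing both checks only rules out a misconfigured CUDA path. It says nothing about stray copies of other libraries, which is what bit me.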
I then used the ldd command to check which libraries /home/joe/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so was dynamically loading at runtime. This revealed that all the dependent libraries were in /home/joe/.local/lib/python3.6/site-packages, except for one: /usr/local/lib/libc10_cuda.so. It turns out that libc10_cuda.so is a library that PyTorch itself installs, and I had no idea why there was an old copy of it in /usr/local/lib. It was probably left over from an old Ubuntu installation of PyTorch that did not uninstall properly.
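The ldd check above can be automated along these lines. This is only a sketch under my own assumptions: parse_ldd, find_shadowed and check are hypothetical helper names, the parsing assumes the usual "name => path (address)" format of ldd output, and the directory comparison is literal (it does not resolve symlinks).

```python
import subprocess
from pathlib import Path


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (None if unresolved)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # entries such as the dynamic linker have no "name => path" form
        name, _, rest = line.partition("=>")
        path = rest.split("(")[0].strip()  # drop the trailing "(0x...)" load address
        resolved[name.strip()] = None if path in ("", "not found") else path
    return resolved


def find_shadowed(resolved, lib_dir, bundled_names):
    """Return bundled libraries that the loader resolved from some other directory."""
    lib_dir = Path(lib_dir)
    return {
        name: path
        for name, path in resolved.items()
        if name in bundled_names and path is not None and Path(path).parent != lib_dir
    }


def check(so_path):
    """Run ldd on a shared object and report libraries shadowed by copies elsewhere."""
    so_path = Path(so_path)
    lib_dir = so_path.parent
    bundled = {p.name for p in lib_dir.iterdir()}
    result = subprocess.run(
        ["ldd", str(so_path)],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return find_shadowed(parse_ldd(result.stdout), lib_dir, bundled)
```

Calling check() on the libtorch_python.so path from the traceback would, in a situation like mine, be expected to report libc10_cuda.so together with the directory it was actually loaded from.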
I removed the stale file from /usr/local/lib, PyTorch picked up its own bundled libc10_cuda.so, and everything was fine!