📅 2016-Feb-23 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ error, matlab, mkl ⬩ 📚 Archive
I ran a MATLAB script that uses a parallel pool of workers. It failed with this error:
The client lost connection to worker 4. This might be due to network problems, or the interactive communicating job might have errored
The corresponding matlab_crash_dump file had this stack trace starting from mkl.so:
[ 0] 0x00007f995128a1c5 /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+16769477 mkl_blas_avx_sgemm_mscale+00001253
[ 1] 0x00007f995115ba1c /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+15530524 mkl_blas_avx_xsgemm+00000204
[ 2] 0x00007f99505a1a5c /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+03234396 mkl_blas_xsgemm+00000316
[ 3] 0x00007f9950529720 /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+02742048
[ 4] 0x00007f9950525d5a /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+02727258 mkl_blas_sgemm+00001386
[ 5] 0x00007f99503b8ba5 /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+01231781 sgemm+00000377
[ 6] 0x00007f991b803154 /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so+01315156 cblas_sgemm+00000372
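The trace shows cblas_sgemm from the Intel MKL installed under /opt/intel/mkl calling into the mkl.so bundled with MATLAB, so two different MKL installations were being mixed in the worker process. To run a parallel pool of workers, we need to use the libmkl_rt.so provided by Intel. I added the directory containing this library, which was /opt/intel/mkl/lib/intel64 in my case, to my LD_LIBRARY_PATH. I also set the BLAS_VERSION environment variable to libmkl_rt.so.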
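The two environment changes above can be sketched as shell commands. This is a minimal sketch, assuming a Bourne-style shell and the MKL location from this post; the MKL_LIB_DIR variable is only a helper name introduced here, and the directory should be adjusted to wherever libmkl_rt.so lives on your machine.

```shell
# Assumption: Intel MKL is installed here, as in this post. Adjust if needed.
MKL_LIB_DIR=/opt/intel/mkl/lib/intel64

# Prepend the MKL directory to the library search path,
# keeping any entries that were already set.
export LD_LIBRARY_PATH="${MKL_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Tell MATLAB which BLAS library to load.
export BLAS_VERSION=libmkl_rt.so
```

MATLAB has to be started from the same shell session (or the lines have to go into a startup file such as ~/.bashrc) so that MATLAB and its workers inherit both variables.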
After this, I checked that everything worked by going to Parallel -> Manage cluster profiles -> Local -> Validation profiles -> Validate. All the parallel test tasks were validated successfully. My script also ran without errors after this.
Tried with: MATLAB R2014a and Ubuntu 14.04