Code Yarns ‍👨‍💻

MATLAB parallel pool error

📅 2016-Feb-23 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ error, matlab, mkl ⬩ 📚 Archive

Problem

I ran a MATLAB script that uses a parallel pool of workers. It failed with this error:

The client lost connection to worker 4. This might be due to network problems, or the interactive communicating job might have errored

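The corresponding matlab_crash_dump file had the stack trace below. Note that cblas_sgemm from Intel's libmkl_intel_lp64.so (frame 6) calls into the mkl.so bundled with MATLAB (frames 0 to 5), where the crash occurs: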

[  0] 0x00007f995128a1c5        /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+16769477 mkl_blas_avx_sgemm_mscale+00001253
[  1] 0x00007f995115ba1c        /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+15530524 mkl_blas_avx_xsgemm+00000204
[  2] 0x00007f99505a1a5c        /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+03234396 mkl_blas_xsgemm+00000316
[  3] 0x00007f9950529720        /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+02742048
[  4] 0x00007f9950525d5a        /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+02727258 mkl_blas_sgemm+00001386
[  5] 0x00007f99503b8ba5        /usr/local/MATLAB/R2014a/bin/glnxa64/mkl.so+01231781 sgemm+00000377
[  6] 0x00007f991b803154    /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so+01315156 cblas_sgemm+00000372

Solution

To run a parallel pool of workers, we need MATLAB to use the libmkl_rt.so library provided by Intel. I added the directory containing this library, which was /opt/intel/mkl/lib/intel64 in my case, to my LD_LIBRARY_PATH. I also set the BLAS_VERSION environment variable to libmkl_rt.so.
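The two settings above can be sketched as the following shell lines. This is a minimal sketch, not the exact commands I ran: the MKL_LIB_DIR variable name is only for illustration, and the directory is the one from my machine, so adjust it to wherever your libmkl_rt.so lives.

```shell
# Directory holding Intel's libmkl_rt.so (path from this post; yours may differ).
# MKL_LIB_DIR is a hypothetical helper variable used only in this sketch.
MKL_LIB_DIR="/opt/intel/mkl/lib/intel64"

# Prepend the MKL directory to LD_LIBRARY_PATH.
# The ${VAR:+...} form avoids a trailing ':' when LD_LIBRARY_PATH is unset or empty.
export LD_LIBRARY_PATH="${MKL_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Ask MATLAB to load Intel's runtime library as its BLAS.
export BLAS_VERSION="libmkl_rt.so"
```

These need to be set in the shell from which MATLAB is launched (or in a startup file such as ~/.bashrc), so that the MATLAB process inherits them.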

After this, I checked that everything worked by going to Parallel -> Manage cluster profiles -> Local -> Validation profiles -> Validate. All the parallel test tasks were validated successfully, and my script also ran without the error.

Tried with: MATLAB R2014a and Ubuntu 14.04


© 2023 Ashwin Nanjappa • All writing under CC BY-SA license