Debugging the TensorFlow / Cuda error on AWS — ImportError: libcublas.so.9.0: cannot open shared object file
Cause of error
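This error is caused by a mismatch between the installed versions of tensorflow-gpu and CUDA. Each tensorflow-gpu release is built against one specific CUDA version and loads that version's shared libraries at import time. libcublas.so.9.0 is the cuBLAS library from CUDA 9.0, so the error means TensorFlow is looking for CUDA 9.0 and cannot find it.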
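To see whether the missing library is anywhere on the loader's search path, we can walk the directories ourselves. This is a minimal sketch that assumes CUDA is exposed through LD_LIBRARY_PATH, which may not be true on every machine (the loader also consults the ldconfig cache). find_lib is a hypothetical helper name, not a standard tool.

```shell
# find_lib NAME [PATHLIST]
# Print every directory in a colon-separated PATHLIST (default: $LD_LIBRARY_PATH)
# that contains a file called NAME. Returns non-zero when no directory has it.
find_lib() {
    name=$1
    pathlist=${2-${LD_LIBRARY_PATH-}}
    found=1
    old_ifs=$IFS
    IFS=:
    for dir in $pathlist; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    IFS=$old_ifs
    return $found
}
```

Calling find_lib libcublas.so.9.0 prints the directories that contain the library. If it prints nothing, that matches the ImportError above.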
Check our versions
Check the tensorflow-gpu version:
pip list | grep tensorflow-gpu
Our tensorflow-gpu version is 1.8.0.
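If we want just the version number, for example to use it in a script, pip show prints a Version: line that is easy to parse. A small sketch follows; version_from_pip_show is a hypothetical helper name, and it reads the pip show output from standard input.

```shell
# version_from_pip_show
# Read `pip show <package>` output on stdin and print only the Version field.
version_from_pip_show() {
    awk '/^Version:/ { print $2; exit }'
}
```

Usage: pip show tensorflow-gpu | version_from_pip_show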
Check CUDA version:
ls -l /usr/local/cuda
/usr/local/cuda is a symlink, so ls -l shows which CUDA install it points to. Ours points to cuda-8.0.
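The same check can be scripted with readlink instead of reading the ls -l output by eye. This is a sketch that assumes the install directories are named cuda-<version>, as they are on our box. cuda_version_from_link is a hypothetical helper name.

```shell
# cuda_version_from_link [LINK]
# Print the version that the CUDA symlink (default: /usr/local/cuda) points to.
# Assumes the link target is a directory named cuda-<version>.
cuda_version_from_link() {
    link=${1:-/usr/local/cuda}
    # readlink fails if LINK is not a symlink; pass that failure on.
    target=$(readlink "$link") || return 1
    # Drop everything up to and including the last "cuda-", then any trailing slash.
    version=${target##*cuda-}
    version=${version%/}
    printf '%s\n' "$version"
}
```

If /usr/local/cuda is a real directory rather than a symlink, the function returns a non-zero status and prints nothing.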
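Which CUDA versions are compatible with TensorFlow?
Let’s refer to the tested build configurations on the official TensorFlow install page for version compatibility.
The required dependencies for our tensorflow-gpu version, 1.8.0, are CUDA 9 and cuDNN 7. Our system has cuda-8.0, which is why the import fails.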
So we have 2 options:
- Upgrade the system CUDA version to cuda-9.0 to match our tensorflow-gpu version 1.8.0.
- Downgrade tensorflow-gpu to version 1.4.0 to match our system CUDA version cuda-8.0.
Also check that the tensorflow-gpu version you choose supports your Python version.
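The two pairings used in this post can be written down as a small lookup, which makes the mismatch check explicit. This sketch covers only those two pairs, so look up other releases on the TensorFlow page. required_cuda is a hypothetical helper name.

```shell
# required_cuda TF_VERSION
# Print the CUDA version that a tensorflow-gpu release expects.
# Only the two pairings discussed in this post are listed.
required_cuda() {
    case $1 in
        1.4.*) echo 8.0 ;;
        1.8.*) echo 9.0 ;;
        *)
            echo "no known CUDA pairing for tensorflow-gpu $1" >&2
            return 1
            ;;
    esac
}
```

For our setup, required_cuda 1.8.0 prints 9.0 while the system has 8.0, so one side has to change.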
Option 1. Upgrading/Downgrading system Cuda
AWS AMIs often come with multiple CUDA versions preinstalled, so the version you need may already be on the box. In that case we only need to update the symlink at /usr/local/cuda to point to the expected version. If not, you will have to install the new CUDA version first.
# Look at the current cuda version
ls -l /usr/local/cuda
# Look at the required cuda version
ls /usr/local/cuda-9.0
# Remove softlink to current cuda version
sudo rm /usr/local/cuda
# Add softlink to new version (needs sudo, like the rm above)
sudo ln -s /usr/local/cuda-9.0 /usr/local/cuda
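Removing the link first leaves the box without /usr/local/cuda if the second command fails, for instance because cuda-9.0 is not installed after all. A slightly more defensive sketch checks the target first and then replaces the link in one step with ln -sfn. switch_cuda and its optional root argument are hypothetical, added here only for illustration.

```shell
# switch_cuda VERSION [ROOT]
# Point ROOT/cuda (default root: /usr/local) at ROOT/cuda-VERSION.
# Refuses to touch the existing link if the target directory is missing.
switch_cuda() {
    version=$1
    root=${2:-/usr/local}
    target=$root/cuda-$version
    if [ ! -d "$target" ]; then
        echo "CUDA $version is not installed at $target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat the old link as a file instead of following it
    ln -sfn "$target" "$root/cuda"
}
```

On a real box, writing to /usr/local needs root, so run switch_cuda 9.0 from a root shell or put sudo in front of the ln line.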
Option 2. Upgrading/Downgrading tensorflow-gpu
pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.4.0
pip list | grep tensorflow
If you are working in a notebook, a kernel restart is sometimes required before the change takes effect. The most important thing is that the tensorflow-gpu and CUDA versions match the expected dependencies.
Originally published at https://confusedcoders.com on June 14, 2020.