Install NumPy with Intel oneAPI MKL for AMD on Ubuntu

NumPy delegates vector, matrix, and linear algebra operations to a BLAS/LAPACK implementation such as OpenBLAS, BLIS, or Intel MKL. Intel's MKL (Math Kernel Library) is widely regarded as one of the most mature and heavily optimized of these, thanks to Intel's long investment in it.

If you want to use Intel oneAPI MKL as the backend for NumPy, especially on an Intel CPU (or on AMD if you want to try), here are the quick installation steps for Ubuntu (I use the latest release, Ubuntu 23.04 Lunar Lobster).

First, install the required packages:

sudo apt install build-essential python3-pip python3 python3-dev libomp-dev

For context, libomp-dev provides an OpenMP runtime and helps you avoid the error “libmkl_intel_thread.so.2: undefined symbol: omp_get_num_procs” when importing numpy in the Python interpreter.
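If you want to check that an OpenMP runtime exporting omp_get_num_procs is visible to the dynamic loader, here is a small sketch using Python's ctypes. It assumes the short name "omp" (LLVM's libomp, which libomp-dev provides); pass another name such as "gomp" as the first argument to probe a different runtime.

```python
import ctypes
import ctypes.util
import sys
from typing import Optional


def omp_num_procs(short_name: str = "omp") -> Optional[int]:
    """Return omp_get_num_procs() from the named OpenMP runtime, or None if unavailable."""
    # On Linux, find_library() resolves e.g. "omp" to "libomp.so.5" via the ldconfig cache
    path = ctypes.util.find_library(short_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # This is the symbol MKL's threading layer could not resolve in the error above
    if not hasattr(lib, "omp_get_num_procs"):
        return None
    lib.omp_get_num_procs.restype = ctypes.c_int
    return lib.omp_get_num_procs()


if __name__ == "__main__":
    # Probe one runtime per run; loading several OpenMP runtimes in one process can clash
    name = sys.argv[1] if len(sys.argv) > 1 else "omp"
    print(name, omp_num_procs(name))
```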

Second, install the Intel oneAPI Base Toolkit, which includes MKL: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?operatingsystem=linux&distributions=aptpackagemanager.

Or follow the commands below

# download the key to system keyring
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

# add signed entry to apt sources and configure the APT client to use Intel repository:
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

# Execute apt update
sudo apt update

# Install the Intel oneAPI Base Toolkit (contains MKL)
sudo apt install intel-basekit
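The MKL paths in the following steps include a version directory (2023.1.0 in my case) that depends on what apt installed. A minimal sketch to list the candidate library directories, assuming the default /opt/intel/oneapi/mkl prefix (a `latest` symlink, if your install has one, is listed too):

```python
from pathlib import Path
from typing import List


def find_mkl_lib_dirs(mkl_root: str = "/opt/intel/oneapi/mkl") -> List[Path]:
    """List <mkl_root>/<version>/lib/intel64 directories that contain libmkl_rt."""
    root = Path(mkl_root)
    if not root.is_dir():
        return []
    found = []
    for version_dir in sorted(root.iterdir()):
        lib_dir = version_dir / "lib" / "intel64"
        # Keep only directories that actually ship MKL's single dynamic runtime library
        if lib_dir.is_dir() and any(lib_dir.glob("libmkl_rt.so*")):
            found.append(lib_dir)
    return found


if __name__ == "__main__":
    for d in find_mkl_lib_dirs():
        print(d)
```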

Next, register the Intel libraries with ldconfig

This avoids the error “ImportError: libmkl_rt.so.2: cannot open shared object file: No such file or directory”. Create a new file called “mylibs.conf” in /etc/ld.so.conf.d containing the path of the Intel MKL libraries.

Open it with root privileges, e.g. sudo vim /etc/ld.so.conf.d/mylibs.conf, and put this line inside (adjust 2023.1.0 to the MKL version you installed):

/opt/intel/oneapi/mkl/2023.1.0/lib/intel64

Then reload the linker cache with

sudo ldconfig
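After sudo ldconfig, you can check from Python that the loader now resolves libmkl_rt.so.2, the library named in the ImportError above. A minimal sketch with ctypes: on Linux, ctypes.util.find_library consults the ldconfig cache, and ctypes.CDLL calls dlopen(), which searches essentially the same places the NumPy extension relies on.

```python
import ctypes
import ctypes.util
from typing import Optional


def mkl_rt_path() -> Optional[str]:
    """Return the file name ldconfig resolves for mkl_rt, or None if it is not cached."""
    return ctypes.util.find_library("mkl_rt")


def can_load(soname: str = "libmkl_rt.so.2") -> bool:
    """Try to dlopen() the library by its soname; False means the loader cannot find it."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("ldconfig entry:", mkl_rt_path())
    print("libmkl_rt.so.2 loadable:", can_load())
```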

Third, build and install NumPy with MKL as its BLAS/LAPACK backend.

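You can install it either in a virtual environment or directly into Ubuntu's system Python. For the latter, pip needs an extra argument: Ubuntu 23.04 marks the system Python as externally managed (PEP 668), so pip refuses to install into it unless you pass --break-system-packages.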
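To tell which case you are in, this sketch reports whether the current interpreter runs inside a virtual environment and whether it carries the PEP 668 EXTERNALLY-MANAGED marker (PEP 668 places the marker in the stdlib directory and says it is ignored inside virtual environments):

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when running inside a venv (its prefix differs from the base installation)."""
    return sys.prefix != sys.base_prefix


def externally_managed() -> bool:
    """True if pip will refuse to install here without --break-system-packages."""
    # PEP 668: the marker has no effect inside a virtual environment
    if in_virtualenv():
        return False
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.is_file()


if __name__ == "__main__":
    print("virtualenv:", in_virtualenv())
    print("externally managed:", externally_managed())
```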

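The NumPy build needs a configuration that points to the installed Intel Base Toolkit. To provide it, create a file called “.numpy-site.cfg” in your home directory. (This file is read by the numpy.distutils-based build; newer NumPy releases build with Meson and may ignore it.)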

Put the following in ~/.numpy-site.cfg:

[mkl]
library_dirs = /opt/intel/oneapi/mkl/2023.1.0/lib/intel64
include_dirs = /opt/intel/oneapi/mkl/2023.1.0/include
libraries = mkl_rt
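If you would rather generate the file than type it, here is a sketch using the standard library's configparser, which writes the same key = value layout. The MKL version directory is a parameter, and gen_numpy_cfg.py below is only a hypothetical name for the script:

```python
import configparser
import io


def render_numpy_site_cfg(mkl_version_dir: str) -> str:
    """Return the text of an [mkl] section pointing NumPy's build at libmkl_rt."""
    cfg = configparser.ConfigParser()
    cfg["mkl"] = {
        "library_dirs": f"{mkl_version_dir}/lib/intel64",
        "include_dirs": f"{mkl_version_dir}/include",
        "libraries": "mkl_rt",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Prints to stdout; redirect it yourself so nothing is overwritten by accident
    print(render_numpy_site_cfg("/opt/intel/oneapi/mkl/2023.1.0"), end="")
```

Usage: python3 gen_numpy_cfg.py > ~/.numpy-site.cfg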

Then install NumPy from source so the build picks up this configuration:

pip install numpy --no-binary :all:

Alternatively, you can use the command from NumPy's official documentation (it made no difference for me):

pip install --no-use-pep517 --global-option=build --global-option="--cpu-dispatch=max" numpy --no-binary :all:

Fourth, check whether the installation succeeded

Open a Python interpreter, import numpy, and call show_config() to see whether the MKL library is being used:

# open terminal and execute python interpreter
python3

# then, import numpy libraries
import numpy

# show the configuration
numpy.show_config()
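show_config() reports how NumPy was built. To confirm that MKL is actually loaded at run time, a Linux-only sketch can list the shared objects mapped into the interpreter after import numpy by reading /proc/self/maps. This is a swapped-in technique of my own, not something show_config() does:

```python
import os
from typing import List


def mapped_libs(substring: str, maps_file: str = "/proc/self/maps") -> List[str]:
    """Return distinct file paths mapped into the process whose path contains `substring`."""
    paths = set()
    with open(maps_file) as fh:
        for line in fh:
            # Columns: address perms offset dev inode [pathname]
            parts = line.split()
            if len(parts) >= 6 and substring in parts[5]:
                paths.add(parts[5])
    return sorted(paths)


def main() -> None:
    if not os.path.exists("/proc/self/maps"):
        print("this check relies on Linux's /proc filesystem")
        return
    try:
        import numpy  # noqa: F401  (importing NumPy loads its BLAS/LAPACK backend)
    except ImportError:
        print("NumPy is not installed in this interpreter")
        return
    print(mapped_libs("libmkl") or "no MKL library is mapped into this process")


if __name__ == "__main__":
    main()
```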

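Bonus: If you have an AMD CPU and want to improve MKL performance

MKL checks the CPU vendor internally and can pick slower code paths on non-Intel processors. The trick below overrides MKL's internal function mkl_serv_intel_cpu_true() so that it always reports an Intel CPU.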

A. Create a file called fakeintel.c in your home directory (~/) with the following code and save it:

/* Override MKL's internal CPU check so it always reports an Intel CPU */
int mkl_serv_intel_cpu_true() {
  return 1;
}

B. Compile it into a shared library in a terminal:

gcc -shared -fPIC -o libfakeintel.so fakeintel.c
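Before preloading it, you can confirm that the compiled library exports the override and returns 1. A minimal ctypes sketch; the default path in the main block assumes you compiled in your home directory as in step A:

```python
import ctypes
import os
from typing import Optional


def fake_intel_value(path: str) -> Optional[int]:
    """Call mkl_serv_intel_cpu_true() from the compiled library; None if unusable."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return None
    try:
        lib = ctypes.CDLL(path)
        fn = lib.mkl_serv_intel_cpu_true  # AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return None
    fn.restype = ctypes.c_int
    return fn()


if __name__ == "__main__":
    # Returns 1 when the library was built from fakeintel.c as in step B
    print(fake_intel_value("~/libfakeintel.so"))
```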

C. Preload the library by exporting LD_PRELOAD, replacing USERNAME with your own username (this applies only to the current shell session and the programs started from it):

export LD_PRELOAD=/home/USERNAME/libfakeintel.so
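Because the dynamic loader reads LD_PRELOAD only when a process starts, setting it from inside an already running Python interpreter has no effect on that interpreter. To compare both configurations without editing your shell, here is a sketch that launches a script twice, once without and once with the preload; compare_preload.py is a hypothetical name for it, and the script you pass could be the numpy-benchmark.py file used below.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def preload_env(lib_path: Optional[str]) -> Dict[str, str]:
    """Copy the current environment, removing LD_PRELOAD or setting it to lib_path."""
    env = dict(os.environ)
    env.pop("LD_PRELOAD", None)
    if lib_path is not None:
        env["LD_PRELOAD"] = os.path.expanduser(lib_path)
    return env


def run_both(script: str, lib_path: str = "~/libfakeintel.so") -> None:
    """Run `script` with this interpreter, first without and then with the preload."""
    for label, lib in (("without LD_PRELOAD", None), ("with LD_PRELOAD", lib_path)):
        print(f"# {label}", flush=True)
        subprocess.run([sys.executable, script], env=preload_env(lib), check=True)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        run_both(sys.argv[1])
    else:
        print("usage: python3 compare_preload.py SCRIPT.py")
```

Usage: python3 compare_preload.py numpy-benchmark.py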

Running Benchmark!

Download the NumPy benchmark script here: https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276

Then run it:

python3 numpy-benchmark.py

# AMD without LD_PRELOAD + Intel oneAPI 2023

Dotted two 4096x4096 matrices in 0.25 s.
Dotted two vectors of length 524288 in 0.01 ms.
SVD of a 2048x1024 matrix in 0.28 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 3.03 s.


# AMD Threadripper (TR), 24 cores, with MKL + LD_PRELOAD tweak
Dotted two 4096x4096 matrices in 0.22 s. 
Dotted two vectors of length 524288 in 0.01 ms. 
SVD of a 2048x1024 matrix in 0.24 s. 
Cholesky decomposition of a 2048x2048 matrix in 0.09 s. 
Eigendecomposition of a 2048x2048 matrix in 2.29 s.
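To put the two runs side by side, a small sketch that computes the per-operation speedup from the timings above (values copied from the two result blocks; the labels are my own):

```python
from typing import Dict

# Timings in seconds copied from the two runs above. The vector dot product
# (0.01 ms in both runs) is too short to compare and is left out.
WITHOUT_PRELOAD: Dict[str, float] = {
    "Dot 4096x4096": 0.25,
    "SVD 2048x1024": 0.28,
    "Cholesky 2048x2048": 0.11,
    "Eigendecomposition 2048x2048": 3.03,
}
WITH_TWEAK: Dict[str, float] = {
    "Dot 4096x4096": 0.22,
    "SVD 2048x1024": 0.24,
    "Cholesky 2048x2048": 0.09,
    "Eigendecomposition 2048x2048": 2.29,
}


def speedups(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Ratio before/after per operation; values above 1.0 mean the tweak was faster."""
    return {op: before[op] / after[op] for op in before}


if __name__ == "__main__":
    for op, ratio in speedups(WITHOUT_PRELOAD, WITH_TWEAK).items():
        print(f"{op:30s} {ratio:.2f}x")
```

With these numbers the ratios come out at about 1.14x for the matrix product, 1.17x for SVD, 1.22x for Cholesky, and 1.32x for the eigendecomposition.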
