Good news: P2P can be enabled on two or more RTX 4090s. Out of the box, running the simpleP2P
sample gives the result below. Don't worry, the rest of this post shows how to enable it easily!
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
Checking GPU(s) for support of peer to peer memory access...
> Peer access from NVIDIA GeForce RTX 4090 (GPU0) -> NVIDIA GeForce RTX 4090 (GPU1) : No
> Peer access from NVIDIA GeForce RTX 4090 (GPU1) -> NVIDIA GeForce RTX 4090 (GPU0) : No
Two or more GPUs with Peer-to-Peer access capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.
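To compare the before and after state quickly, the "Peer access" lines can be counted with a small filter. This is only a sketch: `p2p_summary` is a hypothetical helper name (not part of the CUDA samples), and it assumes the sample prints its peer lines in the format shown above.

```shell
# p2p_summary: hypothetical helper, reads simpleP2P or deviceQuery output on stdin
# and counts how many "Peer access" lines end in Yes and how many end in No.
p2p_summary() (
    out=$(cat)
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    yes=$(printf '%s\n' "$out" | grep -c 'Peer access from .* : Yes' || true)
    no=$(printf '%s\n' "$out" | grep -c 'Peer access from .* : No' || true)
    echo "p2p_yes=${yes} p2p_no=${no}"
)
```

It would be used as `./simpleP2P | p2p_summary`. A non-zero `p2p_no` means at least one GPU pair still has no peer access.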
First, make sure Resizable BAR is enabled and IOMMU is disabled in the BIOS (I'm using an ASUS WRX80SAGE).
Next, uninstall all NVIDIA drivers (yes, that's right!).
# Uninstall all nvidia
sudo apt-get --purge remove "*nvidia*"
sudo apt-get --purge remove "*cuda*" "*cudnn*" "*cublas*" "*cufft*" "*cufile*" "*curand*" "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*" "*libnccl*"
# check that IOMMU is disabled (this directory should be empty)
ll /sys/class/iommu/
# install dependencies
sudo apt install git cmake
# reboot
sudo reboot
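Once the machine is back up, the IOMMU check from the listing above can be scripted instead of read by eye. This is a minimal sketch: `iommu_status` is a hypothetical helper, and it assumes that an empty or missing `/sys/class/iommu` means no IOMMU is active.

```shell
# iommu_status: hypothetical helper. The optional directory argument exists
# only so the check can be pointed at a test directory.
iommu_status() (
    dir=${1:-/sys/class/iommu}
    # Assumption: an active IOMMU registers at least one entry in this directory.
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
)

echo "IOMMU: $(iommu_status)"
```

If this prints `enabled`, go back to the BIOS and check the IOMMU setting before continuing.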
After the reboot, the system falls back to the Nouveau driver. That's okay. Here are the steps to disable it (or you can google it):
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nouveau.conf"
sudo update-initramfs -u
sudo reboot
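After this reboot it is worth confirming that Nouveau is really gone before installing the NVIDIA driver. The sketch below reads `/proc/modules`; `module_loaded` is a hypothetical helper, and its second argument is only there so it can be run against a sample file.

```shell
# module_loaded: hypothetical helper, succeeds if the named kernel module is loaded.
module_loaded() (
    name=$1
    list=${2:-/proc/modules}
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^${name} " "$list" 2>/dev/null
)

if module_loaded nouveau; then
    echo "nouveau is still loaded: recheck the blacklist file and initramfs"
else
    echo "nouveau is not loaded"
fi
```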
Next, download the NVIDIA Driver and install it
wget -c https://us.download.nvidia.com/XFree86/Linux-x86_64/565.57.01/NVIDIA-Linux-x86_64-565.57.01.run
# When asked for the kernel module type, choose MIT/GPL, not proprietary
sudo sh ./NVIDIA-Linux-x86_64-565.57.01.run --no-kernel-modules
Then install tiny P2P
git clone git@github.com:tinygrad/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
sudo ./install.sh
sudo depmod
sudo reboot
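After the reboot, one way to confirm that the open kernel module (rather than a leftover proprietary one) is installed is to look at its license field. This is a sketch under an assumption: it expects the open modules to report a license containing "MIT" (for example "Dual MIT/GPL") in `modinfo`. `is_open_module` is a hypothetical helper.

```shell
# is_open_module: hypothetical helper, succeeds if the license string looks
# like the MIT/GPL license of the open kernel modules (assumption, see above).
is_open_module() {
    case "$1" in
        *MIT*) return 0 ;;
        *) return 1 ;;
    esac
}

# modinfo -F prints a single field of the installed module's metadata.
if command -v modinfo >/dev/null 2>&1 && license=$(modinfo -F license nvidia 2>/dev/null); then
    if is_open_module "$license"; then
        echo "open nvidia module installed (license: $license)"
    else
        echo "nvidia module is not the open one (license: $license)"
    fi
else
    echo "nvidia kernel module not found"
fi
```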
Next, install the CUDA samples. You need NVCC installed to build and test them. I'm using CUDA 12.6 but downloaded the 12.5 samples release, since the latest release did not work for me. (My installed GCC versions are 12 and 10.)
sudo apt install gcc-10 g++-10 -y
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10 --slave /usr/bin/g++ g++ /usr/bin/g++-10
# to switch to another installed version
sudo update-alternatives --config gcc
Download the zip for the release that matches your NVCC version from https://github.com/NVIDIA/cuda-samples/releases and unpack it:
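To see which release NVCC reports, the version line can be extracted with a short filter. This is a sketch: `nvcc_release` is a hypothetical helper, and it assumes the usual `release X.Y,` wording in the `nvcc --version` output.

```shell
# nvcc_release: hypothetical helper, reads "nvcc --version" output on stdin
# and prints only the major.minor release number.
nvcc_release() {
    # The relevant line looks like: "Cuda compilation tools, release 12.6, V12.6.77"
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Only query nvcc when it is actually on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc release: $(nvcc --version | nvcc_release)"
fi
```

The printed number is the one to look for on the releases page. An older samples release may still be the one that builds, as was the case with 12.5 here.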
cd cuda-samples
# release 12.5 builds with the top-level Makefile, so no build directory is needed
make
./bin/x86_64/linux/release/deviceQuery
Voilà! deviceQuery now reports P2P as enabled:
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 66 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from NVIDIA GeForce RTX 4090 (GPU0) -> NVIDIA GeForce RTX 4090 (GPU1) : Yes
> Peer access from NVIDIA GeForce RTX 4090 (GPU1) -> NVIDIA GeForce RTX 4090 (GPU0) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.7, CUDA Runtime Version = 12.2, NumDevs = 2
Result = PASS
