Punica is a very interesting project that shows how to run multiple LoRA models on a single GPU. A few things need to be done to make this project work locally and to avoid issues like:
- _kernels.rms_norm(o, x, w, eps) RuntimeError: output must be a CUDA tensor
- /torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
- raise RuntimeError(message) from e
- RuntimeError: Error compiling objects for extension
- error: subprocess-exited-with-error
- the rich module not being installed, and so on
Here are the steps:
1. Change the NVCC version; I downgraded mine to CUDA 12.1. You can check your current version with nvcc --version.
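If you have several CUDA toolkits installed side by side, a minimal way to point your shell at 12.1 (assuming the default install location /usr/local/cuda-12.1):
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
nvcc --version  # should now report release 12.1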
2. Install GCC and G++ (version 10)
MAX_GCC_VERSION=10
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
# Register version 10 as the default gcc AND g++ so nvcc picks both up
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$MAX_GCC_VERSION $MAX_GCC_VERSION
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-$MAX_GCC_VERSION $MAX_GCC_VERSION
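A quick check that both compilers now resolve to version 10:
gcc --version | head -n1  # should report gcc 10.x
g++ --version | head -n1  # should report g++ 10.x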
3. Install the right torch version based on your CUDA version
pip install torch==2.5.1+cu121 --index-url https://download.pytorch.org/whl/cu121
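A quick sanity check that the installed wheel matches your toolkit; this should print 2.5.1+cu121, 12.1, and True:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"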
4. Build from source!
pip install ninja numpy torch
# Clone punica
git clone https://github.com/punica-ai/punica.git
cd punica
git submodule sync
git submodule update --init
# If you encounter problems during compilation, set TORCH_CUDA_ARCH_LIST to your GPU architecture.
# I'm using an RTX 4090 (Ada), so the compute capability is 8.9. Check yours, e.g.:
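# One way to check: ask PyTorch for your GPU's compute capability
python -c "import torch; print(torch.cuda.get_device_capability())"  # an RTX 4090 prints (8, 9)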
export TORCH_CUDA_ARCH_LIST="8.9"
# Build and install punica
pip install -v --no-build-isolation .
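Once the build finishes, a minimal smoke test is simply importing the package (note: some of Punica's CUDA kernels may only be loaded on first use, so this mainly confirms the install itself):
python -c "import punica; print('punica imported OK')"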
Why does building from source work? Because Punica relies on SGMV (Segmented Gather Matrix-Vector multiplication), a new custom CUDA kernel design that has to be compiled for your specific GPU architecture and CUDA toolkit.