Categories
Ubuntu

Fix VLLM RuntimeError: NCCL error: unhandled system error

Running VLLM with --tensor-parallel-size greater than 1 triggered this error:

RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_0 pid=236) Process EngineCore_0:

This error is not about NCCL_P2P_DISABLE=1. The message is vague, but the cause is that when tensor parallelism spans multiple GPUs, the worker processes need shared memory to communicate with each other.

So, the solution is to add --shm-size 10g to the docker run command. To investigate, remove all the environment variables passed to Docker first; be careful, because an environment variable that causes a VLLM error may cancel out the effect of another.

Here is an example that works:

docker run --rm -it \
  --gpus all \
  --network host -p 8000:8000 -p 8080:8080 --shm-size 10g \
  -e NCCL_P2P_DISABLE=1 \
  -v /model/llama-3-2-1b:/model \
  nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.1 \
  python3 -m vllm.entrypoints.openai.api_server \
    --model /model \
    --tensor-parallel-size 2 \
    --served-model-name model \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 8192
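Docker's default /dev/shm is only 64 MB, which is far too small for NCCL's inter-GPU buffers. A quick way to confirm the container actually received the larger shared-memory segment is to check /dev/shm inside it:

```shell
# Inside the container (e.g. via docker exec), the Size column
# should report roughly 10G instead of the 64M default:
df -h /dev/shm
```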

Run GPT OSS 20B on VLLM with RTX 4090

Here is a quick way to run OpenAI GPT OSS 20B on an RTX 4090 GPU:

docker run --name vllm --gpus all -v /YOUR_PATH_TO_MODEL/models--gpt-oss-20b:/model -e VLLM_ATTENTION_BACKEND='TRITON_ATTN_VLLM_V1' \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:gptoss \
    --model /model --served-model-name model
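Once the container is up, the OpenAI-compatible endpoint can be smoke-tested with curl (this sketch assumes the default port 8000 and the served model name `model` from the command above):

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "model", "messages": [{"role": "user", "content": "Hello"}]}'
```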

You can download the model with

hf download openai/gpt-oss-20b --local-dir ./

Set fan speed Nvidia GPU Ubuntu Server Headless

Here are the quick commands to adjust NVIDIA GPU fan speed on a headless Ubuntu server.

Run this to enable manual fan control (Coolbits), then reboot:

sudo nvidia-xconfig --allow-empty-initial-configuration --enable-all-gpus --cool-bits=7

Start an X display, then execute your nvidia-settings fan speed commands:

X :1 &
export DISPLAY=:1

Or, the simple way is to save the script below as fan.sh in your home directory, then set permission with chmod a+x ~/fan.sh.

The usage is `~/fan.sh 50 50`, which will set the fan speed to 50% on both of the 2x RTX 4090 cards.

❯ cat fan.sh       
#!/bin/bash

# Check if both arguments are provided
if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 <fan_speed_gpu0> <fan_speed_gpu1>"
    echo "Please provide fan speed percentages (0-100)."
    exit 1
fi

# Validate input (must be a number between 0 and 100)
if ! [[ "$1" =~ ^[0-9]+$ ]] || [ "$1" -lt 0 ] || [ "$1" -gt 100 ]; then
    echo "Error: Fan speed for GPU0 must be an integer between 0 and 100."
    exit 1
fi

if ! [[ "$2" =~ ^[0-9]+$ ]] || [ "$2" -lt 0 ] || [ "$2" -gt 100 ]; then
    echo "Error: Fan speed for GPU1 must be an integer between 0 and 100."
    exit 1
fi

FAN_SPEED=$1
FAN_SPEED_TWO=$2

# Ensure X server is running
if ! pgrep -x "Xorg" > /dev/null && ! pgrep -x "X" > /dev/null; then
    echo "X server not running, starting a new one..."
    export XDG_SESSION_TYPE=x11
    export DISPLAY=:0
    startx -- $DISPLAY &
    sleep 5
else
    echo "X server is already running."
    export DISPLAY=:0
fi

# Set fan control state and speed for GPU 0
echo "Setting fan speed to $FAN_SPEED% for GPU 0..."
nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=$FAN_SPEED"
nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=$FAN_SPEED"

# Set fan control state and speed for GPU 1
echo "Setting fan speed to $FAN_SPEED_TWO% for GPU 1..."
nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=$FAN_SPEED_TWO"
nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=$FAN_SPEED_TWO"

echo "Fan speed set to $FAN_SPEED% (GPU 0) and $FAN_SPEED_TWO% (GPU 1)."
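The 0-100 validation above can be factored into a small helper and sanity-checked without touching the GPU; this is just a sketch of the same check fan.sh performs:

```shell
#!/bin/bash

# Returns 0 when the argument is an integer between 0 and 100,
# mirroring the validation blocks in fan.sh
is_valid_speed() {
    [[ "$1" =~ ^[0-9]+$ ]] && [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

is_valid_speed 50 && echo "50 is valid"
is_valid_speed 101 || echo "101 is rejected"
```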
Categories
Machine Learning

Fix VLLM LMDeploy /usr/bin/ld: cannot find -lcuda: No such file or directory

When running LMDeploy, I got this error:

2025-06-23 10:43:25,185 - lmdeploy - ERROR - base.py:53 - CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpsne1hded/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpsne1hded/__triton_launcher.cpython-38-x86_64-linux-gnu.so', '-lcuda', '-L/home/dev/miniforge3/envs/lmdeploy/lib/python3.8/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-I/home/dev/miniforge3/envs/lmdeploy/lib/python3.8/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpsne1hded', '-I/home/dev/miniforge3/envs/lmdeploy/include/python3.8']' returned non-zero exit status 1.
2025-06-23 10:43:25,185 - lmdeploy - ERROR - base.py:54 - <Triton> check failed!
Please ensure that your device is functioning properly with <Triton>.
You can verify your environment by running `python -m lmdeploy.pytorch.check_env.triton_custom_add`.

Running python -m lmdeploy.pytorch.check_env.triton_custom_add shows the error:

❯ python -m lmdeploy.pytorch.check_env.triton_custom_add
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
Traceback (most recent call last):

To solve it, create a symbolic link to the CUDA stub library (adjust the path for your CUDA version):

sudo ln -s /usr/local/cuda-12.2/targets/x86_64-linux/lib/stubs/libcuda.so /usr/lib64/libcuda.so
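For background: -lcuda makes the linker search for a file named libcuda.so in its default directories, which is why symlinking the stub into /usr/lib64 works. You can list those default search directories with binutils' ld:

```shell
# Print ld's built-in library search path (one directory per line)
ld --verbose | grep SEARCH_DIR | tr -s ' ;' '\n'
```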

Then the check passes:

❯ python -m lmdeploy.pytorch.check_env.triton_custom_add                                         
Done.
Categories
Devops

Fix Google COS GPU Docker unable create new device

If you got this error, congratulations, you have the solution here. It is quite a complicated problem, and it looks like this:

nvidia-container-cli: mount error: failed to add device rules: unable to generate new device filter program from existing programs: unable to create new device filters program: load program: invalid argument: last insn is not an exit or jmp processed 0 insns (limit 1000000)

It turns out the solution is just to run this, either in your metadata startup script or inside the Google Container-Optimized OS VM:

sysctl -w net.core.bpf_jit_harden=1 

If you want to make it persistent:

bash -c "echo net.core.bpf_jit_harden=1 > /etc/sysctl.d/91-nvidia-docker.conf"
sysctl --system
systemctl restart docker
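You can read the value back afterwards, without root, to confirm it took effect:

```shell
# Should print 1 once the setting has been applied
cat /proc/sys/net/core/bpf_jit_harden
```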

Disable ZSH autocomplete expansion

To disable ZSH's annoying expansion when there is no complete path, put this in your .zshrc:

zstyle ':completion:*' completer _complete _complete:-fuzzy _correct _approximate _ignored _expand

Solve Svelte [dev:svelte] [error] No parser could be inferred for file

When running pnpm run dev on a Svelte + Vite + shadcn project, I received this error:

[dev:svelte] [error] No parser could be inferred for file
[dev:svelte] [error] No parser could be inferred for file
[dev:svelte] [error] No parser could be inferred for file

To solve this, create a .prettierrc file with a parser configuration for .svelte files.
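A minimal .prettierrc that typically resolves this for .svelte files is shown below; it assumes prettier-plugin-svelte is installed as a dev dependency:

```json
{
  "plugins": ["prettier-plugin-svelte"],
  "overrides": [
    {
      "files": "*.svelte",
      "options": { "parser": "svelte" }
    }
  ]
}
```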


Fix [WARNING] Cannot find base config file “./.svelte-kit/tsconfig.json” [tsconfig.json]

When running shadcn + Svelte + Vite, I got this error:

▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]

To solve this, edit package.json and add "prepare": "svelte-kit sync" to the scripts; the prepare script runs svelte-kit sync, which generates .svelte-kit/tsconfig.json. For example:

"scripts": {
		"dev": "vite dev",
		"build": "vite build",
		"build:registry": "tsx scripts/build-registry.ts",
		"br": "pnpm build:registry",
		"preview": "vite preview",
		"test": "playwright test",
		"prepare": "svelte-kit sync",
		"sync": "svelte-kit sync",
		"check": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json",
		"check:watch": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json --watch",
		"test:unit": "vitest"
	},

Upgrade and Install NVIDIA Driver 565 Ubuntu 24.04

Here are the quick steps to upgrade to the latest driver (which is needed for running Docker NVIDIA NeMo).

1. Uninstall existing NVIDIA libraries

sudo apt purge "nvidia*" "libnvidia*"

2. Install the latest NVIDIA Driver

Add PPA and check the driver version as you wish to install

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update 
sudo ubuntu-drivers list

Then to install

sudo apt install nvidia-driver-565

If you get the error Failed to initialize NVML: Driver/library version mismatch, the solution is to reboot.

If you are using the NVIDIA Container Toolkit, reinstall and reconfigure it as well:

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Remote Desktop Ubuntu 24.04

There is a quick way to do remote desktop for Ubuntu 24.04: enable its Desktop Sharing and connect using Remmina from the client. Here are the steps:

1. Enable Desktop Sharing on the remote device/laptop/PC

Go to “System” -> “Desktop Sharing” and toggle on both Desktop Sharing and Remote Control. Under Login Details, fill in the RDP username and password.

2. Connect via Client

Open Remmina and click “+”. Choose RDP and enter the remote OS user and password (not the RDP credentials yet). Once you are connected, fill in the RDP Login Details. Yes, we have two username/password pairs here, and you can set them to the same values.