Categories
Machine Learning

Fix vLLM/LMDeploy /usr/bin/ld: cannot find -lcuda: No such file or directory

When running LMDeploy, I got this error:

2025-06-23 10:43:25,185 - lmdeploy - ERROR - base.py:53 - CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpsne1hded/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpsne1hded/__triton_launcher.cpython-38-x86_64-linux-gnu.so', '-lcuda', '-L/home/dev/miniforge3/envs/lmdeploy/lib/python3.8/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-I/home/dev/miniforge3/envs/lmdeploy/lib/python3.8/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpsne1hded', '-I/home/dev/miniforge3/envs/lmdeploy/include/python3.8']' returned non-zero exit status 1.
2025-06-23 10:43:25,185 - lmdeploy - ERROR - base.py:54 - <Triton> check failed!
Please ensure that your device is functioning properly with <Triton>.
You can verify your environment by running `python -m lmdeploy.pytorch.check_env.triton_custom_add`.

When running python -m lmdeploy.pytorch.check_env.triton_custom_add, it shows the underlying error:

❯ python -m lmdeploy.pytorch.check_env.triton_custom_add
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
Traceback (most recent call last):

To solve it, create a symbolic link to the CUDA stub library (adjust the CUDA version in the path to match your installation):

sudo ln -s /usr/local/cuda-12.2/targets/x86_64-linux/lib/stubs/libcuda.so /usr/lib64/libcuda.so

Then run the check again:

❯ python -m lmdeploy.pytorch.check_env.triton_custom_add                                         
Done.
Categories
Devops

Fix Google COS GPU Docker unable create new device

If you got this error, congratulations: you have found the solution. It is a rather complicated problem, shown below.

nvidia-container-cli: mount error: failed to add device rules: unable to generate new device filter program from existing programs: unable to create new device filters program: load program: invalid argument: last insn is not an exit or jmp processed 0 insns (limit 1000000)

It turns out the solution is to run this, either in your metadata startup script or inside the Google Container-Optimized OS VM:

sysctl -w net.core.bpf_jit_harden=1 

If you want to make it persistent:

bash -c "echo net.core.bpf_jit_harden=1 > /etc/sysctl.d/91-nvidia-docker.conf"
sysctl --system
systemctl restart docker
Categories
Ubuntu

Disable ZSH autocomplete expansion

To disable ZSH's annoying expansion when there is no complete path, set the completer styles (e.g. in ~/.zshrc):

zstyle ':completion:*' completer _complete _complete:-fuzzy _correct _approximate _ignored _expand
Categories
Ubuntu

Solve Svelte [dev:svelte] [error] No parser could be inferred for file

When running pnpm run dev on a Svelte + Vite + shadcn project, I received this error:

[dev:svelte] [error] No parser could be inferred for file
[dev:svelte] [error] No parser could be inferred for file
[dev:svelte] [error] No parser could be inferred for file

To solve this, create a .prettierrc file and put the Svelte parser configuration in it.
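The original post omits the file contents; a typical .prettierrc for Svelte projects looks like this (an assumption, not from the original post; it requires prettier-plugin-svelte to be installed as a dev dependency, and tells Prettier which parser to use for .svelte files):

```json
{
  "plugins": ["prettier-plugin-svelte"],
  "overrides": [
    {
      "files": "*.svelte",
      "options": { "parser": "svelte" }
    }
  ]
}
```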

Categories
Ubuntu

Fix [WARNING] Cannot find base config file “./.svelte-kit/tsconfig.json” [tsconfig.json]

When running shadcn + Svelte + Vite, I got this error:

▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]

To solve this, edit package.json and add "prepare": "svelte-kit sync" to the scripts section. For example:

"scripts": {
		"dev": "vite dev",
		"build": "vite build",
		"build:registry": "tsx scripts/build-registry.ts",
		"br": "pnpm build:registry",
		"preview": "vite preview",
		"test": "playwright test",
		"prepare": "svelte-kit sync",
		"sync": "svelte-kit sync",
		"check": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json",
		"check:watch": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json --watch",
		"test:unit": "vitest"
	},
Categories
Ubuntu

Upgrade and Install NVIDIA Driver 565 Ubuntu 24.04

Here are the quick steps to upgrade to the latest driver (which is needed for running Docker NVIDIA NeMo).

  1. Uninstall existing NVIDIA libraries
sudo apt purge "nvidia*" "libnvidia*"

2. Install the latest NVIDIA Driver

Add the PPA and check which driver versions are available to install:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update 
sudo ubuntu-drivers list

Then to install

sudo apt install nvidia-driver-565

If you got the error Failed to initialize NVML: Driver/library version mismatch, the solution is to reboot.

If you are using the NVIDIA Container Toolkit, reinstall and reconfigure it:

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Categories
Ubuntu

Remote Desktop Ubuntu 24.04

There is a quick way to do remote desktop on Ubuntu 24.04: enable its built-in desktop sharing and connect using Remmina from the client. Here are the steps.

  1. Enable Desktop Sharing at remote device/laptop/pc

Go to “System” -> “Desktop Sharing” and toggle both Desktop Sharing and Remote Control. Under Login Details, fill in the RDP username and password.

2. Connect via Client

Open Remmina and click “+”. Choose RDP and enter the remote machine's OS username and password (not the RDP credentials yet). Once connected, fill in the RDP Login Details. Yes, there are two username/password pairs here, and you can set them to the same values.

Categories
Ubuntu

Solve multi-GPU not detected Docker-in-Docker Google Cloud

When trying to run nvidia-smi inside Docker with multiple GPUs, it gave errors. I was using the Docker API Python module to run the container. Checking the NVIDIA GPUs showed only a single device rather than multiple:

ls /proc/driver/nvidia/gpus

The solution is to ensure gpus=all (or gpus=2) is initialized properly. First, run the container manually using:

docker run --name caviar --detach --gpus all -it --privileged ghcr.io/ehfd/nvidia-dind:latest

This step shows that all the GPUs are loaded, so the culprit is the Docker API call; the fix is to pass the GPU request properly there.

Categories
Ubuntu

Fix docker compose project name must not be empty Docker-in-Docker

When running docker compose up for compose.yaml, I got this error:

docker compose up
WARN[0000] /docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
project name must not be empty

The quick solution (Compose derives the default project name from the directory name, which is empty at /) is:

  1. Rename it to docker-compose.yaml
  2. Move it into /home/USER, /home/ubuntu in this case.

Execute docker compose up from there.
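Alternatively, an option based on Docker Compose v2 behavior rather than the original post: pin the project name explicitly so it no longer depends on the directory, either with the top-level name: key in the compose file or with the -p flag ("myproject" is a hypothetical name; any non-empty string works):

```yaml
# compose.yaml
name: myproject

services:
  app:
    image: hello-world
```

Or, without editing the file: docker compose -p myproject up.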

Categories
LLM

Fix VLLM ValueError: Model architectures [‘Qwen2ForCausalLM’] failed to be inspected

When running vLLM, I got the error “ValueError: Model architectures [‘Qwen2ForCausalLM’] failed to be inspected”:

vllm serve unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit --enable-reasoning --reasoning-parser deepseek_r1 --quantization bitsandbytes --load-format bitsandbytes --enable-chunked-prefill --max_model_len 6704

The solution is to set VLLM_USE_MODELSCOPE=True.

For example

VLLM_USE_MODELSCOPE=True vllm serve unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit --enable-reasoning --reasoning-parser deepseek_r1 --quantization bitsandbytes --load-format bitsandbytes --enable-chunked-prefill --max_model_len 6704