When running vLLM with the command below, I got the error “ValueError: Model architectures [‘Qwen2ForCausalLM’] failed to be inspected”:
vllm serve unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit --enable-reasoning --reasoning-parser deepseek_r1 --quantization bitsandbytes --load-format bitsandbytes --enable-chunked-prefill --max_model_len 6704
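The solution is to set the environment variable VLLM_USE_MODELSCOPE=True, which makes vLLM fetch the model from ModelScope instead of the Hugging Face Hub.
For example: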
VLLM_USE_MODELSCOPE=True vllm serve unsloth/DeepSeek-R1-Distill-Qwen-32B-bnb-4bit --enable-reasoning --reasoning-parser deepseek_r1 --quantization bitsandbytes --load-format bitsandbytes --enable-chunked-prefill --max_model_len 6704
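If you launch vLLM several times from the same terminal, a variant that may be more convenient is to export the variable once for the whole shell session instead of prefixing every command. This is a minimal sketch assuming a POSIX-compatible shell such as bash; it also assumes the `modelscope` Python package is installed, since vLLM presumably needs it to download from ModelScope.

```shell
#!/usr/bin/env bash
# Export the variable once so that every later command started from this
# shell session (including vllm serve) inherits it.
# Assumption: a POSIX-compatible shell such as bash.
export VLLM_USE_MODELSCOPE=True

# Print the value as a child process sees it, to confirm the export worked.
printenv VLLM_USE_MODELSCOPE
```

After this, the same `vllm serve ...` command can be run without the `VLLM_USE_MODELSCOPE=True` prefix for the rest of the session.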