A quick way to run model inference or fine-tuning on a specific NVIDIA GPU is to define this variable before executing the script.
For instance, I forced it to run on GPU 1 with:
export CUDA_VISIBLE_DEVICES=1
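The same restriction can also be applied from inside a Python script, as long as it happens before CUDA is initialized (in practice, before importing torch or tensorflow). This is only a minimal sketch; the helper name pin_gpu is made up for illustration.

```python
import os


def pin_gpu(index: int) -> str:
    """Restrict CUDA to one GPU by setting CUDA_VISIBLE_DEVICES.

    Call this before importing torch or tensorflow, because the
    variable is read when CUDA is initialized.
    (pin_gpu is a hypothetical helper name, not part of any library.)
    """
    value = str(index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Same effect as `export CUDA_VISIBLE_DEVICES=1` in the shell.
pin_gpu(1)
print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

Note that once only one GPU is visible, the process sees it as device 0, so code that selects "cuda:0" ends up on physical GPU 1.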
Passage retrieval methods refer to techniques and algorithms used to retrieve relevant passages or segments of text from a larger document or corpus. These methods are commonly employed in information retrieval systems and question-answering systems, where the goal is to locate specific information within a large amount of text.
A quick fix for uninstalling a CUDA version on Ubuntu that was not installed via software packages is to use its uninstaller. For instance, I had CUDA 11.8 and needed to downgrade to 11.6.
So I needed to find the path and run this command:
sudo /usr/local/cuda-11.8/bin/cuda-uninstaller
Finally, we can clean up the remaining cuda folder:
sudo rm -rf /usr/local/cuda
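If you are not sure which versioned CUDA directory holds the uninstaller, a small script can list the candidates. This is a sketch that assumes the /usr/local/cuda-<version>/bin/cuda-uninstaller layout used above; the function name find_cuda_uninstallers is hypothetical.

```python
from pathlib import Path


def find_cuda_uninstallers(root: str = "/usr/local") -> list[str]:
    """Return sorted paths matching <root>/cuda-*/bin/cuda-uninstaller.

    Assumes the directory layout shown in the command above; other
    install methods may place the toolkit elsewhere.
    """
    pattern = "cuda-*/bin/cuda-uninstaller"
    return sorted(str(p) for p in Path(root).glob(pattern))


# Print every uninstaller found under /usr/local (prints nothing if none).
for path in find_cuda_uninstallers():
    print(path)
```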
If you have an issue with the GCC version for the installed CUDA and need to downgrade or upgrade it, you can do the following:
MAX_GCC_VERSION=11
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++
When generating question-answer pairs from datasets using the LIQUID project (https://github.com/dmis-lab/LIQUID), I got this error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I believe this error occurs because my GPU is an RTX 4090, and the Ada Lovelace architecture is not supported by torch 1.12. To solve this, I upgraded torch to the next version, 1.13:
pip install torch==1.13.0
The warning shown was:
NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
And it works!
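To see the mismatch without waiting for a kernel error, you can compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. This is a rough sketch: supports_capability is a hypothetical helper, while torch.cuda.get_device_capability and torch.cuda.get_arch_list are existing PyTorch calls. The torch part is skipped if torch is not installed.

```python
def supports_capability(major: int, minor: int, arch_list: list[str]) -> bool:
    """True if sm_<major><minor> appears in a PyTorch-style arch list.

    (Hypothetical helper; it only checks for an exact entry.)
    """
    return f"sm_{major}{minor}" in arch_list


try:
    import torch

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(f"torch {torch.__version__}: GPU is sm_{major}{minor}, build targets {archs}")
        print("exact match listed:", supports_capability(major, minor, archs))
    else:
        print("torch is installed but no CUDA device is available")
except ImportError:
    print("torch is not installed in this environment")
```

A missing exact entry is a hint, not proof of failure: a build that lists another architecture of the same major version (for example sm_86 for an sm_89 card) can still run.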
When fine-tuning a model using LoRA and Hugging Face transformers, I received this error:
RuntimeError: unscale_() has already been called on this optimizer since the last update().
This error appears when using the latest transformers version, transformers-4.31.0.dev0. The solution is to revert to transformers-4.30.2 with:
pip install transformers==4.30.2
When installing a Python module like AutoGPTQ, you may get this error:
/usr/local/cuda/include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
| ^~~~~
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
[end of output]
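To solve this, install GCC at the maximum supported version and link it into the CUDA bin directory:
MAX_GCC_VERSION=11
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++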
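Before rebuilding, it can help to confirm which GCC version will be picked up. The sketch below checks the compiler linked at /usr/local/cuda/bin/gcc if it exists, otherwise the gcc on PATH, and compares its major version with the limit from the nvcc error. The limit of 11 comes from that error message; the helper names are made up for illustration.

```python
import os
import re
import shutil
import subprocess

MAX_GCC_MAJOR = 11  # limit quoted in the nvcc error message


def gcc_major(version_text: str) -> int:
    """Extract the major version from `gcc -dumpversion` output, e.g. '12' or '11.4.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized gcc version: {version_text!r}")
    return int(match.group(1))


def is_supported(version_text: str, max_major: int = MAX_GCC_MAJOR) -> bool:
    """True if the major version does not exceed the supported maximum."""
    return gcc_major(version_text) <= max_major


# Prefer the compiler linked into the CUDA bin directory, else the one on PATH.
linked = "/usr/local/cuda/bin/gcc"
gcc = linked if os.path.exists(linked) else shutil.which("gcc")
if gcc:
    out = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True).stdout.strip()
    if out:
        verdict = "supported" if is_supported(out) else "too new for this CUDA toolkit"
        print(f"{gcc}: gcc {out} is {verdict}")
```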
An RTX 4090 at full load during machine learning training produces a lot of heat; it can reach 80-85 degrees Celsius. Using big industrial fans to cool the GPU and opening the PC case can bring it down to 70 degrees Celsius.
However, before going down that path, you can raise your NVIDIA GPU fan speed from 30% to 90% or even 100%. Here are the steps to do it in Ubuntu.
First, you need to configure X11:
sudo vim /etc/X11/xorg.conf
Add Option "Coolbits" "4" to the NVIDIA Device section:
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA"
Option "Coolbits" "4"
EndSection
Reboot your PC to apply the changes.
The second step is to adjust the fan speed. I usually use Psensor to monitor the fan speed. The RTX 4090 has two fans, so you need to tune both of them.
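One way to set both fans from a script is the nvidia-settings command line, which exposes the GPUFanControlState and GPUTargetFanSpeed attributes once Coolbits is enabled. The post does not say which tool was used for the adjustment, so treat this as an assumption. The sketch only builds and prints the command; the helper name and the fan indices 0 and 1 are illustrative.

```python
import shlex


def fan_speed_command(percent: int, gpu: int = 0, fans: tuple[int, ...] = (0, 1)) -> list[str]:
    """Build an nvidia-settings command that sets a manual fan speed.

    Hypothetical helper: assumes GPU index 0 and fan indices 0 and 1,
    which may differ on your machine.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Switch the GPU to manual fan control, then set each fan's target speed.
    cmd = ["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUFanControlState=1"]
    for fan in fans:
        cmd += ["-a", f"[fan:{fan}]/GPUTargetFanSpeed={percent}"]
    return cmd


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(fan_speed_command(90)))
```

Run the printed command from a terminal inside the X session, then confirm the new speed in Psensor.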
When running --quantize llm.int8 with the adapter in Lit-LLaMA, I got this error:
ImportError: cannot import name 'Linear8bitLt' from 'lit_llama.quantization'
As a first step, we need to check whether bitsandbytes is working properly by running:
python -m bitsandbytes
And I received
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
...
packages/bitsandbytes/functional.py", line 12, in <module>
from scipy.stats import norm
ModuleNotFoundError: No module named 'scipy'
Now I know the problem: scipy is not installed. The fix is to install scipy:
pip install scipy
Then I re-ran the bitsandbytes check.
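A quick way to catch this kind of missing dependency up front is to ask Python whether each module can be found, without importing it. This is a minimal sketch using importlib.util.find_spec from the standard library; the helper name missing_modules and the list of modules to check are only examples.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found.

    (Hypothetical helper; it only checks that the module can be located,
    not that it imports cleanly.)
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example list based on the traceback above; extend it as needed.
print("missing:", missing_modules(["scipy", "bitsandbytes"]))
```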
I’m using Ubuntu 23.04 Cinnamon, the latest in 2023. After moving some files, the Nautilus file manager suddenly showed this error:
Couldn't open file. No program to open the file
And suddenly the files, folders, and all the directories inside were gone.
If you have this problem, don’t panic. To solve it, reboot Ubuntu. Once you log in, check the Trash and the files will be there!
Getting transformers, PyTorch, and TensorFlow to work with the GPU on the latest Ubuntu requires several steps. This is how I successfully set it up and ran several models with it.
Please make sure to install the latest NVIDIA drivers. I use an RTX 4090 in this case. This is the link: https://www.nvidia.com/download/driverResults.aspx/200481/en-us/
If you are using nouveau, you can disable it with:
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo update-initramfs -u
sudo reboot
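After the reboot, you can confirm that nouveau is no longer loaded by looking at /proc/modules, which lists one loaded kernel module per line with the module name first. This is a small sketch; module_loaded is a hypothetical helper name.

```python
from pathlib import Path


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """True if `name` is the first field of any line in /proc/modules-style text."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


proc = Path("/proc/modules")
if proc.exists():
    text = proc.read_text()
    print("nouveau loaded:", module_loaded("nouveau", text))
    print("nvidia loaded:", module_loaded("nvidia", text))
```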