An RTX 4090 at full load during machine learning training runs hot: it can reach 80-85 °C. Cooling the GPU with large industrial fans and opening the PC case can bring that down to 70 °C.
However, before going down that path, you can raise your NVIDIA GPU fan speed from 30% to 90% or even 100%. Here are the steps to do it on Ubuntu.
First, you need to configure X11:
sudo vim /etc/X11/xorg.conf
Add Option "Coolbits" "4" to the NVIDIA Device section. This is the setting that enables manual fan control:
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA"
    Option "Coolbits" "4"
EndSection
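Before rebooting, it can help to confirm that the option actually landed in the file. Here is a minimal sketch of such a check. `has_coolbits` is a hypothetical helper name of mine, and it only looks for an uncommented Coolbits line; it does not verify that the line sits inside the right Device section.

```shell
#!/bin/sh
# Succeed if the given xorg.conf contains an uncommented Coolbits option.
# Defaults to the file edited above. has_coolbits is an illustrative name.
has_coolbits() {
    conf="${1:-/etc/X11/xorg.conf}"
    grep -Eiq '^[[:space:]]*Option[[:space:]]+"Coolbits"[[:space:]]+"[0-9]+"' "$conf" 2>/dev/null
}

# Report the result for the system file; a missing file counts as "not set".
if has_coolbits; then
    echo "Coolbits is set"
else
    echo "Coolbits is not set"
fi
```

Pass a different path as the first argument if your configuration lives elsewhere.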
Reboot your PC to apply the changes.
The second step is to adjust the fan speed. I usually use Psensor to monitor the fan speed. The RTX 4090 has two fans, so you need to tune both of them.
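If you prefer the terminal over Psensor, `nvidia-smi` can report temperature and fan speed as CSV. The sketch below flags any GPU above a limit. `warn_hot_gpus` is a hypothetical helper name, and the 80 °C default is only an assumption taken from the temperatures mentioned at the top of this post.

```shell
#!/bin/sh
# Print a line for every GPU whose temperature exceeds a limit (default 80 C).
# Input lines look like "0, 63, 80": index, temperature in C, fan speed in %.
warn_hot_gpus() {
    limit="${1:-80}"
    awk -F', *' -v limit="$limit" \
        '$2 + 0 > limit { printf "GPU %s: %s C, fan %s%%\n", $1, $2, $3 }'
}

# Only query the driver when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,temperature.gpu,fan.speed \
        --format=csv,noheader,nounits | warn_hot_gpus 80
fi
```

Wrapping the script in `watch -n 2` gives a rough live view while a training job runs.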
Adjust fan 0
sudo nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=80"
Adjust fan 1
sudo nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=80"
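The two commands above can also be folded into one call, since `nvidia-settings` accepts several `-a` assignments at once. This is only a sketch: `set_fans` and the `DRY_RUN` variable are hypothetical names I made up, and the helper does not call `sudo` itself, so run the script with whatever privileges your setup needs.

```shell
#!/bin/sh
# set_fans SPEED GPU FAN...
# Enable manual control on one GPU and set every listed fan to SPEED percent.
# With DRY_RUN=1 the command is printed instead of executed.
set_fans() {
    speed="$1"
    gpu="$2"
    shift 2
    n=$#
    # Append one "-a [fan:N]/..." pair per fan, then drop the bare fan indices.
    for fan in "$@"; do
        set -- "$@" -a "[fan:$fan]/GPUTargetFanSpeed=$speed"
    done
    shift "$n"
    set -- nvidia-settings -a "[gpu:$gpu]/GPUFanControlState=1" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Preview the command for both fans of GPU 0 at 80% without running it.
DRY_RUN=1
set_fans 80 0 0 1
```

Set `DRY_RUN=0` once the printed command looks right.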
Now you should see the fan speed increasing, and this should solve your heat problem!
In my case, combined with an open case and industrial fans, I’m able to keep the RTX 4090 at a healthy 63 °C under 100% utilization while fine-tuning an LLM.
Set the fans to 50% to keep your RTX steady at around 39 °C:
sudo nvidia-settings -a "[gpu:0]/GPUFanControlState=0" -a "[fan:1]/GPUTargetFanSpeed=50" && sudo nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=50"
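Rather than picking one fixed speed, you could choose the speed from the current temperature. Here is a rough sketch of that idea. `fan_speed_for_temp` is a hypothetical helper, and the thresholds are illustrative assumptions of mine, not measured values, so tune them for your own card and case.

```shell
#!/bin/sh
# Map a GPU temperature in C to a fan speed in %.
# The thresholds below are assumptions for illustration only.
fan_speed_for_temp() {
    temp="$1"
    if [ "$temp" -ge 75 ]; then
        echo 100
    elif [ "$temp" -ge 60 ]; then
        echo 80
    elif [ "$temp" -ge 45 ]; then
        echo 60
    else
        echo 50
    fi
}

# Example: speed suggested for a card sitting at 63 C.
fan_speed_for_temp 63
```

The result can be substituted for the fixed value in `GPUTargetFanSpeed=` in the commands above.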
If you have two RTX 4090s:
#!/bin/sh
# GPU 0 drives fans 0 and 1
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=60"
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=60"
# GPU 1 drives fans 2 and 3, so manual control must be enabled on gpu:1
nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:2]/GPUTargetFanSpeed=60"
nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:3]/GPUTargetFanSpeed=60"
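For more than two cards, the same commands can be generated in a loop. The sketch below only prints them. It assumes every card has the same number of fans (two by default) and that fan indices are numbered consecutively across cards, so GPU 0 owns fans 0-1 and GPU 1 owns fans 2-3. `fan_commands` is a hypothetical name; check the indices on your own machine before running the output.

```shell
#!/bin/sh
# fan_commands GPUS SPEED [FANS_PER_GPU]
# Print one nvidia-settings command per GPU. Assumes consecutive fan numbering:
# fan index = gpu * FANS_PER_GPU + k.
fan_commands() {
    gpus="$1"
    speed="$2"
    per_gpu="${3:-2}"
    gpu=0
    while [ "$gpu" -lt "$gpus" ]; do
        line="nvidia-settings -a \"[gpu:$gpu]/GPUFanControlState=1\""
        k=0
        while [ "$k" -lt "$per_gpu" ]; do
            fan=$((gpu * per_gpu + k))
            line="$line -a \"[fan:$fan]/GPUTargetFanSpeed=$speed\""
            k=$((k + 1))
        done
        printf '%s\n' "$line"
        gpu=$((gpu + 1))
    done
}

# Two GPUs at 60%: prints two commands, one per card.
fan_commands 2 60
```

Once the printed commands look right, pipe them into `sh` to apply them.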
References:
Credit to Justin ho