Domen Tabernik 9cb3bfa286 Added main to superfans.py that displays RPMs of all FANs in 4 zones. 2019-12-10 17:58:05 +01:00
LICENCE Initial commit (working version). 2019-03-08 19:24:59 +00:00
README.md Updated readme based on latest changes to the code 2019-12-10 17:15:55 +01:00
install_daemon.sh Added systemd service dependency to nvidia-persistant.service to ensure nvidia-smi can be accessed quickly. 2019-12-10 15:51:48 +01:00
superfans.py Added main to superfans.py that displays RPMs of all FANs in 4 zones. 2019-12-10 17:58:05 +01:00
superfans_gpu_controller.py Several changes and fixes: 2019-12-10 17:09:49 +01:00

README.md

NVIDIA GPU-based FAN controller for SUPERMICRO server

This controller automatically adjusts the FANs in SUPERMICRO servers based on GPU temperature. Only NVIDIA GPUs are supported, since the tool reads the GPU temperature by parsing nvidia-smi output. FANs are controlled through the IPMI tool (ipmitool) using a modified version of the superfans (https://github.com/putnam/superfans) script.
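As a rough illustration of the temperature-reading step, the sketch below queries nvidia-smi through its `--query-gpu=temperature.gpu --format=csv,noheader,nounits` mode and parses one value per GPU. The function names are hypothetical and are not taken from superfans_gpu_controller.py, which may parse nvidia-smi differently.

```python
import subprocess


def parse_gpu_temperatures(output):
    """Turn nvidia-smi CSV output (one temperature per line) into a list of ints."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_temperatures():
    """Ask nvidia-smi for the current temperature (deg C) of every GPU.

    Hypothetical helper: requires the NVIDIA drivers and nvidia-smi on PATH.
    """
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"]
    )
    return parse_gpu_temperatures(output.decode("ascii"))
```

A controller would typically act on the hottest GPU, for example `max(read_gpu_temperatures())`.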

Requirements

  • Linux (tested on Ubuntu 16.04.5)
  • Python 2.7
  • NVIDIA drivers/tools (nvidia-smi)
  • IPMI tool (ipmitool) with the ipmi_devintf kernel module loaded (modprobe ipmi_devintf)

Tested on SUPERMICRO 4029GP TRT2 with RTX 2080 Ti (nvidia 415.27 drivers).

NOTE: Using this script on other systems requires manual calibration of the function that converts FAN RPM values into percentage values (SUPERMICRO_4029GP_TRT2_RPM_to_percent() in superfans.py). The current values are hardcoded for the SUPERMICRO 4029GP TRT2.

Usage

Call the Python script directly (requires sudo access for ipmitool):

sudo python superfans_gpu_controller.py

Or install the systemd service (superfans-gpu-controller.service):

sudo chmod +x ./install_daemon.sh
sudo ./install_daemon.sh

The service is registered to start at system startup. Start, stop, and inspect it using:

# start
sudo systemctl start superfans-gpu-controller

# stop
sudo systemctl stop superfans-gpu-controller

# check the status
sudo systemctl status superfans-gpu-controller

# view logs (following new entries)
sudo journalctl -f -u superfans-gpu-controller

Settings

Currently the settings are hardcoded in superfans_gpu_controller.py as the following table, which maps GPU temperature to FAN speed:

  • 0°C => 20%
  • 60°C => 25%
  • 70°C => 30%
  • 80°C => 35%
  • 87°C => 40%
  • 90°C => 43%

Under full load on 4x RTX 2080 Ti, these settings keep the GPU temperature at around 75°C to 80°C.
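The table can be read as a step function. The sketch below assumes each row applies from its temperature up to the next row's threshold, which is an interpretation of the table rather than a copy of the code in superfans_gpu_controller.py. The names `FAN_SETTINGS` and `fan_percent_for` are hypothetical.

```python
import bisect

# Thresholds copied from the table above: (GPU temperature in deg C, FAN speed in %).
FAN_SETTINGS = [(0, 20), (60, 25), (70, 30), (80, 35), (87, 40), (90, 43)]


def fan_percent_for(temp_c, settings=FAN_SETTINGS):
    """Return the FAN speed of the highest threshold that temp_c has reached.

    Assumption: temperatures below the first threshold use the first row.
    """
    thresholds = [t for t, _ in settings]
    # bisect_right gives the number of thresholds <= temp_c.
    index = max(bisect.bisect_right(thresholds, temp_c) - 1, 0)
    return settings[index][1]
```

Under this reading, `fan_percent_for(75)` selects the 70°C row and `fan_percent_for(95)` stays at the last row.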

TODO:

  • split settings into a separate config file
  • enable linear increases between each setting point
  • enable online calibration of FANs (currently hardcoded for SUPERMICRO 4029GP TRT2!!)
  • enable robust detection of faulty FANs as outliers (using RANSAC-like algorithm)
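For the "linear increases between each setting point" item, one possible approach is plain linear interpolation over the table from the Settings section. This is only a sketch of how that TODO might be implemented, not existing project code. The names `FAN_SETTINGS` and `interpolated_fan_percent` are hypothetical.

```python
# Setting points from the Settings section: (GPU temperature in deg C, FAN speed in %).
FAN_SETTINGS = [(0, 20), (60, 25), (70, 30), (80, 35), (87, 40), (90, 43)]


def interpolated_fan_percent(temp_c, settings=FAN_SETTINGS):
    """Linearly interpolate the FAN speed between neighbouring setting points.

    Assumption: values outside the table are clamped to the first/last row.
    """
    if temp_c <= settings[0][0]:
        return float(settings[0][1])
    if temp_c >= settings[-1][0]:
        return float(settings[-1][1])
    # Walk over consecutive pairs of setting points and find the enclosing one.
    for (t0, p0), (t1, p1) in zip(settings, settings[1:]):
        if t0 <= temp_c <= t1:
            fraction = float(temp_c - t0) / (t1 - t0)
            return p0 + fraction * (p1 - p0)
```

With this scheme a GPU at 65°C would get a speed halfway between the 60°C and 70°C rows instead of staying at the 60°C value.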