# Setup for TensorFlow with GPU
Tested on Ubuntu 20.04.4.
Steps:
1. Prepare setup:
git clone https://repos.nonan.net/nicolas/gpu_server_setup.git
cd gpu_server_setup
2. Set up the driver and CUDA:
sudo bash setup_cuda.sh
3. Reboot system:
sudo systemctl reboot
4. Set up bcache:
sudo bash setup_bcache.sh
5. Set up apps (Python, JupyterHub, TensorFlow, etc.; the Hub runs as root):
sudo bash setup_apps.sh
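
After the last step, it can be useful to confirm that TensorFlow actually sees the GPUs. The following is a minimal sketch, not part of this repository: the function name `tf_list_gpus` and the `PYTHON` override are hypothetical, and it assumes that the `python3` on your `PATH` is the interpreter into which `setup_apps.sh` installed TensorFlow.

```shell
# Hypothetical helper (not part of this repo): ask TensorFlow which GPUs it sees.
# PYTHON is an assumed override for choosing a different interpreter.
tf_list_gpus() {
    "${PYTHON:-python3}" -c \
        'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}
```

Calling `tf_list_gpus` should print one `PhysicalDevice` entry per visible GPU; an empty list suggests that the driver or CUDA setup is incomplete.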
## Notes
### CUDA
Check the state of NVIDIA devices (power draw, temperature, memory usage, etc.):
nvidia-smi
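
For scripted monitoring, `nvidia-smi` can also print selected fields as CSV. The sketch below is an illustration only: the helper names `gpu_summary` and `hot_gpus` are hypothetical and not part of this repository, and it assumes the CSV fields are separated by a comma followed by a space.

```shell
# Hypothetical helper (not part of this repo): print one CSV line per GPU
# with index, name, temperature, power draw, and memory usage.
gpu_summary() {
    nvidia-smi \
        --query-gpu=index,name,temperature.gpu,power.draw,memory.used,memory.total \
        --format=csv,noheader
}

# Hypothetical helper: read gpu_summary lines on stdin and print the index
# of every GPU whose temperature (third field, in °C) exceeds the limit.
hot_gpus() {
    awk -F', ' -v limit="$1" '($3 + 0) > (limit + 0) { print $1 }'
}
```

For example, `gpu_summary | hot_gpus 80` lists the GPUs that are currently hotter than 80 °C.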
### bcache
Check bcache state and cache hit ratios:
cat /sys/block/bcache0/bcache/state
cat /sys/block/bcache*/bcache/stats_five_minute/cache_hit_ratio
cat /sys/block/bcache*/bcache/stats_hour/cache_hit_ratio
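
The same values can be collected into one line per device. This is a minimal sketch, assuming the sysfs layout shown above; the function name `bcache_report` and the `SYSFS_ROOT` override are hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repo): print state and hit ratios
# for every bcache device. SYSFS_ROOT is an assumed override (default /sys).
bcache_report() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/block/bcache*; do
        # Skip when the glob matched nothing or the entry is not a bcache device.
        [ -d "$dev/bcache" ] || continue
        name=$(basename "$dev")
        state=$(cat "$dev/bcache/state")
        five=$(cat "$dev/bcache/stats_five_minute/cache_hit_ratio")
        hour=$(cat "$dev/bcache/stats_hour/cache_hit_ratio")
        echo "$name state=$state hit_5min=${five}% hit_hour=${hour}%"
    done
}
```

Running `bcache_report` prints nothing when no bcache device is present.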
Tune bcache (run as root; the settings are lost on reboot):
echo 64M > /sys/block/bcache0/bcache/sequential_cutoff
echo 4096 > /sys/block/bcache0/queue/read_ahead_kb
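
Because these values do not survive a reboot, they have to be written again after each boot. The sketch below applies the two settings above to every bcache device; the function name `bcache_tune` and the `SYSFS_ROOT` override are hypothetical and not part of this repository, and how the function gets called at boot (for example from a systemd unit) is left open as an assumption.

```shell
# Hypothetical helper (not part of this repo): apply the tuning values to
# every bcache device. Must run as root on a real system.
# Usage: bcache_tune [sequential_cutoff] [read_ahead_kb]
bcache_tune() {
    root="${SYSFS_ROOT:-/sys}"   # assumed override, defaults to /sys
    cutoff="${1:-64M}"
    readahead="${2:-4096}"
    for dev in "$root"/block/bcache*; do
        # Skip when the glob matched nothing or the entry is not a bcache device.
        [ -d "$dev/bcache" ] || continue
        echo "$cutoff" > "$dev/bcache/sequential_cutoff"
        echo "$readahead" > "$dev/queue/read_ahead_kb"
    done
}
```

Called without arguments, `bcache_tune` writes the same values as the two commands above.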
### Fan-temperature control for GPUs
- [NVIDIA GPU-based FAN controller for SUPERMICRO server](https://github.com/skokec/superfans-gpu-controller)
- [Modification for combined GPU/CPU temperature control in 1U server](https://repos.nonan.net/nicolas/superfans-gpu-controller)
## For a multiuser setup
- [systemdspawner](https://github.com/jupyterhub/systemdspawner) allows setting a per-user memory limit (`mem_limit`)