- README.md
- id_rsa_nicolas.pub
- jupyterhub.service
- jupyterhub_conda.service
- jupyterhub_config.py
- jupyterhub_config_https.py
- setup_apps.sh
- setup_apps_conda.sh
- setup_bcache.sh
- setup_cuda.sh
- setup_suspend.sh
- setup_tensorflow_conda.sh
README.md
Setup for TensorFlow with GPU
Tested on Ubuntu 20.04.4
Steps:
- Prepare setup:
  git clone https://repos.nonan.net/nicolas/gpu_server_setup.git
  cd gpu_server_setup
- Set up driver/CUDA:
  sudo bash setup_cuda.sh
- Reboot system:
  sudo systemctl reboot
- Set up bcache:
  sudo bash setup_bcache.sh
- Set up apps (Python, JupyterHub, TensorFlow, etc.; the Hub runs as root):
  sudo bash setup_apps.sh
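Once the steps above have finished, a quick sanity check can confirm that the driver tools are on the PATH and that TensorFlow sees the GPU. The following is only a sketch: the helper names (`have_cmd`, `report_cmd`, `tf_gpu_count`) are made up for illustration, and it assumes that `python3` on the PATH is the interpreter TensorFlow was installed into, which may not hold for the conda variant.

```shell
# Post-install sanity check (sketch, not part of the repository).
# Assumption: python3 on PATH is the interpreter that has TensorFlow.

# Return success if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print "<name>: ok" or "<name>: missing" for a command.
report_cmd() {
    if have_cmd "$1"; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

# Ask TensorFlow how many GPUs it can see and print that number.
tf_gpu_count() {
    python3 -c 'import tensorflow as tf; print(len(tf.config.list_physical_devices("GPU")))'
}

report_cmd nvidia-smi
report_cmd python3
if have_cmd python3; then
    # Falls back to "unknown" if TensorFlow cannot be imported.
    echo "GPUs visible to TensorFlow: $(tf_gpu_count 2>/dev/null || echo unknown)"
fi
```

A count of 0 or "unknown" suggests that the driver/CUDA step or the apps step did not complete as expected.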
Notes
CUDA
Check the state of the NVIDIA devices (power draw, temperature, memory usage, etc.):
nvidia-smi
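For scripted monitoring, `nvidia-smi` can also emit selected fields as CSV through its `--query-gpu` and `--format` options. The sketch below filters that output for hot GPUs; the function name `warn_hot_gpus` and the 80 C limit are arbitrary assumptions chosen for illustration, not values taken from this repository.

```shell
# Print a warning for every GPU at or above a temperature limit (sketch).
# Reads CSV lines "index, name, temperature, power, mem_used, mem_total"
# on stdin. The default limit of 80 C is an assumption for illustration.
warn_hot_gpus() {
    limit="${1:-80}"
    awk -F', ' -v limit="$limit" \
        '$3 + 0 >= limit { printf "GPU %s (%s) is at %s C\n", $1, $2, $3 }'
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi \
        --query-gpu=index,name,temperature.gpu,power.draw,memory.used,memory.total \
        --format=csv,noheader,nounits | warn_hot_gpus 80
fi
```

Reading from stdin keeps the filter independent of the hardware, so it can be tried with sample lines on a machine without a GPU.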
bcache
Check bcache state and cache hit ratios:
cat /sys/block/bcache0/bcache/state
cat /sys/block/bcache*/bcache/stats_five_minute/cache_hit_ratio
cat /sys/block/bcache*/bcache/stats_hour/cache_hit_ratio
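The three reads above can be combined into one summary line per device. This is a sketch only: the function name `bcache_report` is hypothetical, and the optional sysfs root argument exists purely so the function can be tried against a scratch directory.

```shell
# Print state and hit ratios for every bcache device (sketch).
# $1 is the sysfs block directory; it defaults to /sys/block and is
# only a parameter so the function can be tested against a temp dir.
bcache_report() {
    root="${1:-/sys/block}"
    for dev in "$root"/bcache*; do
        # Skip the unexpanded glob when no bcache device exists.
        [ -d "$dev/bcache" ] || continue
        name=$(basename "$dev")
        state=$(cat "$dev/bcache/state")
        five=$(cat "$dev/bcache/stats_five_minute/cache_hit_ratio")
        hour=$(cat "$dev/bcache/stats_hour/cache_hit_ratio")
        echo "$name: state=$state hit_ratio_5min=$five hit_ratio_hour=$hour"
    done
}

bcache_report
```

On a system without bcache devices the call prints nothing.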
Tune bcache (run as root; the settings are not persistent):
echo 64M > /sys/block/bcache0/bcache/sequential_cutoff
echo 4096 > /sys/block/bcache0/queue/read_ahead_kb
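Because these values are not persistent, one option is to wrap them in a small function that can be re-run when needed, for example at boot. The sketch below applies the same two values to every bcache device. The function name `apply_bcache_tuning` is hypothetical, and whether `setup_bcache.sh` already does something similar is not known here.

```shell
# Apply the tuning values from above to every bcache device (sketch).
# $1 is the sysfs block directory; it defaults to /sys/block and is
# only a parameter so the function can be tested against a temp dir.
apply_bcache_tuning() {
    root="${1:-/sys/block}"
    for dev in "$root"/bcache*; do
        # Skip the unexpanded glob when no bcache device exists.
        [ -d "$dev/bcache" ] || continue
        echo 64M > "$dev/bcache/sequential_cutoff"
        echo 4096 > "$dev/queue/read_ahead_kb"
    done
}

# Writing to sysfs requires root, so only apply the values as root.
if [ "$(id -u)" -eq 0 ]; then
    apply_bcache_tuning
else
    echo "apply_bcache_tuning needs root; re-run this script with sudo" >&2
fi
```

Note that `sudo echo 64M > /sys/...` does not work from a normal shell, because the redirection is performed by the unprivileged shell; running the whole script under `sudo bash` avoids that.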
Fan-temperature control for GPUs
- NVIDIA GPU-based FAN controller for SUPERMICRO server
- Modification for combined GPU/CPU temperature control in 1U server
For a multiuser setup
- systemdspawner allows a per-user memory limit via mem_limit