# Setup for TensorFlow with GPU

Tested on Ubuntu 20.04.4.

Steps:
1. Prepare the setup (clone the repository):

   ```bash
   git clone https://repos.nonan.net/nicolas/gpu_server_setup.git
   cd gpu_server_setup
   ```
2. Set up the NVIDIA driver and CUDA:

   ```bash
   sudo bash setup_cuda.sh
   ```
3. Reboot the system:

   ```bash
   sudo systemctl reboot
   ```
4. Set up bcache:

   ```bash
   sudo bash setup_bcache.sh
   ```
5. Set up the applications (Python, JupyterHub, TensorFlow, etc.; JupyterHub runs as root):

   ```bash
   sudo bash setup_apps.sh
   ```
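After step 5, a short check can confirm that the driver is loaded and that TensorFlow can see the GPU. This is a sketch, assuming that `setup_apps.sh` installs TensorFlow 2.1 or newer (which provides `tf.config.list_physical_devices`) for the interpreter `python3`; the `check_gpu_setup` function and the `PYTHON` override variable are hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: verify the GPU stack after setup.
# Assumes TensorFlow >= 2.1 is importable from $PYTHON (default: python3).
# check_gpu_setup is a hypothetical helper, not part of gpu_server_setup.
check_gpu_setup() {
  local py="${PYTHON:-python3}"

  # The NVIDIA driver installed in step 2 provides nvidia-smi.
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: is the NVIDIA driver installed?" >&2
    return 1
  fi

  # List the GPUs the driver can see (one line per device).
  nvidia-smi -L || return 1

  # Ask TensorFlow how many GPUs it can use.
  "$py" -c 'import tensorflow as tf; print(len(tf.config.list_physical_devices("GPU")), "GPU(s) visible to TensorFlow")'
}
```

Source the snippet in a shell and call `check_gpu_setup`. If the driver lists the GPUs but TensorFlow reports `0 GPU(s)`, a common cause is a CUDA/cuDNN version that does not match the installed TensorFlow build.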
## Notes

### CUDA
Check the state of the NVIDIA devices (power draw, temperature, memory usage, etc.):

```bash
nvidia-smi
```
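`nvidia-smi` can also print only selected fields as CSV, optionally repeating the query every few seconds with `-l`, which is handy for watching power and temperature during training. The hypothetical `gpu_status` helper below is a sketch; its field list is an example selection (`nvidia-smi --help-query-gpu` lists all available fields).

```bash
#!/usr/bin/env bash
# Sketch: compact GPU status as CSV. gpu_status is a hypothetical helper.
# Usage: gpu_status      -> print once
#        gpu_status 5    -> repeat every 5 seconds until Ctrl+C
gpu_status() {
  # Example field selection; see `nvidia-smi --help-query-gpu` for more.
  local fields="index,name,temperature.gpu,power.draw,memory.used,memory.total,utilization.gpu"
  if [ -n "${1:-}" ]; then
    nvidia-smi --query-gpu="$fields" --format=csv -l "$1"
  else
    nvidia-smi --query-gpu="$fields" --format=csv
  fi
}
```

Redirecting the looped output to a file (for example `gpu_status 60 >> gpu.log`, where `gpu.log` is just an example name) gives a simple power and temperature log.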
### bcache

Check the bcache state and the cache hit ratios (last five minutes and last hour):

```bash
cat /sys/block/bcache0/bcache/state
cat /sys/block/bcache*/bcache/stats_five_minute/cache_hit_ratio
cat /sys/block/bcache*/bcache/stats_hour/cache_hit_ratio
```
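The three `cat` commands above can be folded into one line per bcache device. In this sketch, `bcache_report` is a hypothetical helper; its optional first argument replaces `/sys/block` only so the function can be exercised against a mock directory tree, and it prints nothing when no bcache device exists.

```bash
#!/usr/bin/env bash
# Sketch: one line per bcache device with its state and hit ratios.
# bcache_report is hypothetical; $1 overrides /sys/block (for testing only).
bcache_report() {
  local root="${1:-/sys/block}" dir name
  for dir in "$root"/bcache*/bcache; do
    # Skips the unexpanded glob when no bcache device exists.
    [ -d "$dir" ] || continue
    name=$(basename "$(dirname "$dir")")
    printf '%s: state=%s, hit ratio 5min=%s, 1h=%s\n' \
      "$name" \
      "$(cat "$dir/state")" \
      "$(cat "$dir/stats_five_minute/cache_hit_ratio")" \
      "$(cat "$dir/stats_hour/cache_hit_ratio")"
  done
}

bcache_report
```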
Tune bcache (run as root; the settings do not persist across reboots):

```bash
echo 64M > /sys/block/bcache0/bcache/sequential_cutoff
echo 4096 > /sys/block/bcache0/queue/read_ahead_kb
```
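Since these values are lost on reboot, one option is to keep them in a small script and rerun it after each boot, for example from a root `@reboot` cron entry or a systemd unit (neither is set up by this repository). The sketch below assumes the same values should apply to every bcache device, not only `bcache0`. For reference, `sequential_cutoff` is the size above which sequential I/O bypasses the cache, and `read_ahead_kb` is the block-layer readahead in KiB. Run the script itself with `sudo`: `sudo echo 64M > /sys/block/bcache0/bcache/sequential_cutoff` does not work, because the redirection is performed by the unprivileged calling shell. `tune_bcache` and its optional sysfs-root argument are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: reapply this README's bcache tuning to every bcache device.
# tune_bcache is hypothetical; $1 overrides /sys/block (for testing only).
tune_bcache() {
  local root="${1:-/sys/block}" dev
  for dev in "$root"/bcache*; do
    # Skips non-bcache entries and the unexpanded glob.
    [ -d "$dev/bcache" ] || continue
    # Sequential I/O larger than this bypasses the cache.
    echo 64M > "$dev/bcache/sequential_cutoff" || return 1
    # Block-layer readahead in KiB (4096 KiB = 4 MiB).
    echo 4096 > "$dev/queue/read_ahead_kb" || return 1
    echo "tuned $(basename "$dev")"
  done
}

# Writing to sysfs requires root, so the whole script must run under sudo.
if [ "$(id -u)" -eq 0 ]; then
  tune_bcache
else
  echo "Writing to sysfs requires root; run this script with sudo." >&2
fi
```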