First, install Docker
sudo apt-get install -y docker.io
Install nvidia-docker (NVIDIA Container Toolkit)
Add the package repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
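The `distribution` variable in the command above is built by sourcing /etc/os-release and concatenating its ID and VERSION_ID fields. The following is a minimal sketch of that one step, useful for checking which repository path will be requested. The function name `distribution_from` is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: print the distribution string (ID + VERSION_ID)
# read from an os-release style file. The subshell keeps the sourced
# variables out of the caller's environment.
distribution_from() {
    (
        . "$1"
        echo "${ID}${VERSION_ID}"
    )
}

# Usage on a real system:
#   distribution_from /etc/os-release
```

On Ubuntu 20.04, for example, ID is `ubuntu` and VERSION_ID is `20.04`, so the string is `ubuntu20.04`.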
Update the package index
sudo apt-get update
Install nvidia-docker
sudo apt-get install -y nvidia-docker2
Configure the NVIDIA runtime for Docker and restart the Docker service
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
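`nvidia-ctk runtime configure --runtime=docker` works by editing the Docker daemon configuration on the host (/etc/docker/daemon.json) so that it lists an `nvidia` runtime. A rough way to confirm the change took effect is sketched below. The helper name `has_nvidia_runtime` is hypothetical, and the check is a plain text match rather than a real JSON parse.

```shell
# Hypothetical helper: succeed if the given Docker daemon config file
# mentions an "nvidia" runtime entry. Defaults to the usual host path.
# This is a text match only, not a JSON parse.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

# Usage on the host after restarting Docker:
#   has_nvidia_runtime && echo "nvidia runtime configured"
```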
Test
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Start the working container
Start a container with GPU access
Image: mrdabin/airender:torchcu113.xformers
nvidia-docker run -d \
--network=host \
--ipc=host \
--restart=always \
-v $PWD/workspace:/workspace \
-v $PWD/jupyter.sh:/jupyter.sh \
--gpus all \
--shm-size 20G \
--name notebook \
--workdir=/workspace \
-it mrdabin/airender:torchcu113.xformers bash /jupyter.sh
nvidia-docker run -d \
--privileged=true \
--network=host \
--ipc=host \
--restart=always \
-v $PWD/stable-diffusion-render:/workspace \
-v $PWD/jupyter.sh:/jupyter.sh \
--gpus all \
--shm-size 32G \
--name notebook2 \
--workdir=/workspace \
-it ufoym/deepo bash /jupyter.sh
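Both commands above mount a jupyter.sh from the current directory and run it as the container's main process, but the post does not show that file. The sketch below writes one possible version. Its contents are an assumption, not the author's script: it assumes Jupyter Notebook is already installed in the image, and it picks port 8888 arbitrarily. The helper name `write_jupyter_sh` is hypothetical.

```shell
# Hypothetical helper: write a minimal jupyter.sh to the given path.
# The script body is an assumed example; the original file is not shown.
write_jupyter_sh() {
    cat > "$1" <<'EOF'
#!/bin/bash
# Assumed contents. Runs Jupyter in the foreground so that the
# container keeps running when started with -d.
exec jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root
EOF
}

# Usage, before running the docker commands above:
#   write_jupyter_sh "$PWD/jupyter.sh"
```

Because the containers use --network=host, a server started this way would listen directly on the host's port 8888, so two containers would need different ports.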
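Parameter notes
--ipc=host: lets the container share memory with the host by using the host's IPC namespace
--shm-size 16G: by default Docker allocates very little shared memory (64 MB for /dev/shm), which is not enough when training models; this flag sets the size (the commands above use 20G and 32G)
--gpus all: GPUs are not exposed to the container by default; this flag makes all of them available
ufoym/deepo: the name of the Jupyter image being run
nvidia-docker: run the container through nvidia-docker so that the GPU can be used inside it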
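To see how much shared memory a container actually has, the size of the filesystem mounted at /dev/shm can be inspected. The sketch below wraps that in a small function. The name `fs_size_kib` is hypothetical, and the function only reports what `df` sees for a given directory.

```shell
# Hypothetical helper: print the total size in KiB of the filesystem
# backing the given directory (default: /dev/shm). Uses POSIX output
# format (-P) so that each filesystem stays on a single line.
fs_size_kib() {
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print $2 }'
}

# Usage from the host, against the container named "notebook":
#   sudo docker exec notebook df -h /dev/shm
```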
References
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
https://github.com/NVIDIA/nvidia-docker
https://www.jianshu.com/p/1295a0f6423d