sh Running TensorFlow GPU in Docker on GCP
#!/bin/bash
sudo apt-get update
sudo apt-get install -y wget
sudo apt-get install -y linux-headers-$(uname -r)
sudo apt-get install -y gcc
sudo apt-get install -y make
sudo apt-get install -y g++
## Install Cuda
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
chmod a+x cuda_8.0.61_375.26_linux-run
sudo ./cuda_8.0.61_375.26_linux-run --override --silent --driver --toolkit --toolkitpath=/usr/local/cuda-8.0
export PATH=$PATH:/usr/local/cuda/bin
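# Optional sanity check (an addition, not in the original gist): confirm that the NVIDIA
# driver and CUDA toolkit installed cleanly before continuing.
nvidia-smi
nvcc --version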
## Install Docker
echo "\nChecking docker ..."
if ! [ -x "$(command -v docker)" ]; then
echo "docker is not installed."
sudo apt-get -y install apt-transport-https ca-certificates curl
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get -y install docker-ce
echo "\nInstalled docker."
else
echo "docker is already installed."
fi
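# Optional check (an addition, not in the original): make sure the Docker daemon can run containers.
sudo docker run --rm hello-world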
## Install Nvidia Docker
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
## Verify docker setup
sudo nvidia-docker run --rm nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04 nvidia-smi
#sudo nvidia-docker exec -it nvidia/cuda nvidia-smi
#nvidia-smi
# Observing Performance on the GPU; gtop similar to top; http://www.stat.berkeley.edu/scf/paciorek-gpuWorkshop.html
echo "" >> ~/.bashrc
echo "alias gtop=\"nvidia-smi -q -g 0 -d UTILIZATION -l 1\"" >> ~/.bashrc
source ~/.bashrc
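# Plain alternative to the alias above (an addition, commented out so the startup script
# does not block on it):
# watch -n 1 nvidia-smi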
# Simple Docker image for tensorflow
mkdir -p $HOME/notebooks # to persist notebooks on the host
sudo nvidia-docker run -itd --rm -v $HOME/notebooks:/notebooks -p 8888:8888 -p 6006:6006 --name tf tensorflow/tensorflow:latest-gpu
#sudo nvidia-docker run -itd --rm -v $HOME/notebooks:/notebooks -p 8888:8888 -p 6006:6006 --name tf gcr.io/tensorflow/tensorflow:latest-gpu
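# The tensorflow/tensorflow images launch a Jupyter notebook server; once it is up, the login
# URL with its token can be read from the container log (assumes the container name tf above):
# sudo docker logs tf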
# In case the nvidia-docker command does not work, use the set of commands below - https://medium.com/jim-fleming/running-tensorflow-on-kubernetes-ca00d0e67539
#export DRIVER='-v /var/lib/nvidia-docker/volumes/nvidia_driver/375.26:/usr/local/nvidia'
#export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '--volume {}:{}')
#export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
#sudo docker run -itd --rm $DRIVER $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu python -c 'import tensorflow'
#sudo docker run -itd --rm $DRIVER $CUDA_SO $DEVICES -v $HOME/notebooks:/notebooks -p 8888:8888 -p 6006:6006 --name tf gcr.io/tensorflow/tensorflow:latest-gpu
#sudo docker run -itd --rm $DRIVER $CUDA_SO $DEVICES -v $HOME/notebooks:/notebooks -p 8888:8888 -p 6006:6006 --name tf5 gcr.io/tensorflow/tensorflow:latest-gpu
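# Quick GPU-visibility check inside the running container (a sketch, assuming a TF 1.x image
# where tensorflow.python.client.device_lib is available):
# sudo docker exec tf python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'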
#!/bin/bash
gcloud compute --project "vikas-tensorflow" ssh --zone "us-east1-d" "gpu-docker-host" -- -L 8888:localhost:8888 -L 6006:localhost:6006
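# With the tunnel open, Jupyter is reachable at http://localhost:8888 and TensorBoard at
# http://localhost:6006 on the local machine.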
#!/bin/bash
gcloud beta compute instances create pipeline-gpu --zone us-east1-d --machine-type n1-highmem-8 \
--boot-disk-size=100GB --boot-disk-auto-delete --boot-disk-type=pd-ssd --accelerator type=nvidia-tesla-k80,count=1 \
--image-family ubuntu-1604-lts --image-project ubuntu-os-cloud --maintenance-policy TERMINATE --restart-on-failure \
--metadata-from-file startup-script=startup_nvidia_docker.sh
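# Optional follow-up (an addition, not in the original): check that the instance is up and
# watch the startup-script output.
# gcloud compute instances describe pipeline-gpu --zone us-east1-d --format='value(status)'
# gcloud compute instances get-serial-port-output pipeline-gpu --zone us-east1-d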
#!/bin/bash
gcloud compute instances list
# Stop, start, or delete the instance as needed:
gcloud compute instances stop <instance_name>
gcloud compute instances start <instance_name>
gcloud compute instances delete <instance_name>