Ubuntu Pitfall Notes
Reinstalling the GPU driver
After one install, I ran into this bug:
Can’t run remote python interpreter: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown
nvidia-smi stopped working inside Docker, and running nvidia-smi directly on the host failed as well:
NVIDIA-SMI couldn’t find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.
This was probably caused by a system update at some point.
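Before reinstalling, it can help to confirm that the library named in the error really is missing. A minimal sketch using only the Python standard library; the helper name driver_status is made up for illustration and is not part of the original notes:

```python
import ctypes.util
import shutil


def driver_status():
    """Report whether the NVIDIA management library and nvidia-smi are visible."""
    return {
        # find_library returns a name such as "libnvidia-ml.so.1", or None when missing
        "libnvidia_ml": ctypes.util.find_library("nvidia-ml"),
        # shutil.which returns the full path of the executable, or None when missing
        "nvidia_smi": shutil.which("nvidia-smi"),
    }


if __name__ == "__main__":
    for name, found in driver_status().items():
        print(f"{name}: {found if found else 'NOT FOUND'}")
```

If libnvidia_ml comes back as not found while nvidia_smi is present, the driver's user-space libraries are likely gone or mismatched, which fits the error message above.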
Fix: reinstall the GPU driver.
# BTW this is all in console mode (for me, alt+ctrl+F2)
Then reinstall nvidia-docker:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
Test:
sudo nvidia-docker run --rm nvidia/cuda:10.1-devel nvidia-smi
Luckily, CUDA and cuDNN were both still in place.
import torch
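The snippet above is cut off after the import. A typical check might look like the sketch below; it is not necessarily what was run here, and the helper name gpu_report is hypothetical. It returns an empty dict when PyTorch is not installed, so it is safe to run anywhere.

```python
def gpu_report():
    """Collect CUDA/cuDNN details as PyTorch sees them; empty dict if torch is missing."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None when cuDNN is unavailable
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False even though nvidia-smi works, the PyTorch build and the installed driver are probably mismatched.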
To make Docker use the nvidia runtime by default instead of the stock one (https://zhuanlan.zhihu.com/p/37519492), put the following in /etc/docker/daemon.json:
{
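The rest of the file is not preserved in these notes. As a rough sketch, the commonly documented form sets "default-runtime" to "nvidia" and registers a "nvidia" entry under "runtimes". The helper below (with_nvidia_default_runtime is a hypothetical name) merges those keys into an existing config dict without dropping other settings. The runtime path "nvidia-container-runtime" is an assumption; some installs use the full path /usr/bin/nvidia-container-runtime.

```python
import json


def with_nvidia_default_runtime(config):
    """Return a copy of a daemon.json dict with nvidia set as the default runtime."""
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    # Assumed runtime entry; adjust "path" to match the local install
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    merged["runtimes"] = runtimes
    merged["default-runtime"] = "nvidia"
    return merged


if __name__ == "__main__":
    # Print the JSON that would go into /etc/docker/daemon.json
    print(json.dumps(with_nvidia_default_runtime({}), indent=4))
```

After editing the file, the Docker daemon has to be restarted (for example with sudo systemctl restart docker) before the new default takes effect.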
Using Docker in PyCharm
Python location: /home/shiyuuuu/anaconda3/bin/python