Ubuntu pitfalls log

Shiyu

ubuntu/bug

Published: Mar 26, 2021

Reinstalling the GPU driver

At some point after a working install, I ran into this bug:

Can’t run remote python interpreter: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown

nvidia-smi stopped working inside Docker, and running nvidia-smi directly on the host failed as well:

NVIDIA-SMI couldn’t find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Most likely some update broke it along the way.

Fix: reinstall the GPU driver
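Before reinstalling anything, a quick way to confirm that the loader really cannot see the NVML library is sketched below. It only uses `ctypes.util.find_library` from the Python standard library; the helper name `nvml_library_name` is my own, and the check is an approximation of what nvidia-smi looks for, not its actual lookup logic.

```python
from ctypes.util import find_library


def nvml_library_name():
    """Return the NVML library name the system loader can resolve, or None."""
    # find_library takes the name without the "lib" prefix and ".so" suffix.
    return find_library("nvidia-ml")


name = nvml_library_name()
if name is None:
    print("libnvidia-ml not found; the driver's user-space libraries look missing")
else:
    print(f"found {name}")
```

If this reports the library as missing, the purge-and-reinstall route is probably the right one.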

# BTW this is all in console mode (for me, alt+ctrl+F2)
# login + password as usual

# removing ALL nvidia software
$ sudo apt-get purge 'nvidia*'   # quoted so the shell passes the pattern to apt-get untouched

# Checking what's left:
$ dpkg -l | grep nvidia
# Then I deleted the ones that showed up (mostly libnvidia-* but also xserver-xorg-video-nvidia-xxx)
$ sudo apt-get purge 'libnvidia*' xserver-xorg-video-nvidia-440
$ sudo apt autoremove # clean it up

# now reinstall everything including nvidia-common
$ sudo apt-get install nvidia-common

# find the right driver again
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ ubuntu-drivers devices
$ sudo apt-get install nvidia-driver-460 # the recommended one by ubuntu-drivers
$ sudo update-initramfs -u # needed to do this so rebooting wouldn't lose configuration I think

$ sudo reboot
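The "check what's left" step above can also be scripted. Below is a minimal sketch that filters `dpkg -l` style text for package names containing "nvidia". The function names `leftover_nvidia_packages` and `read_dpkg_listing` are hypothetical, and the sample listing (including its version numbers) is made up for illustration.

```python
import subprocess


def leftover_nvidia_packages(dpkg_output: str) -> list[str]:
    """Return package names containing 'nvidia' from `dpkg -l` style text."""
    names = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Package rows begin with a short alphabetic status code such as
        # "ii" or "rc"; the header and separator lines do not.
        if len(fields) < 2 or not fields[0].isalpha() or len(fields[0]) > 3:
            continue
        name = fields[1].split(":")[0]  # drop an architecture suffix like ":amd64"
        if "nvidia" in name:
            names.append(name)
    return names


def read_dpkg_listing() -> str:
    """Run `dpkg -l` and return its output (Debian/Ubuntu only)."""
    return subprocess.run(
        ["dpkg", "-l"], capture_output=True, text=True, check=True
    ).stdout


# Illustrative sample; the versions here are invented.
SAMPLE = """\
||/ Name                     Version   Architecture Description
+++-========================-=========-============-===========
ii  libnvidia-gl-460:amd64   460.0     amd64        NVIDIA OpenGL libraries
rc  nvidia-settings          460.0     amd64        Tool for configuring the NVIDIA driver
ii  python3                  3.6.7     amd64        interactive language
"""

print(leftover_nvidia_packages(SAMPLE))
```

On a real machine, `leftover_nvidia_packages(read_dpkg_listing())` should return an empty list once the purge is complete. Note that rows with status `rc` are packages that were removed but still have config files around.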

Then reinstall nvidia-docker:

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update

$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Test:

$ sudo nvidia-docker run --rm nvidia/cuda:10.1-devel nvidia-smi

Luckily, CUDA and cuDNN were both still there.

>>> import torch
>>> torch.cuda.is_available()
True
>>> a=torch.randn(1,2)
>>> a.cuda()
tensor([[-0.4678,  0.1525]], device='cuda:0')

To make plain docker use the nvidia runtime by default, so there is no need to call nvidia-docker explicitly (https://zhuanlan.zhihu.com/p/37519492), put the following in /etc/docker/daemon.json:

{
    "default-runtime": "nvidia",
    "registry-mirrors": ["https://gemfield.mirror.aliyuncs.com"],
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
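A typo in daemon.json can stop Docker from starting, so it may be worth sanity-checking the file before restarting the daemon. The sketch below is a rough check, not an official validator: `check_daemon_config` is a hypothetical helper, and its rules reflect my understanding that `registry-mirrors` is a top-level daemon option while each entry under `runtimes` needs a `path`.

```python
import json


def check_daemon_config(text: str) -> list[str]:
    """Return a list of human-readable problems found in daemon.json text."""
    config = json.loads(text)  # raises json.JSONDecodeError on a syntax error
    problems = []
    runtimes = config.get("runtimes", {})

    default = config.get("default-runtime")
    # "runc" is Docker's built-in runtime and needs no entry under "runtimes".
    if default not in (None, "runc") and default not in runtimes:
        problems.append(f"default-runtime {default!r} is not defined under runtimes")

    for name, spec in runtimes.items():
        if "path" not in spec:
            problems.append(f"runtime {name!r} has no path")
        if "registry-mirrors" in spec:
            problems.append(
                f"registry-mirrors is nested inside runtime {name!r}; "
                "it should be a top-level key"
            )
    return problems


# Small inline example so the sketch runs without touching /etc/docker.
EXAMPLE = """
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
    }
}
"""

print(check_daemon_config(EXAMPLE))
```

To check the real file, pass its contents instead, for example `check_daemon_config(open("/etc/docker/daemon.json").read())`. An empty list means none of these particular problems were found.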

Using Docker in PyCharm

Python location: /home/shiyuuuu/anaconda3/bin/python
