Software installation

linux # centos # Installing CUDA

2018-09-04  FlyingPenguin

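Note: do not attempt this inside a virtual machine. It will not succeed, because this is currently not supported.

To succeed, you need to work on a physical machine.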

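Preparation

Confirming versions

The main task is to confirm the versions of the CUDA toolkit and the NVIDIA driver.
In practice, the most reliable approach turned out to be:
first determine the NVIDIA driver version from the graphics card in the machine, then determine the CUDA toolkit version from the driver version.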

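The graphics card is a GeForce GTX 1060 3GB.

It has 1152 CUDA cores.

The latest NVIDIA driver available for this card is 390.87.

On Linux, because the latest NVIDIA driver is 390.87, CUDA 9.2 is ruled out: it requires driver >= 396.26. CUDA 9.1 requires driver >= 390.46, which 390.87 satisfies, so CUDA 9.1 is the choice.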
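The version reasoning above can be sketched as a small shell check. This is only an illustration: `version_ge` is a hypothetical helper name, the minimum driver versions are the two quoted in the text, and the comparison assumes GNU `sort -V` (version sort) is available, as it is in CentOS 7 coreutils.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Uses GNU sort -V to order the two version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver=390.87   # latest driver available for the card in this article

# Minimum driver versions quoted in the text, as "CUDA version:driver version".
for entry in "9.2:396.26" "9.1:390.46"; do
    cuda=${entry%%:*}
    required=${entry##*:}
    if version_ge "$driver" "$required"; then
        echo "CUDA $cuda: OK (driver $driver >= $required)"
    else
        echo "CUDA $cuda: driver $driver is too old (needs >= $required)"
    fi
done
```

With the numbers from this article, the loop reports CUDA 9.2 as too new for the driver and CUDA 9.1 as usable, which matches the choice made above.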

The CUDA 9.1 installation guide lists the requirements for each supported system.
For example, CentOS 7.x requires kernel 3.10, gcc 4.8.5, and GLIBC 2.17.

Required checks

See chapter 2 of https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html.

Check that the machine has an NVIDIA GPU:

lspci | grep -i nvidia

You can check whether the card supports CUDA at https://developer.nvidia.com/cuda-gpus.

$ uname -m && cat /etc/*release

You should see output similar to the following, modified for your particular system:

x86_64
Red Hat Enterprise Linux Workstation release 6.0 (Santiago)

The x86_64 line indicates you are running on a 64-bit system.
The remainder gives information about your distribution.

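Check the gcc version:

gcc --version

Check the GLIBC version:

ll /lib64/libc.so.*

Install the kernel headers and development package matching the running kernel:

sudo yum install "kernel-devel-uname-r == $(uname -r)"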

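Installing the graphics driver and the CUDA toolkit

The "Handle Conflicting Installation Methods" section of the installation guide covers reinstallation.

In short, before reinstalling the same version of the graphics driver or the CUDA toolkit, uninstall the old one first.

If the CUDA toolkit is already installed, it can be uninstalled as follows: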

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

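Installing the graphics driver

For installing the official graphics driver, see https://blog.csdn.net/u013378306/article/details/69229919,
which describes a simple way to install the NVIDIA driver with yum.
The default nouveau driver must be disabled before starting.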

lsmod | grep nouveau

If the command above prints nothing, nouveau has been disabled successfully.
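For reference, a minimal sketch of one common way to disable nouveau on CentOS 7: a modprobe blacklist file followed by an initramfs rebuild. This follows the approach described in NVIDIA's Linux installation guide for RHEL/CentOS and is not necessarily what the linked article does. The function name `write_nouveau_blacklist` is made up for illustration, and the optional directory argument exists only so the file can be previewed without touching /etc.

```shell
# Hypothetical helper: write the nouveau blacklist file into a
# modprobe.d-style directory (defaults to the real /etc/modprobe.d).
write_nouveau_blacklist() {
    blacklist_dir=${1:-/etc/modprobe.d}
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
        > "$blacklist_dir/blacklist-nouveau.conf"
}

# On the real system, as root, the sequence would be:
#   write_nouveau_blacklist
#   dracut --force      # rebuild the initramfs so nouveau is not loaded early
#   reboot
# After the reboot, `lsmod | grep nouveau` should print nothing.
```

Passing a scratch directory, for example `write_nouveau_blacklist /tmp`, lets you inspect the generated file before applying it for real.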

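The last step of this method is:

yum -y install kmod-nvidia

It sometimes fails. In that case you can still run

nvidia-detect -v

and use its output to find the matching driver version and install that.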

Once the driver is installed, the following command shows the GPU information:

nvidia-smi

If it prints the GPU information, the graphics driver is installed correctly.

To uninstall the graphics driver, run:

nvidia-uninstall

Installing the CUDA toolkit

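Note: GNOME should be shut down before installing.

CUDA toolkit download page: https://developer.nvidia.com/cuda-toolkit-archive
Download CUDA 9.1.

Install CUDA:

sh cuda_9.1.85_387.26_linux.run

Installation prompts (the log below is from one installation of version 9.2 and is for reference only):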

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: yes

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: yes

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y

Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.2 ]: y

Toolkit location must be an absolute path.
Enter Toolkit Location
 [ default is /usr/local/cuda-9.2 ]: /usr/local/cuda-9.2

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /root ]: y

Samples location must be an absolute path
Enter CUDA Samples Location
 [ default is /root ]: y

Samples location must be an absolute path
Enter CUDA Samples Location
 [ default is /root ]: /root

Installing the NVIDIA display driver...

Log of a successful installation:

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-9.2
Samples:  Installed in /root

Please make sure that
 -   PATH includes /usr/local/cuda-9.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_3101.log

Configuring environment variables

See https://www.jianshu.com/p/73399a4c9114 for how to set the environment variables.
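As a minimal sketch of what the installer summary asks for (PATH must include the toolkit's bin directory and LD_LIBRARY_PATH its lib64 directory), the lines below could go in ~/.bashrc. They assume the /usr/local/cuda symbolic link offered by the installer was created. The variable name `CUDA_HOME` is only a convenience chosen here; substitute /usr/local/cuda-9.1 or /usr/local/cuda-9.2 if you did not create the link.

```shell
# Assumption: /usr/local/cuda is the symlink created by the installer.
CUDA_HOME=/usr/local/cuda

# Prepend the toolkit directories. The ${VAR:+:$VAR} form avoids leaving a
# trailing ':' when the variable was previously empty or unset.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Open a new shell or run `source ~/.bashrc` afterwards so the change takes effect.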

Verifying the CUDA installation

cd /root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery
make
./deviceQuery

On success, the output ends with Result = PASS:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 3GB"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 3013 MBytes (3159293952 bytes)
  ( 9) Multiprocessors, (128) CUDA Cores/MP:     1152 CUDA Cores
  GPU Max Clock rate:                            1747 MHz (1.75 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 3GB
Result = PASS

The output includes parameters such as
CUDA Driver Version / Runtime Version (8.0 / 8.0 in this sample log) and
( 9) Multiprocessors, (128) CUDA Cores/MP: 1152 CUDA Cores.

Checking the CUDA version

nvcc --version
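If only the release number is needed, for example in a script, the output can be filtered. The sketch below assumes the version line has the form "Cuda compilation tools, release 9.1, V9.1.85"; `cuda_release` is a hypothetical helper name.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number (for example 9.1).
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Typical use on a machine with the toolkit on PATH:
#   nvcc --version | cuda_release
```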

Problems encountered and solutions:

The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.

Solution:

sudo yum install epel-release
sudo yum install --enablerepo=epel dkms

The installer reports missing recommended libraries:

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-9.2
Samples:  Installed in /root, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-9.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_7498.log

Solution:

yum install mesa-libGLES.x86_64 mesa-libGL-devel.x86_64 \
    mesa-libGLU-devel.x86_64 mesa-libGLw.x86_64 \
    mesa-libGLw-devel.x86_64 libXi-devel.x86_64 \
    freeglut-devel.x86_64 freeglut.x86_64

deviceQuery fails with cudaGetDeviceCount returned 30:

[root@localhost deviceQuery]# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
[root@localhost deviceQuery]# pwd
/root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery

This is usually a problem with the NVIDIA driver. Install the latest NVIDIA driver.

http://elrepo.org/tiki/tiki-index.php

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

Then install the NVIDIA driver with yum as described in https://blog.csdn.net/u013378306/article/details/69229919.

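The following error is usually a CUDA version problem. Determine the correct version and install it.

"CUDA driver version is insufficient for CUDA runtime version" means that the CUDA runtime library is newer than the installed driver supports. Either install a newer driver or use an older CUDA runtime library. All the packages can be found at http://developer.download.nvidia.com/compute/cuda/repos/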

For the "unable to locate the kernel source" error above, the solution is likely to be found in the Stack Overflow question listed in the references. The short version is to run:

sudo yum install "kernel-devel-uname-r == $(uname -r)"

That will install the kernel headers for the version of the kernel you are currently running.

References:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
https://baiweiblog.wordpress.com/2017/07/21/cuda-8-0%E5%9C%A8linux%E4%B8%8A%E7%9A%84%E5%AE%89%E8%A3%85%E6%B5%81%E7%A8%8B/
https://stackoverflow.com/questions/38016466/installing-cuda-7-5-on-centos-7-unable-to-locate-the-kernel-source
https://bitsanddragons.wordpress.com/2016/10/07/cuda-on-centos-7/
https://devtalk.nvidia.com/default/topic/1027413/cuda-setup-and-installation/linux-installation-error-cudagetdevicecount-returned-30-gt-unknown-error/
https://developer.download.nvidia.com/compute/cuda/9.2/Prod2/docs/sidebar/CUDA_Installation_Guide_Linux.pdf
https://blog.csdn.net/10km/article/details/61665578
https://medium.com/@changrongko/nv-how-to-check-cuda-and-cudnn-version-e05aa21daf6c
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
https://www.cnblogs.com/wolflzc/p/9117291.html
http://detail.zol.com.cn/picture_index_1760/index17594460.shtml
