1. Running the program under gdb:
gdb python
r XX.py
where
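The same three steps (start gdb on the Python interpreter, run the script, print the backtrace) can also be scripted, so the trace is captured without an interactive session. This is a minimal Python 3 sketch that assumes gdb is on PATH. The helper names are hypothetical. It relies on gdb's -batch, -ex and --args options, and it uses bt, which is the same command as where.

```python
import shutil
import subprocess
import sys


def gdb_backtrace_cmd(script, python=sys.executable):
    """Build a gdb command line equivalent to: gdb python / r script / where."""
    return [
        "gdb",
        "-batch",      # exit once the -ex commands have run
        "-ex", "run",  # start the program (the interactive `r`)
        "-ex", "bt",   # print the backtrace (same as `where`)
        "--args", python, script,
    ]


def run_under_gdb(script):
    """Run the script under gdb and return gdb's output, or None if gdb is missing."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(gdb_backtrace_cmd(script),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(gdb_backtrace_cmd("XX.py", python="python")))
```

The printed command can also be pasted straight into a shell.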
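The following problems came up during debugging:
1) Each run starts dozens of threads (the log below has 39 "New Thread" entries). Could the threads be overloading the machine?
[New Thread 0x7ffff2fd2700 (LWP 7415)]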
[New Thread 0x7ffff27d1700 (LWP 7416)]
[New Thread 0x7fffeffd0700 (LWP 7417)]
[New Thread 0x7fffeb7cf700 (LWP 7418)]
[New Thread 0x7fffe8fce700 (LWP 7419)]
[New Thread 0x7fffe67cd700 (LWP 7420)]
[New Thread 0x7fffe3fcc700 (LWP 7421)]
[New Thread 0x7fffe17cb700 (LWP 7422)]
[New Thread 0x7fffdefca700 (LWP 7423)]
[New Thread 0x7fffdc7c9700 (LWP 7424)]
[New Thread 0x7fffd9fc8700 (LWP 7425)]
[New Thread 0x7fffd77c7700 (LWP 7426)]
[New Thread 0x7fffd4fc6700 (LWP 7427)]
[New Thread 0x7fffd27c5700 (LWP 7428)]
[New Thread 0x7fffcffc4700 (LWP 7429)]
[New Thread 0x7fffcd7c3700 (LWP 7430)]
[New Thread 0x7fffcafc2700 (LWP 7431)]
[New Thread 0x7fffc87c1700 (LWP 7432)]
[New Thread 0x7fffc5fc0700 (LWP 7433)]
[New Thread 0x7fffc37bf700 (LWP 7434)]
[New Thread 0x7fffc0fbe700 (LWP 7435)]
[New Thread 0x7fffbe7bd700 (LWP 7436)]
[New Thread 0x7fffbbfbc700 (LWP 7437)]
[New Thread 0x7fffb97bb700 (LWP 7438)]
[New Thread 0x7fffb6fba700 (LWP 7439)]
[New Thread 0x7fffb47b9700 (LWP 7440)]
[New Thread 0x7fffb1fb8700 (LWP 7441)]
[New Thread 0x7fffaf7b7700 (LWP 7442)]
[New Thread 0x7fffacfb6700 (LWP 7443)]
[New Thread 0x7fffaa7b5700 (LWP 7444)]
[New Thread 0x7fffa7fb4700 (LWP 7445)]
[New Thread 0x7fffa57b3700 (LWP 7446)]
[New Thread 0x7fffa2fb2700 (LWP 7447)]
[New Thread 0x7fffa07b1700 (LWP 7448)]
[New Thread 0x7fff9dfb0700 (LWP 7449)]
[New Thread 0x7fff9b7af700 (LWP 7450)]
[New Thread 0x7fff98fae700 (LWP 7451)]
[New Thread 0x7fff967ad700 (LWP 7452)]
[New Thread 0x7fff93fac700 (LWP 7453)]
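Each [New Thread ...] line is one OS thread (LWP) that gdb saw being created. To check how many threads a running process actually has, here is a Linux-only Python 3 sketch that counts the entries in /proc/<pid>/task. The function name is hypothetical, and the function returns None where /proc is not available. Threads started natively inside C libraries are counted here, whereas threading.active_count() typically reports only threads created through Python's threading module.

```python
import os
import threading


def count_os_threads(pid=None):
    """Return the number of OS threads (LWPs) of a process, or None if unknown.

    Linux-only: every entry in /proc/<pid>/task is one thread.
    """
    pid = os.getpid() if pid is None else pid
    task_dir = "/proc/{}/task".format(pid)
    if not os.path.isdir(task_dir):
        return None  # not Linux, or no such process
    return len(os.listdir(task_dir))


if __name__ == "__main__":
    print("OS threads:", count_os_threads())
    print("Python threading threads:", threading.active_count())
```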
2) It then reported:
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: cuDeviceGet: CUDA_ERROR_INVALID_DEVICE: invalid device ordinal
Leaving that unsolved for the moment, I first ran a test:
even a plain import keras reported the same error as above!
conda install mkl
conda install mkl-service  # I used these two commands
Both printed: # All requested packages already installed.
conda install blas
keras still could not be imported.
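Because import keras pulls in Theano and pygpu, it helps to check each layer of the stack separately. The following is a Python 3 sketch. The module list is an assumption based on the packages named in this post (mkl is the module installed by mkl-service), and the helper names are hypothetical. Theano logs the pygpu failure ("support disabled") instead of raising it, so a clean import alone does not prove that the GPU was initialized.

```python
import importlib
import importlib.util


def locate_modules(names):
    """Map each module name to True if Python can find it, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def try_imports(names):
    """Import each module; map name -> None on success, or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # import-time failures vary widely
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    stack = ["mkl", "pygpu", "theano", "keras"]  # assumed list, bottom to top
    print(locate_modules(stack))
    print(try_imports(stack))
```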
3) I deleted the original conda environment, created a new one and installed mkl with conda. import keras still failed:
Using Theano backend.
~/lib/python2.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano.
If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "~/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "~/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "~/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: cuDeviceGet: CUDA_ERROR_INVALID_DEVICE: invalid device ordinal
My .theanorc configuration file contained:
[global]
floatX=float32
device=cuda1
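I tried removing the CUDA device number (device=cuda instead of device=cuda1), and surprisingly it worked! CUDA_ERROR_INVALID_DEVICE means that the requested GPU index does not exist for the process, so cuda1 was presumably asking for a second GPU that was not visible.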
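The following Python 3 sketch reads the device entry from .theanorc text with configparser and flags an index that cannot exist. The function names are hypothetical. You supply the number of visible GPUs yourself (for example, by counting the lines of nvidia-smi -L). Nothing here is part of Theano's own API.

```python
import configparser
import re


def cuda_ordinal(theanorc_text):
    """Return the GPU index requested by the [global] device entry.

    'cuda1' -> 1; a bare 'cuda' (default GPU) or 'cpu' -> None.
    """
    parser = configparser.ConfigParser()
    parser.read_string(theanorc_text)
    device = parser.get("global", "device", fallback="cpu").strip()
    match = re.fullmatch(r"cuda(\d+)", device)
    return int(match.group(1)) if match else None


def check_device(theanorc_text, n_gpus):
    """Return "ok", or a message if the requested index is >= the visible GPU count."""
    ordinal = cuda_ordinal(theanorc_text)
    if ordinal is not None and ordinal >= n_gpus:
        return "invalid device ordinal: cuda%d requested, %d GPU(s) visible" % (ordinal, n_gpus)
    return "ok"


if __name__ == "__main__":
    rc = "[global]\nfloatX=float32\ndevice=cuda1\n"
    print(check_device(rc, n_gpus=1))
    print(check_device(rc.replace("cuda1", "cuda"), n_gpus=1))
```

With one visible GPU, the original setting is flagged and the bare cuda setting passes, which matches what happened above.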
Using Theano backend.
~/.conda/envs/xhs/lib/python2.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano.
If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7201 on context None
Mapped name None to device cuda: GeForce GTX 1080 Ti (0000:03:00.0)
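Next, I tried to get rid of the UserWarning above.
Theano was already at the latest release, 1.0.4, and could not be upgraded any further, so the only option was to downgrade cuDNN.
However, conda list (which lists all installed packages) showed: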
cudnn 6.0.21 cuda8.0_0 https://mirrors.tuna.tsinghua.edu.cn/a
# Tried this command to check whether pygpu works:
DEVICE="cuda" python -c "import pygpu; pygpu.test()"
According to the help text, the test_collectives error can be ignored if you are not using multiple GPUs.
# Then tried the following:
python test_gpu.py
~/.conda/envs/xhs/lib/python2.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7201 on context None
Mapped name None to device cuda: GeForce GTX 1080 Ti (0000:03:00.0)
[GpuElemwise{exp,no_inplace}((float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.192847 seconds
Result is [1.2317803 1.6187935 1.5227807 ... 2.2077181 2.2996776 1.623233]
Used the gpu
So the cuDNN version in use was 7.2. conda lists 6.0, so why was 7.2 being loaded?
Checking the CUDA version revealed:
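The 7201 in "Using cuDNN version 7201" is cuDNN's integer version code. For cuDNN 5 through 8, this code is major*1000 + minor*100 + patch, so 7201 decodes to 7.2.1, while the conda package 6.0.21 would report 6021. The following Python 3 sketch decodes such codes. It also asks whichever libcudnn the dynamic loader finds for its version, by calling the C function cudnnGetVersion() through ctypes. The helper names are hypothetical. The loader's choice may differ from the library Theano actually uses, so treat the result as a hint about which cuDNN is on the search path, not as proof.

```python
import ctypes
import ctypes.util


def decode_cudnn_version(code):
    """Split a cuDNN 5-8 version code into (major, minor, patch).

    Those releases define CUDNN_VERSION = major*1000 + minor*100 + patch.
    """
    return code // 1000, (code % 1000) // 100, code % 100


def loaded_cudnn_version():
    """Return the version code of the libcudnn the loader finds, or None."""
    path = ctypes.util.find_library("cudnn")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # size_t cudnnGetVersion(void)
    return lib.cudnnGetVersion()


if __name__ == "__main__":
    print(decode_cudnn_version(7201))  # version Theano reported
    print(decode_cudnn_version(6021))  # conda package 6.0.21
    code = loaded_cudnn_version()
    print("libcudnn found by the loader:", code and decode_cudnn_version(code))
```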
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
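The field that matters in this output is release 9.0. The following Python 3 sketch extracts it. A launch script could use it, for instance, to warn when the CUDA toolkit differs from the CUDA 8.0 that finally worked in this post. The helper names and that expected version are assumptions drawn only from this write-up.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc -V` output, e.g. 'release 9.0' -> (9, 0)."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no 'release X.Y' field in nvcc output")
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` if it is on PATH and return (major, minor), else None."""
    if shutil.which("nvcc") is None:
        return None
    out = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return parse_nvcc_release(out)


SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""

if __name__ == "__main__":
    print(parse_nvcc_release(SAMPLE))
    release = installed_cuda_release()
    if release is not None and release != (8, 0):
        print("CUDA %d.%d found; the setup that worked here used CUDA 8.0" % release)
```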
// Installing CUDA looked like a real hassle, so I first tried simply running the program:
Starting epoch 0...
cjdjd (core dumped)
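A native crash like this ends the interpreter without any Python traceback, which makes it hard to tell which Python line triggered it. The faulthandler module complements gdb here: it dumps the Python traceback of every thread on SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. It is in the standard library only from Python 3.3 onward. This project runs Python 2.7, which would need a separately installed backport, so treat this as a Python 3 sketch with a hypothetical helper name. Running python -X faulthandler XX.py, or setting PYTHONFAULTHANDLER=1, has the same effect without any code changes.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump the Python traceback of every thread if the process gets a fatal signal."""
    stream = sys.__stderr__ if stream is None else stream  # needs a real file descriptor
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ...then start the training code that printed "Starting epoch 0..."
```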
# Check the resource limits (including the stack size)
ulimit -a  # shows:
stack size (kbytes, -s) 8192
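That is only 8 MB, which seemed far too small.
Changing it with ulimit -s 102400 still gave cjdjd.
Trying to raise it further, or to unlimited, failed with:
-bash: ulimit: stack size: cannot modify limit: Operation not permitted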
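# Using sudo gave:
sudo: ulimit: command not found
(ulimit is a shell builtin rather than a program, so sudo cannot run it.)
In limits.conf I added:
* soft stack unlimited
After that, ulimit -s unlimited worked, but the program still failed with cjdjd, so I kept adjusting the limits: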
# max locked memory (kbytes, -l) 64  # tried raising max locked memory the same way, but it had no effect
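The "Operation not permitted" error is consistent with how resource limits work. An unprivileged process can raise its soft limit only up to the hard limit, and it can never raise the hard limit. In bash, ulimit -s without -S or -H sets both limits, so the earlier ulimit -s 102400 likely also lowered the hard limit of that shell to 102400 KB. Child processes inherit limits, so a script can also raise the soft limit just for the program it launches. The following is a Python 3 sketch using the standard resource module. The helper names are hypothetical.

```python
import resource
import subprocess
import sys


def allowed_stack_soft_limit(requested_kb):
    """Clamp a requested stack soft limit to what an unprivileged process may set.

    requested_kb is in KB like `ulimit -s`; None means "unlimited".
    Returns bytes (or RLIM_INFINITY), the unit setrlimit expects.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    requested = resource.RLIM_INFINITY if requested_kb is None else requested_kb * 1024
    if hard == resource.RLIM_INFINITY:
        return requested  # no hard cap: any soft value is allowed
    if requested == resource.RLIM_INFINITY:
        return hard  # cannot go above a finite hard limit
    return min(requested, hard)


def run_with_stack_limit(cmd, requested_kb):
    """Run cmd with a new stack soft limit, like `ulimit -S -s N; cmd` in a subshell."""
    soft = allowed_stack_soft_limit(requested_kb)
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def set_limit():  # runs in the child process just before exec
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit).returncode


if __name__ == "__main__":
    # The child prints the (soft, hard) stack limits it actually received.
    show = "import resource; print(resource.getrlimit(resource.RLIMIT_STACK))"
    run_with_stack_limit([sys.executable, "-c", show], 102400)
```

The same call with [sys.executable, "XX.py"] would run the training script under the raised limit.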
——————
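Finally solved it! The answer was in an issue filed on the keras project on GitHub:
The CUDA version on this machine was 9, so I followed a tutorial to install CUDA 8 together with cuDNN 6.0, and after that everything worked!!!
The cause was simply that CUDA 9 did not work with Theano 1.0!!! The versions had to be downgraded. After downgrading, the warning above was gone, and the Keras code ran successfully on the Theano backend.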