
no cuda capable device(curl command not found)

Date: 2023-05-05 00:52:29 Views: 65067 Author: 2996

The problem appeared in PyTorch while training a Transformer: RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCReduceAll.cuh:327

The full error output is as follows:

C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/... Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/... thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139... thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCReduceAll.cuh line=327 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "C:UsersAppDataLocalcondacondaenvsyuanbo_pytorchlibsite-packagestorchnnfunctional.py", line 3105, in multi_head_attention_forward
    qkv_same = torch.equal(query, key) and torch.equal(key, value)
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCReduceAll.cuh:327

Solution

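I debugged for a long time without finding the cause. It turned out that on the GPU the exception is not reported at the place where it actually occurs: CUDA kernels run asynchronously, so the traceback points at a later call (here torch.equal) rather than at the failing lookup. Only after switching the device to CPU did the real error show up: RuntimeError: index out of range: Tried to access index 103 out of table with 99 rows. at C:\w\1\s\tmp_conda_3.6_155139\conda\conda-bld\pytorch_1565366019852\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:237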
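As a rough illustration of the CPU check described above, the following sketch performs an embedding lookup on the CPU with the same numbers as the error message (a table with 99 rows, index 103). The helper name `lookup_on_cpu` and the embedding width of 4 are assumptions made for this example, not part of the original code, and the exact exception type and message depend on the PyTorch version.

```python
import torch
import torch.nn as nn


def lookup_on_cpu(num_rows, index):
    """Run one embedding lookup on the CPU; return the error text, or None."""
    table = nn.Embedding(num_rows, 4)              # hypothetical table, width 4
    ids = torch.tensor([index], dtype=torch.long)  # tensors live on the CPU by default
    try:
        table(ids)
    except (IndexError, RuntimeError) as err:      # exception type varies by version
        return str(err)
    return None


# Same situation as in the error above: index 103, table with 99 rows.
message = lookup_on_cpu(num_rows=99, index=103)
print(message)
```

On the CPU the lookup fails synchronously, so the exception is raised at the line that performs the lookup instead of at a later, unrelated call.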

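So the cause was a bad index. The CUDA assertion `srcIndex < srcSelectDimSize` is the GPU-side form of the same out-of-range check. On inspection, when the Transformer decoder computed the position embedding, an incorrect index in the vocabulary caused "RuntimeError: cuda runtime error (59) : device-side assert triggered". Rebuilding the vocabulary fixed it.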
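One way to catch this kind of problem before it reaches the GPU is to compare the indices against the size of the embedding table. The sketch below is a minimal pure-Python check; the function name `find_out_of_range` and the sample `token_ids` are hypothetical, and the table size of 99 is taken from the error message above.

```python
def find_out_of_range(ids, num_rows):
    """Return (position, index) pairs whose index falls outside [0, num_rows)."""
    return [(pos, idx) for pos, idx in enumerate(ids) if not 0 <= idx < num_rows]


# Hypothetical batch of indices; valid values for a 99-row table are 0..98.
token_ids = [5, 42, 103, 98, 99]
bad = find_out_of_range(token_ids, num_rows=99)
print(bad)  # [(2, 103), (4, 99)]
```

Running such a check on the index sequences before converting them to tensors shows which entries need to be fixed when the vocabulary is rebuilt.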
