
Linux seems to get stuck at cuDNN: how to fix it?

Hi,

I'm running the same repo, with the same TensorFlow GPU code, on Windows and on Linux.

On Windows, it gets past this stage:

- weight name: discriminator/gan/conv6/bias:0, shape: [256], size: 256
Trigger callback: Total counts of trainable weights: 33579064.
Total size of trainable weights: 0G 32M 24K 56B (Assuming32-bit data type.)
2022-05-05 11:29:14.865680: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2022-05-05 11:29:15.131382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2022-05-05 11:29:15.901900: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

However, on Linux it stays at

Total size of trainable weights: 0G 32M 24K 56B (Assuming32-bit data type.)

forever.

I assume Linux wasn't able to open cuDNN?

cuDNN is installed, as conda list shows:

$ conda list
cudatoolkit 10.0.130 hf841e97_10 conda-forge
cudnn 7.6.5.32 ha8d7eb6_1 conda-forge

These versions seem fine for tensorflow-gpu 1.15 (tensorflow-cuda).

I installed CUDA with the system package manager, but there is also another cudatoolkit 10 and cuDNN that came with tensorflow-gpu. I did not restart the machine afterwards. Does this matter?

2022-05-05 12:46:14.496348: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2022-05-05 12:46:14.520365: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2601325000 Hz
2022-05-05 12:46:14.521958: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5565bcb72f40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-05 12:46:14.521982: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-05-05 12:46:14.523737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-05-05 12:46:14.526821: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2022-05-05 12:46:14.526889: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 510.60.2
2022-05-05 12:46:14.526904: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 495.44.0
2022-05-05 12:46:14.526916: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 495.44.0 does not match DSO version 510.60.2 -- cannot find working devices in this configuration
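The last lines of that log report two different driver versions: libcuda (the user-space library) is 510.60.2, while the loaded kernel module is 495.44.0. That kind of mismatch typically shows up when the NVIDIA driver was updated but the old kernel module is still loaded, for example because the machine was not rebooted. Here is a minimal Python sketch that pulls both versions out of a TensorFlow log and compares them. The helper names `driver_versions` and `versions_match` are made up for illustration, and the sample text is just the two diagnostic lines from the log above.

```python
import re

# Patterns for the two cuda_diagnostics.cc lines that appear in the log above.
LIBCUDA_RE = re.compile(r"libcuda reported version is:\s*([\d.]+)")
KERNEL_RE = re.compile(r"kernel reported version is:\s*([\d.]+)")


def driver_versions(log_text):
    """Return (libcuda_version, kernel_version); each is None if not found."""
    lib = LIBCUDA_RE.search(log_text)
    ker = KERNEL_RE.search(log_text)
    return (lib.group(1) if lib else None, ker.group(1) if ker else None)


def versions_match(log_text):
    """True only when both versions are present in the log and identical."""
    lib, ker = driver_versions(log_text)
    return lib is not None and lib == ker


# The two diagnostic lines copied from the log in this post.
LOG = (
    "2022-05-05 12:46:14.526889: I tensorflow/stream_executor/cuda/"
    "cuda_diagnostics.cc:200] libcuda reported version is: 510.60.2\n"
    "2022-05-05 12:46:14.526904: I tensorflow/stream_executor/cuda/"
    "cuda_diagnostics.cc:204] kernel reported version is: 495.44.0\n"
)

if __name__ == "__main__":
    lib, ker = driver_versions(LOG)
    print("libcuda:", lib, "kernel module:", ker)
    print("match:", versions_match(LOG))
```

If the two versions differ, the cudatoolkit and cuDNN packages from conda are probably not the first thing to look at, since cuInit fails before either of them is used.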

Is this because the CUDA 11 installed by the system is not compatible with the cudatoolkit 10 installed by conda?

Any idea how to fix this?

Thanks a lot.

submitted by /u/boydbuilding