[PYTHON] Dealing with CUDA error "Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error" in deep learning

Purpose

When training a deep learning model on a GPU, the following error may occur.

2019-11-18 04:16:42.405806: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

Searching the web for the cause of this error turns up very little useful information. Maybe I just cannot read the English or Chinese results well enough.

Here is what I have worked out myself.

Environment (for reference)

tensorflow           1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu       1.14.0

Error countermeasures

The TensorFlow version may be a factor, but in at least one case I have confirmed that this error appears simply when the machine runs out of memory (here, CPU/host memory, not GPU memory).

**If you can reduce CPU memory usage, please give it a try.**

By the way, I have no idea what this error message actually means. (Maybe it is not meant to be understandable.)
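One possible way to reduce CPU memory usage, assuming the training data is currently loaded into memory all at once, is to load it lazily in batches with a generator. The following is only a minimal sketch: `batch_generator` and `load_sample` are hypothetical names that do not come from my original script, and I have not confirmed that this particular change removes the error.

```python
from typing import Callable, Iterator, List, Sequence


def batch_generator(
    paths: Sequence[str],
    load_sample: Callable[[str], object],
    batch_size: int,
) -> Iterator[List[object]]:
    """Yield batches lazily so only one batch is held in host memory at a time.

    `load_sample` is a hypothetical loader (for example, one that reads an
    image file); replace it with whatever your script uses.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(paths), batch_size):
        # Load only the samples that belong to the current batch.
        yield [load_sample(p) for p in paths[start:start + batch_size]]
```

Because nothing is loaded until a batch is requested, peak host memory depends on `batch_size` rather than on the size of the whole dataset.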

Another error (completely unresolved)

The following error may also occur. To begin with, I cannot make much sense of the error message, and looking it up online turns up no useful information.

Error excerpt

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 2.90 GiB already allocated; 30.80 MiB free; 9.54 MiB cached)
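To at least read the numbers in this message, the sketch below pulls them out with a regular expression and converts them to MiB. It assumes the message has exactly the format shown above (the wording may differ in other PyTorch versions), and `parse_cuda_oom` is a hypothetical helper name, not a PyTorch API.

```python
import re

# Conversion factors to MiB for the units that appear in the message.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Assumption: the message follows the exact layout of the excerpt above.
_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) (?P<requested_unit>[KMG]iB) "
    r"\(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) (?P<total_unit>[KMG]iB) total capacity; "
    r"(?P<allocated>[\d.]+) (?P<allocated_unit>[KMG]iB) already allocated; "
    r"(?P<free>[\d.]+) (?P<free_unit>[KMG]iB) free; "
    r"(?P<cached>[\d.]+) (?P<cached_unit>[KMG]iB) cached\)"
)


def parse_cuda_oom(message):
    """Return the sizes in the message as MiB, or None if it does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    result = {"gpu": int(match.group("gpu"))}
    for key in ("requested", "total", "allocated", "free", "cached"):
        unit = match.group(key + "_unit")
        result[key + "_mib"] = float(match.group(key)) * _UNITS[unit]
    return result


if __name__ == "__main__":
    msg = (
        "RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB "
        "(GPU 0; 4.00 GiB total capacity; 2.90 GiB already allocated; "
        "30.80 MiB free; 9.54 MiB cached)"
    )
    info = parse_cuda_oom(msg)
    # The request is larger than what the message reports as free.
    print(info["requested_mib"] > info["free_mib"])
```

Read literally, the message says that 64.00 MiB was requested on GPU 0 while only 30.80 MiB was reported free, so the allocation could not be satisfied.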

Full output

D:\_mish1\Mish-master\Mish-master\Examples and Benchmarks>python _res50_1.py
Files already downloaded and verified
Files already downloaded and verified
Traceback (most recent call last):
  File "_res50_1.py", line 329, in <module>
    logps = model.forward(inputs)
  File "_res50_1.py", line 242, in forward
    x = self.conv2(x)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "_res50_1.py", line 208, in forward
    return f_mish(self.split_transforms(x) + self.shortcut(x))
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "C:\Users\XYZZZ\AppData\Roaming\Python\Python37\site-packages\torch\nn\functional.py", line 1656, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 2.90 GiB already allocated; 30.80 MiB free; 9.54 MiB cached)
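Since the traceback ends in an out-of-memory error during the forward pass, one thing that might be worth trying is a smaller batch size. I have not verified that this resolves the error above. The sketch below retries a step with half the batch size each time an out-of-memory `RuntimeError` is raised; `run_with_smaller_batches` and `run_step` are hypothetical names, and `run_step` stands in for whatever function runs one training step at a given batch size.

```python
from typing import Callable, Tuple


def run_with_smaller_batches(
    run_step: Callable[[int], object],
    batch_size: int,
    min_batch_size: int = 1,
) -> Tuple[int, object]:
    """Call run_step(batch_size), halving the batch size after each OOM error.

    Returns the batch size that finally worked together with the step result.
    """
    while True:
        try:
            return batch_size, run_step(batch_size)
        except RuntimeError as err:
            # Retry only on out-of-memory errors; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            # Give up once the batch size cannot be reduced any further.
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

The check on the message text is an assumption based on the wording of the error shown above.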

Summary

If anyone who reads this has solved the problem, I would be grateful to hear how.

Related (by the same author)

Use python without stress! (Become familiar with generator. It seems to be since 1975.)
Use python without stress! (In Python, everything is implemented as an object)
Use python without stress! (Close to Pylint)
Use python without stress! (Expression and Statement)
Learn Python carefully using both English and Japanese.

Going forward

If you have any comments, please let me know. I will keep studying.
