[PYTHON] The system restarts on its own while training on the GPU in TensorFlow

There are two possible causes for the system going down when TensorFlow runs on the GPU.

  1. NVIDIA driver problem
  2. Insufficient PSU (power supply unit) output

Cause 2 was recently discussed in the TensorFlow community (see reference 2). In many cases the problem was on the driver side, but it appears that insufficient PSU output can also bring the system down.

Driver updates are often a source of trouble, and NVIDIA is not very good at dealing with problems that driver updates cause. (Especially when it is gamers who are put at a disadvantage, you cannot expect the driver to be fixed immediately.)

Therefore, it may be better to suspect cause 2 before cause 1.

1. NVIDIA driver problem

Depending on the version of the NVIDIA driver, the system may crash.

Keep your NVIDIA driver up to date.
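
Before updating, it helps to know which driver version is currently installed. The following is a minimal sketch, not a definitive tool: it assumes nvidia-smi is on the PATH, and the helper names parse_driver_info and query_driver_info are made up for this illustration.

```python
import subprocess


def parse_driver_info(csv_text):
    """Parse 'name, driver_version' rows from nvidia-smi CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name is kept.
        name, driver = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "driver_version": driver.strip()})
    return gpus


def query_driver_info():
    """Ask nvidia-smi for the GPU name and driver version of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version",
         "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    )
    return parse_driver_info(result.stdout)
```

Calling query_driver_info() on a machine with an NVIDIA GPU returns one dictionary per GPU, which you can compare against the latest version NVIDIA publishes for your model.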

2. Insufficient PSU output

If the PSU output is insufficient, the GPU may not receive enough power and the entire system may crash.

Countermeasure 1

Replace the PSU with a higher-output one.

Countermeasure 2

Use the nvidia-smi command to set an upper limit on the GPU's power consumption.

For example, to lower the power limit of a TITAN X from its original 250W to 150W, run the following command.

$ sudo nvidia-smi --power-limit=150

However, the limit at which the system still operates normally depends on the model.
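
Because the usable range differs per model, it can help to read the limits the GPU itself reports before picking a value. The sketch below is one way to do that under stated assumptions: nvidia-smi is on the PATH and the GPU reports the power.draw, power.limit, power.min_limit and power.max_limit query fields (some models return [N/A]). The helper names parse_power_info, query_power_info and clamp_limit are hypothetical and exist only for this example.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
FIELDS = ("power.draw", "power.limit", "power.min_limit", "power.max_limit")
KEYS = ("draw", "limit", "min_limit", "max_limit")


def _to_float(field):
    """Convert one CSV field to watts, or None for values such as [N/A]."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_power_info(csv_text):
    """Parse one row per GPU from 'csv,noheader,nounits' output."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.strip():
            rows.append(dict(zip(KEYS, map(_to_float, line.split(",")))))
    return rows


def query_power_info():
    """Read current draw and the allowed power-limit range of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_power_info(result.stdout)


def clamp_limit(requested_watts, min_limit, max_limit):
    """Keep a requested limit inside the range the GPU accepts."""
    return max(min_limit, min(requested_watts, max_limit))
```

The value returned by clamp_limit is the one to pass to --power-limit. Whether the system is actually stable at that value still has to be checked by running your training job.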

In addition, setting a power limit restricts the GPU, so it cannot deliver its original performance.

Basically, replacing the PSU with a higher-output one is the recommended fix.

References

1 http://suprsonicjetboy.hatenablog.com/entry/2017/04/23/194959
2 https://github.com/tensorflow/tensorflow/issues/8858
