There are two possible causes when the system crashes while TensorFlow is running on the GPU: (1) the NVIDIA driver, and (2) insufficient output from the power supply unit (PSU).
Cause 2 was discussed recently in the TensorFlow community [2]. In many of the reported cases the problem was on the driver side, but it appears that the system can also crash because the PSU cannot deliver enough power.
Driver updates are a frequent source of trouble, and NVIDIA is not especially good at dealing with problems that a driver update introduces. (In particular, if a fix would disadvantage gamers, it cannot be expected to arrive quickly.)
Therefore, it may be better to suspect cause 2 before cause 1.
Cause 1 (driver): Depending on the NVIDIA driver version, the system may crash.
Keep your NVIDIA driver up to date.
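Before updating, it helps to note which driver version is currently installed, so that a crash can be tied to a specific version. The following is only a minimal sketch: `driver_major` is a hypothetical helper name introduced here for illustration, and the version string in its comment is just an example of the format.

```shell
# Hypothetical helper (not part of nvidia-smi): extract the major version
# from a driver version string such as "535.104.05".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Query the installed driver only if nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    # With several GPUs the same driver version is printed once per GPU,
    # so the first line is enough.
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    echo "driver version: $version (major: $(driver_major "$version"))"
fi
```

Recording this output before and after each driver update makes it easier to tell whether a crash started with a particular version.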
Cause 2 (power supply): If the PSU's output is insufficient, the GPU may not receive enough power and the entire system may crash.
Replace the PSU with a higher-output unit.
Alternatively, use the nvidia-smi command to set an upper limit on the GPU's power consumption.
For example, on a TITAN X, whose power limit is 250 W by default, run the following command to lower the limit to 150 W.
$ sudo nvidia-smi --power-limit=150
However, the limit at which the system runs stably depends on the model.
In addition, setting a power limit restricts the GPU, so it cannot deliver its full performance.
As a rule, replacing the PSU with a higher-output unit is the recommended fix.
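If a power limit is used as a stopgap, the range that a given model accepts can be read from nvidia-smi before choosing a value. This is a rough sketch under stated assumptions: `clamp_watts` is a hypothetical helper introduced here, it takes whole watts typed in by hand, and the numbers in the usage comment are illustrative, not values for any particular GPU.

```shell
# Hypothetical helper: clamp a requested limit (whole watts) to [min, max].
clamp_watts() {
    requested=$1
    min=$2
    max=$3
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# Show the default, minimum and maximum power limits that each GPU reports.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,power.default_limit,power.min_limit,power.max_limit --format=csv
fi

# Usage (illustrative numbers; read the real range from the query above):
#   sudo nvidia-smi --power-limit="$(clamp_watts 150 125 300)"
```

Keeping the requested value inside the reported range avoids asking for a limit that this model does not support.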
[1] http://suprsonicjetboy.hatenablog.com/entry/2017/04/23/194959
[2] https://github.com/tensorflow/tensorflow/issues/8858