This is my first post. For a long time I had been looking for something like a blog to keep as a development memo, and when I came across Qiita I figured I would post on it eventually. This time I hit a wall I could not get past on my own, so I decided to write it up.
It has been almost a month since Google released TensorFlow. I only recently started studying deep learning, so I jumped straight into TensorFlow.
TensorFlow has already been written up in plenty of places, and the tutorial went smoothly. (I plan to put together my own summary at a later date.)
Next, I wrote and ran a program to recognize image data, and the following error occurred:
$ python test.py
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 12
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:05:00.0
Total memory: 11.99GiB
Free memory: 11.47GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 11701021287
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 12
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 256 (256B) Pool: chunks: 64 free: 24 cumulative malloc: 134728 cumulative freed: 134688
Number of chunks: 64, in_use chunks: 40
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 4096 (4.0KiB) Pool: chunks: 8 free: 2 cumulative malloc: 2812 cumulative freed: 2806
Number of chunks: 8, in_use chunks: 6
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 8192 (8.0KiB) Pool: chunks: 8 free: 3 cumulative malloc: 2814 cumulative freed: 2809
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 16384 (16.0KiB) Pool: chunks: 8 free: 3 cumulative malloc: 11233 cumulative freed: 11228
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 65536 (64.0KiB) Pool: chunks: 16 free: 16 cumulative malloc: 44896 cumulative freed: 44896
Number of chunks: 16, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 98304 (96.0KiB) Pool: chunks: 8 free: 8 cumulative malloc: 11224 cumulative freed: 11224
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 131072 (128.0KiB) Pool: chunks: 4 free: 4 cumulative malloc: 14030 cumulative freed: 14030
Number of chunks: 4, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 212992 (208.0KiB) Pool: chunks: 8 free: 3 cumulative malloc: 11232 cumulative freed: 11227
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 229376 (224.0KiB) Pool: chunks: 2 free: 1 cumulative malloc: 2 cumulative freed: 1
Number of chunks: 2, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 262144 (256.0KiB) Pool: chunks: 8 free: 8 cumulative malloc: 16836 cumulative freed: 16836
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 425984 (416.0KiB) Pool: chunks: 1 free: 1 cumulative malloc: 2806 cumulative freed: 2806
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 524288 (512.0KiB) Pool: chunks: 8 free: 8 cumulative malloc: 25254 cumulative freed: 25254
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 1048576 (1.00MiB) Pool: chunks: 8 free: 8 cumulative malloc: 25254 cumulative freed: 25254
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 13631488 (13.00MiB) Pool: chunks: 8 free: 3 cumulative malloc: 2814 cumulative freed: 2809
Number of chunks: 8, in_use chunks: 5
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 268435456 (256.00MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 369098752 (352.00MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 738197504 (704.00MiB) Pool: chunks: 1 free: 0 cumulative malloc: 1 cumulative freed: 0
Number of chunks: 1, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 1476395008 (1.38GiB) Pool: chunks: 0 free: 0 cumulative malloc: 0 cumulative freed: 0
Number of chunks: 0, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 2952790016 (2.75GiB) Pool: chunks: 3 free: 3 cumulative malloc: 3 cumulative freed: 3
Number of chunks: 3, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:345] Aggregate Region Memory: 11701021287 (10.90GiB)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:347] Aggregate Chunk Memory: 10363027456 (9.65GiB)
W tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:89] Out of GPU memory, see memory state dump above
W tensorflow/core/kernels/conv_ops.cc:162] Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x10426540 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x127a7090 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
[[Node: range_1/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_394_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x127a7090 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
[[Node: Cast/_13 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_393_Cast", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
File "img_ditect_train.py", line 229, in <module>
keep_prob: 1.0})
File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: conv2/Conv2D = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1/MaxPool,conv2/Variable)]]
[[Node: range_1/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_394_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'conv2/Conv2D', defined at:
File "test.py", line 196, in <module>
logits = inference(images_placeholder, keep_prob)
File "test.py", line 70, in inference
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
File "test.py", line 46, in conv2d
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 207, in conv2d
use_cudnn_on_gpu=use_cudnn_on_gpu, name=name)
File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/tensorflow-GPU/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 988, in __init__
self._traceback = _extract_stack()
I am still investigating the cause. Is the GPU running out of memory? I honestly do not know... I would be grateful if someone could point me in the right direction.
Update: letting a few days pass really does let you look at things more calmly. Regarding the error above, the program now finishes without raising it, so I am adding what I found, for the record.
MATS commented: **"shapedim { size: 28060 } dim { size: 14 } dim { size: 14 } dim { size: 64 }. Why not try reducing the size?"** As pointed out, I focused on the { size: 28060 } part.
As of last week I was going around in circles over what 28060 could possibly be... but looking at it calmly, it is simply the number of images I am feeding in (half of them, to be precise). Honestly, why did it take me so long to notice? By the way, the program is an image recognition program; I will post it once it is finished.
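As a back-of-the-envelope check (my own estimate, assuming float32 activations, i.e. 4 bytes per element), the single conv2 output tensor from the error message already takes on the order of 1.3 GiB, and the other layers and their gradients multiply that several times over, so feeding all 28,060 images in one run can easily blow past the roughly 12 GB on the TITAN X:

```python
# Rough size of the conv2 activation reported in the OOM message.
# Assumption: float32 activations (4 bytes per element).
batch, height, width, channels = 28060, 14, 14, 64
bytes_per_float = 4
tensor_bytes = batch * height * width * channels * bytes_per_float
print("conv2 output: %.2f GiB" % (tensor_bytes / float(1 << 30)))
# -> about 1.31 GiB for this one tensor alone.
```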
So when I cut the number of training images down to around 1,000, the program ran to completion, and the recognition accuracy is not bad either.
However, my understanding is that you will not get high accuracy without feeding a large number of images, so I would like to solve this on the program side rather than by shrinking the dataset; a rough sketch of the direction I have in mind follows.
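This is only a minimal sketch of the idea (the names `sess`, `accuracy`, `images`, `labels`, `images_placeholder`, `labels_placeholder` and `keep_prob` are placeholders standing in for whatever your own script defines, not the exact names from my program): instead of feeding every image to a single sess.run call, evaluate in mini-batches so only one batch of activations has to fit on the GPU at a time.

```python
# Sketch: evaluate in mini-batches instead of one huge feed_dict.
# All variable names below are assumptions matching a typical MNIST-style
# training script, not the exact names from my program.
batch_size = 500
correct_sum = 0.0
for start in range(0, len(images), batch_size):
    batch_images = images[start:start + batch_size]
    batch_labels = labels[start:start + batch_size]
    acc = sess.run(accuracy,
                   feed_dict={images_placeholder: batch_images,
                              labels_placeholder: batch_labels,
                              keep_prob: 1.0})
    correct_sum += acc * len(batch_images)
print("test accuracy %g" % (correct_sum / len(images)))
```

The same idea applies to the training loop: feed a few hundred images per step instead of the whole set at once.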
I will post an update as soon as there is one.