I went back and looked over my posts so far, but they are pretty hard to read... I'll have to keep improving how I write these, little by little. Hi, this is ikki. The blog post about the robot I saw at the Robot Exhibition the other day, which Preferred Networks developed jointly with FANUC, has been updated: "Learning bin-picking of randomly piled parts from scratch with deep learning" (https://research.preferred.jp/2015/12/robot_binpick_deep_learning/). Or rather, it had already been posted by the time I wrote my previous entry... I heard the talk at the exhibition workshop and was surprised to learn that this robot was built in just three months. President Nishikawa said, "It would have taken an ordinary person three years (laughs)," but I couldn't have done it even if I were given three years.
Anyway, the introduction has gotten long. This time, following on from the previous article, the topic is the "Out of GPU Memory" problem.
Thanks to the advice from ReoAoki and MATS on the previous article, I was able to write a program that trains on images and recognizes them without throwing an error, but the number of images I can feed it is small and the recognition accuracy is not good enough. Should I work on the training images (the dataset) themselves? Still, I would like to solve this in the program if possible. Eventually I want to be able to train without errors no matter how many images I feed in! (Within limits, of course.)
The current program is shown below. As before, I am referring to "Identifying the production company of the anime Yuru Yuri with TensorFlow" from the kivantium activity diary. Thank you very much.
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import cv2
import numpy as np
import tensorflow as tf
import tensorflow.python.platform

NUM_CLASSES = 2
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE*IMAGE_SIZE*3

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('save_model', 'models/model.ckpt', 'File name of model data')
flags.DEFINE_string('train', 'training/train_.txt', 'File name of train data')
flags.DEFINE_string('test', 'training/test_.txt', 'File name of test data')
flags.DEFINE_string('train_dir', '/tmp/pict_data', 'Directory to put the training data.')
flags.DEFINE_integer('max_steps', 200, 'Number of steps to run trainer.')
flags.DEFINE_integer('batch_size', 128, 'Batch size'
                     'Must divide evenly into the dataset sizes.')
flags.DEFINE_float('learning_rate', 1e-4, 'Initial learning rate.')
def inference(images_placeholder, keep_prob):
    ####################################################################
    # Builds the prediction model
    # Arguments:
    #   images_placeholder: placeholder for the input images
    #   keep_prob: placeholder for the dropout keep probability
    # Returns:
    #   y_conv: (something like) the probability of each class
    ####################################################################
    # Initialize weights from a truncated normal distribution with stddev 0.1
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)
    # Initialize biases to the constant 0.1
    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)
    # Create a convolution layer
    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
    # Create a pooling layer
    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    # Reshape the input to 28x28x3
    x_images = tf.reshape(images_placeholder, [-1, IMAGE_SIZE, IMAGE_SIZE, 3])

    # Convolution layer 1
    with tf.name_scope('conv1') as scope:
        W_conv1 = weight_variable([5, 5, 3, 32])
        b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x_images, W_conv1) + b_conv1)
    # Pooling layer 1
    with tf.name_scope('pool1') as scope:
        h_pool1 = max_pool_2x2(h_conv1)
    # Convolution layer 2
    with tf.name_scope('conv2') as scope:
        W_conv2 = weight_variable([5, 5, 32, 64])
        b_conv2 = bias_variable([64])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    # Pooling layer 2
    with tf.name_scope('pool2') as scope:
        h_pool2 = max_pool_2x2(h_conv2)
    # Fully connected layer 1
    with tf.name_scope('fc1') as scope:
        W_fc1 = weight_variable([7*7*64, 1024])
        b_fc1 = bias_variable([1024])
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
        # Apply dropout
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    # Fully connected layer 2
    with tf.name_scope('fc2') as scope:
        W_fc2 = weight_variable([1024, NUM_CLASSES])
        b_fc2 = bias_variable([NUM_CLASSES])
    # Normalize with the softmax function
    with tf.name_scope('softmax') as scope:
        y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    # Return (something like) the probability of each label
    return y_conv
def loss(logits, labels):
    ####################################################################
    # Computes the loss
    # Arguments:
    #   logits: logits tensor, float - [batch_size, NUM_CLASSES]
    #   labels: labels tensor, int32 - [batch_size, NUM_CLASSES]
    # Returns:
    #   cross_entropy: cross-entropy tensor, float
    ####################################################################
    # Compute the cross entropy
    cross_entropy = -tf.reduce_sum(labels*tf.log(logits))
    # Register it for display in TensorBoard
    tf.scalar_summary("cross_entropy", cross_entropy)
    return cross_entropy

def training(loss, learning_rate):
    ####################################################################
    # Defines the training op
    # Arguments:
    #   loss: loss tensor, result of loss()
    #   learning_rate: learning rate
    # Returns:
    #   train_step: training op
    ####################################################################
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return train_step

def accuracy(logits, labels):
    ####################################################################
    # Computes the accuracy
    # Arguments:
    #   logits: result of inference()
    #   labels: labels tensor, int32 - [batch_size, NUM_CLASSES]
    # Returns:
    #   accuracy: accuracy (float)
    ####################################################################
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    tf.scalar_summary("accuracy", accuracy)
    return accuracy
if __name__ == '__main__':
    # Read the training list file
    with open(FLAGS.train, 'r') as f:  # train.txt
        train_image = []
        train_label = []
        for line in f:
            line = line.rstrip()
            l = line.split()
            img = cv2.imread(l[0])
            img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
            train_image.append(img.flatten().astype(np.float32)/255.0)
            tmp = np.zeros(NUM_CLASSES)
            tmp[int(l[1])] = 1
            train_label.append(tmp)
        train_image = np.asarray(train_image)
        train_label = np.asarray(train_label)
        print len(train_image)
    # Read the test list file
    with open(FLAGS.test, 'r') as f:  # test.txt
        test_image = []
        test_label = []
        for line in f:
            line = line.rstrip()
            l = line.split()
            img = cv2.imread(l[0])
            img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
            test_image.append(img.flatten().astype(np.float32)/255.0)
            tmp = np.zeros(NUM_CLASSES)
            tmp[int(l[1])] = 1
            test_label.append(tmp)
        test_image = np.asarray(test_image)
        test_label = np.asarray(test_label)

    with tf.Graph().as_default():
        images_placeholder = tf.placeholder("float", shape=(None, IMAGE_PIXELS))
        labels_placeholder = tf.placeholder("float", shape=(None, NUM_CLASSES))
        keep_prob = tf.placeholder("float")

        logits = inference(images_placeholder, keep_prob)
        loss_value = loss(logits, labels_placeholder)
        train_op = training(loss_value, FLAGS.learning_rate)
        acc = accuracy(logits, labels_placeholder)

        # Prepare to save the model
        saver = tf.train.Saver()
        sess = tf.Session()
        sess.run(tf.initialize_all_variables())
        # Set up the values to be shown in TensorBoard
        summary_op = tf.merge_all_summaries()
        summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph_def)

        for step in range(FLAGS.max_steps):
            for i in range(len(train_image)/FLAGS.batch_size):
                batch = FLAGS.batch_size*i
                sess.run(train_op, feed_dict={
                    images_placeholder: train_image[batch:batch+FLAGS.batch_size],
                    labels_placeholder: train_label[batch:batch+FLAGS.batch_size],
                    keep_prob: 0.5})
            train_accuracy = sess.run(acc, feed_dict={
                images_placeholder: train_image,
                labels_placeholder: train_label,
                keep_prob: 1.0})
            print "step %d, training accuracy %g"%(step, train_accuracy)
            summary_str = sess.run(summary_op, feed_dict={
                images_placeholder: train_image,
                labels_placeholder: train_label,
                keep_prob: 1.0})
            summary_writer.add_summary(summary_str, step)

        print "test accuracy %g"%sess.run(acc, feed_dict={
            images_placeholder: test_image,
            labels_placeholder: test_label,
            keep_prob: 1.0})
        # Save the final model
        save_path = saver.save(sess, FLAGS.save_model)
```
The main points of the program:

- It trains on two classes of images listed in train.txt and test.txt.
- The source images vary in size, but each is resized to 28x28 with cv2.resize.
- The trained result is saved to a file called model.ckpt.

The GPU is a GeForce GTX TITAN X (12 GB).
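For reference, train_.txt and test_.txt are plain text files with one image per line: a path to the image followed by its class label (0 or 1, since NUM_CLASSES = 2), which is how the loading loop parses them. The file names below are made up for illustration:

```
images/class0_001.jpg 0
images/class0_002.jpg 0
images/class1_001.jpg 1
```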
I also saw a thread saying that this problem is caused by the memory allocator and can be fixed by updating TensorFlow to the latest version, but updating did not solve it for me...
How can I feed it 20,000 or 30,000 images? I will keep investigating...
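One thing I suspect (just my guess at this point, not something I have verified) is that the accuracy and summary steps feed the entire training set to the GPU in a single feed_dict, so memory usage grows with the number of images even though training itself is done in batches of 128. A minimal sketch of evaluating accuracy in mini-batches instead, with a helper name I made up, would look like this:

```python
def batched_accuracy(sess, acc, images, labels, images_ph, labels_ph, keep_prob, batch_size=128):
    # Evaluate accuracy batch by batch so only batch_size images are
    # resident on the GPU at a time (a sketch, not a confirmed fix).
    total = 0.0
    count = 0
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        a = sess.run(acc, feed_dict={
            images_ph: images[start:end],
            labels_ph: labels[start:end],
            keep_prob: 1.0})
        total += a * len(images[start:end])
        count += len(images[start:end])
    return total / count
```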
**Probably solved!!** → "TensorFlow: To learn from a large number of images... ~ (almost) solved ~"