[PYTHON] I examined Yuki Kashiwagi's facial features to understand TensorFlow [Part 2]

I examined Yuki Kashiwagi's facial features to understand TensorFlow and CNNs (Convolutional Neural Networks). This is a sequel to the article "Yuki Kashiwagi's facial features were examined to understand TensorFlow [Part 1]". This time, I focus on the training part of the overall TensorFlow process. I apologize that, although this is meant as an explanation, there are many parts I do not fully understand myself. The series continues with the next part, which focuses on the judgment part.

[Figure: 60.Summary01_flow.JPG]

Learning process overview

As I explained in the first part, training is done with TensorFlow. The model is almost the same as the one in the TensorFlow tutorial for experts, Deep MNIST for Experts. For an explanation, please refer to the article ["[Explanation for beginners] TensorFlow Tutorial Deep MNIST"](http://qiita.com/FukuharaYohei/items/0aef74a0411273705512).

[Figure: 10.Summary02_model.JPG]

Image processing flow

1. Change image size

Although it is not shown in the "Learning process overview" figure, the images are resized to a uniform size with OpenCV before being processed by TensorFlow. The 28-pixel-square size used in "[Explanation for beginners] TensorFlow tutorial Deep MNIST" is too small for faces, so this time I use 81 pixels square. The image below is for reference. For some reason the colors look strange when the resized image is viewed in TensorBoard, although they look normal with imshow. One likely cause is that OpenCV loads images in BGR channel order, whereas TensorBoard displays them as RGB.

[Figure: 70.process01_resize.JPG]

In Python this looks as follows (the complete code appears later):

img = cv2.imread(file_name[0])
img = cv2.resize(img, (FLAGS.image_size, FLAGS.image_size))

2. First layer convolution process

In the first-layer convolution, 32 filters of 5 pixels square are applied. This is the same as in "[Explanation for beginners] TensorFlow Tutorial Deep MNIST", and the activation function is still ReLU. For an explanation of convolution, please refer to the article "[Explanation for beginners] Introduction to convolution processing (explained in TensorFlow)". Only some of the output images are shown below. The convolution brings out features: in the image on the far right, the nose and mouth have disappeared and the eyes stand out. The filters themselves are hard to interpret.

[Figure: 70.process02_conv01.JPG]

3. First layer pooling process

In the first-layer pooling, the output of the first-layer convolution is max pooled down to 1/3 of its size. Because the initial image size is as large as 81, I changed the ratio from the 1/2 used in Deep MNIST to 1/3. For more information on pooling, please refer to the article "[Explanation for beginners] Introduction to pooling (explained in TensorFlow)". Looking at the image, you can see that max pooling has made it coarser.

[Figure: 70.process03_pool01.JPG]

4. Second layer convolution process

In the second-layer convolution, 4 filters of 8 pixels square are applied. I chose 8 pixels because I thought that was about the size of a nose. I also used only 4 filters to try to narrow down the features (though the result was hard to interpret). Looking at the image, you can see that things have gone wrong at this point. Far from showing features, most of the image has disappeared. Perhaps the image I picked was a bad one?

[Figure: 70.process04_conv02.JPG]

5. Second layer pooling process

In the second-layer pooling, the output of the second-layer convolution is max pooled down to 1/3 of its size. Looking at the image, the model seems to be picking up features at the edges. Perhaps it is looking at the hairstyle?

[Figure: 70.process05_pool02.JPG]
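The size arithmetic behind the two 1/3 pooling steps can be sketched as follows. This is only an illustration, not the author's code: the helper names are hypothetical, and it assumes the pooling stride equals the pool size with padding='SAME', in which case TensorFlow produces an output of size ceil(input / stride).

```python
import math

def same_pool_output(size, pool_size):
    """Spatial size after max pooling with padding='SAME' and stride == pool_size."""
    return math.ceil(size / pool_size)

def sizes_through_two_pools(image_size, pool_size):
    """Image sizes after the first and the second pooling layer."""
    size1 = same_pool_output(image_size, pool_size)
    size2 = same_pool_output(size1, pool_size)
    return size1, size2

# Values used in this article: 81 pixels square, pool size 3
print(sizes_through_two_pools(81, 3))  # (27, 9)

# When the size is not a multiple of the pool size, truncating division
# and ceil disagree (hypothetical input size of 80).
print(int(80 / 3), same_pool_output(80, 3))  # 26 27
```

With the defaults of 81 and 3, truncating division and ceil agree, so the sizes computed in the model code match the real tensor shapes. For sizes that are not multiples of the pool size the two can differ, so choosing an image size that divides evenly by the pool size twice keeps the reshapes safe.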

6. Fully connected layer

The rest consists of the fully connected layers. These are unchanged from Deep MNIST, including dropout.
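To get a feel for where the model's parameters are concentrated, here is a rough count per layer. It is a sketch using the constants from model_deep.py (5x5x3 filters x 32, 8x8x32 filters x 4, 1024 hidden units, 2 classes) and the 9x9 feature map that remains after two 1/3 poolings of an 81-pixel image. The helper functions are hypothetical and not part of the original program.

```python
def conv_params(filter_size, in_channels, out_channels):
    """Weights plus biases of a square convolution filter bank."""
    return filter_size * filter_size * in_channels * out_channels + out_channels

def fc_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units

# 9 x 9 pixels x 4 filters are flattened before the first fully connected layer
flat_size = 9 * 9 * 4

counts = {
    'conv1': conv_params(5, 3, 32),
    'conv2': conv_params(8, 32, 4),
    'fc1': fc_params(flat_size, 1024),
    'fc2': fc_params(1024, 2),
}

for name, count in counts.items():
    print(name, count)
print('total', sum(counts.values()))
```

Under these assumptions, almost all of the parameters sit in the first fully connected layer, even though the second convolution reduces the feature maps to only 4 channels.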

Learning execution

Ready to run

The folder structure looks like this: two tab-delimited text files, test.txt for test data and train.txt for training data, are placed in the "inputs" folder. This folder/file structure can be changed with run-time parameters.

inputs 
│  test.txt 
│  train.txt 

The contents of each file look like the image below: the first column is the image file and the second column is the label, 0 or 1 (0 is Yuki Kashiwagi). The character encoding is Shift-JIS and the line endings are CR-LF.

[Figure: 80.text_data.JPG]
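As an illustration of this file format, the sketch below parses tab-delimited lines with CR-LF line endings from Shift-JIS encoded bytes. The function name, the sample paths, and the choice of Python's cp932 codec (the Windows variant of Shift-JIS) are assumptions made for this example. The actual program further down simply opens the file with the platform's default encoding.

```python
def parse_image_list(raw_bytes, encoding='cp932'):
    """Decode a tab-delimited image list and return (path, label) pairs."""
    text = raw_bytes.decode(encoding)
    pairs = []
    for line in text.splitlines():  # splitlines() also handles CR-LF
        line = line.strip()
        if not line:
            continue  # skip empty lines
        path, label = line.split('\t')
        pairs.append((path, int(label)))
    return pairs

# Hypothetical sample content: file path, tab, label (0 = Yuki Kashiwagi)
sample = 'faces/kashiwagi_001.jpg\t0\r\nfaces/other_001.jpg\t1\r\n'.encode('cp932')
pairs = parse_image_list(sample)
print(pairs)
```

Passing the encoding explicitly avoids depending on the default encoding of the machine that runs the training.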

Learning execution

python fully_connected_feed.py

When executed with the above command, the progress and results are output as shown in the figure below.

[Figure: 81.Learning.JPG]

The following run-time parameters are provided. They are implemented with the argparse library, which is explained briefly in the article "Specify parameters for face detection in openCV to quickly improve detection accuracy".

| Parameter | Contents | Initial value | Remarks |
|---|---|---|---|
| learning_rate | Learning rate | 1e-4 | Learning rate for AdamOptimizer |
| batch_size | Batch size | 20 | Training data is learned in batches of this size. The initial value is small because there is little training data |
| input_train_data | Training data | ./inputs/train.txt | Change this value to specify the folder/file of the training data |
| input_test_data | Test data | ./inputs/test.txt | Change this value to specify the folder/file of the test data |
| log_dir | Log storage directory | /tmp/tensorflow/kashiwagi/logs | Directory for storing learned parameters and TensorBoard logs |
| image_size | Image size | 81 | Image size after the initial resize |
| pool_size | Pooling size | 3 | Max pooling size |
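As a small illustration of how such parameters can be overridden, the sketch below defines two of them with argparse and parses a hypothetical command line. The argument list is passed in explicitly so the example runs on its own, and `--foo` is a made-up flag included only to show what parse_known_args does with arguments it does not recognize.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int, default=20, help='Batch size.')
parser.add_argument('--image_size', type=int, default=81, help='Input image size')

# Comparable to running: python fully_connected_feed.py --batch_size 10 --foo bar
flags, unparsed = parser.parse_known_args(['--batch_size', '10', '--foo', 'bar'])

print(flags.batch_size)  # 10 (overridden)
print(flags.image_size)  # 81 (default)
print(unparsed)          # ['--foo', 'bar']
```

Parameters that are not given keep their initial values, and unrecognized arguments are returned separately instead of causing an error.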

Processing flow

TensorBoard outputs the entire process, called the Computational Graph (see ["[Introduction] TensorFlow Basic Syntax and Concept"](http://qiita.com/FukuharaYohei/items/0825c3518d8596c09396#Computational-graph)), as shown in the figure below. This processing flow was built by referring to the official tutorial TensorFlow Mechanics 101.

[Figure: 10.Summary03_CGraph_Overview.JPG]

The figure below shows the main inference part expanded.

[Figure: 10.Summary04_CGraph_Inference.JPG]

Python program

Model part (model_deep.py)

This is the model part, created by referring to mnist.py (/tutorials/mnist/mnist.py) from TensorFlow Mechanics 101.

import tensorflow as tf

#Image tag to be output to TensorBoard
IMAGE_SOURCE = 'source'
IMAGE_FILTER = 'filter'
IMAGE_CONV   = 'conv'
IMAGE_POOL   = 'pool'

# Number of classification labels (Yuki Kashiwagi: 0, others: 1)
NUM_CLASSES       = 2
NUM_OUTPUT_IMAGES = 64
NUM_FILTER1       = 32
NUM_FILTER2       = 4
SIZE_FILTER1      = 5
SIZE_FILTER2      = 8
NUM_FC            = 1024

def inference(images, keep_prob, image_size, pool_size):
    
    with tf.name_scope('inference'):
        # Weights are initialized from a truncated normal distribution with standard deviation 0.1
        def weight_variable(shape):
            return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

        # Biases are initialized to the constant 0.1
        def bias_variable(shape):
            return tf.Variable(tf.constant(0.1, shape=shape))

        #Convolution layer definition
        def conv2d(x, W):
            return tf.nn.conv2d(x, W, [1, 1, 1, 1], 'SAME')

        #Pooling layer definition
        def max_pool(x):
            return tf.nn.max_pool(x, ksize=[1, pool_size, pool_size, 1], strides=[1, pool_size, pool_size, 1], padding='SAME')

        #Input information
        with tf.name_scope('input'):
            tf.summary.image(IMAGE_SOURCE, images, NUM_OUTPUT_IMAGES, family=IMAGE_SOURCE)

        #1st layer
        with tf.name_scope('1st_layer'):
            #1st convolution layer
            with tf.name_scope('conv1_layer') as scope:
                W_conv1 = weight_variable([SIZE_FILTER1, SIZE_FILTER1, 3, NUM_FILTER1])
                b_conv1 = bias_variable([NUM_FILTER1])
                h_conv1 = tf.nn.relu(conv2d(images, W_conv1) + b_conv1)
                
                # Transpose the tensor from [height, width, 3, number of filters] to [number of filters, height, width, 3] and output it as images
                tf.summary.image(IMAGE_FILTER, tf.transpose(W_conv1, perm=[3,0,1,2]), 4, family=IMAGE_FILTER)
                
                # Transpose the tensor from [-1, height, width, number of filters] to [-1, number of filters, height, width], merge the first two dimensions, and output as images
                tf.summary.image(IMAGE_CONV, tf.reshape(tf.transpose(h_conv1, perm=[0,3,1,2]), [-1,image_size,image_size,1]), 4 , family=IMAGE_CONV)

            #1st pooling layer
            with tf.name_scope('pool1_layer') as scope:
                
                #Image size calculation after pooling process
                image_size1 = int(image_size / pool_size)
                
                h_pool1 = max_pool(h_conv1)
               
                # Transpose the tensor from [-1, height, width, number of filters] to [-1, number of filters, height, width], merge the first two dimensions, and output as images
                tf.summary.image(IMAGE_POOL, tf.reshape(tf.transpose(h_pool1,perm=[0,3,1,2]),[-1, image_size1, image_size1, 1]),
                                 NUM_OUTPUT_IMAGES, family=IMAGE_POOL)
                
        #2nd layer
        with tf.name_scope('2nd_layer'):
            #2nd convolution layer
            with tf.name_scope('conv2_layer') as scope:
                W_conv2 = weight_variable([SIZE_FILTER2, SIZE_FILTER2, NUM_FILTER1, NUM_FILTER2])
                b_conv2 = bias_variable([NUM_FILTER2])
                h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
                
                # Transpose the tensor from [height, width, number of filters 1, number of filters 2] to [number of filters 1, number of filters 2, height, width], merge the first two dimensions, and output as images to TensorBoard
                tf.summary.image(IMAGE_FILTER, tf.reshape(tf.transpose(W_conv2,perm=[2,3,0,1]),[-1,SIZE_FILTER2,SIZE_FILTER2,1]), 4, family=IMAGE_FILTER)
                # Transpose the tensor from [-1, height, width, number of filters 2] to [-1, number of filters 2, height, width], merge the first two dimensions, and output as images to TensorBoard
                tf.summary.image(IMAGE_CONV, tf.reshape(tf.transpose(h_conv2,perm=[0,3,1,2]),[-1,image_size1,image_size1,1]), 4, family=IMAGE_CONV)

            #2nd pooling layer
            with tf.name_scope('pool2_layer') as scope:

                #Image size calculation after pooling process
                image_size2 = int(image_size1 / pool_size)

                h_pool2 = max_pool(h_conv2)
                # Transpose the tensor from [-1, height, width, number of filters 2] to [-1, number of filters 2, height, width], merge the first two dimensions, and output as images to TensorBoard
                tf.summary.image(IMAGE_POOL, tf.reshape(tf.transpose(h_pool2,perm=[0,3,1,2]),[-1,image_size2,image_size2,1]), 
                                 NUM_OUTPUT_IMAGES, family=IMAGE_POOL)

        #Creation of fully connected layer 1
        with tf.name_scope('fc1_layer') as scope:
            W_fc1 = weight_variable([image_size2 ** 2 * NUM_FILTER2, NUM_FC])
            b_fc1 = bias_variable([NUM_FC])
            h_pool2_flat = tf.reshape(h_pool2, [-1, image_size2 ** 2 * NUM_FILTER2])
            
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

        # Creation of fully connected layer 2 (readout layer)
        with tf.name_scope('fc2_layer') as scope:
            W_fc2 = weight_variable([NUM_FC, NUM_CLASSES])
            b_fc2 = bias_variable([NUM_CLASSES])

            logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

        return logits
    
def loss(logits, labels):

    with tf.name_scope('loss'):
        #Calculation of cross entropy
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')

        # Return the mean cross entropy as the loss value
        return tf.reduce_mean(cross_entropy)

# Train the model by error backpropagation based on the error (loss)
def training(loss, learning_rate):
    with tf.name_scope('training'):
        #Scalar output of error to TensorBoard
        tf.summary.scalar('loss', loss)
        
        #Optimized with Adam
        train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
        return train_op

#Calculate the number of correct answers for the prediction results produced by the learning model
def evaluation(logits, labels, batch_size):

    with tf.name_scope('evaluation'):
        
        #Calculation of the number of correct answers
        correct = tf.reduce_sum(tf.cast(tf.nn.in_top_k(logits, labels, 1), tf.int32))
        
        #Calculation of correct answer rate and Scalar output to TensorBoard
        accuracy = correct / batch_size
        tf.summary.scalar('accuracy', accuracy)
        return correct, accuracy
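To make the loss and evaluation steps above more concrete, here is a plain-Python sketch of what they compute for a small batch: the softmax cross entropy for an integer label, and the number of samples whose highest logit matches the label. The function names and sample values are hypothetical. This is not how TensorFlow implements these operations internally, and ties between logits are not handled the way tf.nn.in_top_k handles them.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy between softmax(logits) and an integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def count_correct(batch_logits, labels):
    """Number of samples whose largest logit is at the labeled class."""
    correct = 0
    for logits, label in zip(batch_logits, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        if predicted == label:
            correct += 1
    return correct

# Hypothetical logits for 3 images and 2 classes (0: Yuki Kashiwagi, 1: others)
batch_logits = [[2.0, 0.5], [0.1, 1.2], [0.3, 0.9]]
labels = [0, 1, 0]

losses = [sparse_softmax_cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
mean_loss = sum(losses) / len(losses)
correct = count_correct(batch_logits, labels)

print(mean_loss)
print(correct, correct / len(labels))
```

The third sample is misclassified, so it contributes the largest individual loss and lowers the accuracy to 2 out of 3.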

Training-time model call program (fully_connected_feed.py)

This is the program that calls the model during training, created by referring to fully_connected_feed.py (/examples/tutorials/mnist/fully_connected_feed.py) from TensorFlow Mechanics 101. Because it was written for a small amount of training data, please adapt the code as needed if you have a lot.

import argparse
import cv2
import os
import time
import numpy as np
import random
import sys
import tensorflow as tf
import model_deep

#Basic model parameters
FLAGS = None

# Evaluate all data
def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            images_data, labels_data,
            keep_prob):
    
    true_count = 0  #Number of correct answers

    #Calculate the total number
    steps_per_epoch = len(images_data) // FLAGS.batch_size  #Truncate division
    num_examples = steps_per_epoch * FLAGS.batch_size       # Number evaluated; the truncated remainder is excluded

    #Evaluation of all cases
    for step in range(steps_per_epoch):

        #Receive the number of correct answers and add
        true_count += sess.run(eval_correct, 
                               feed_dict={images_placeholder: images_data[step * FLAGS.batch_size: step * FLAGS.batch_size + FLAGS.batch_size],
                                          labels_placeholder: labels_data[step * FLAGS.batch_size: step * FLAGS.batch_size + FLAGS.batch_size],
                                          keep_prob: 1.0
                                         })

    #Correct answer rate calculation and display
    print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' % (num_examples, true_count, (float(true_count) / num_examples)))
    
def run_training():
    #Specify the scope to be output to the graph of TensorBoard
    with tf.Graph().as_default():
        
        #Placeholder definition
        images_placeholder = tf.placeholder(tf.float32, name='images', shape=(FLAGS.batch_size, FLAGS.image_size, FLAGS.image_size, 3))
        labels_placeholder = tf.placeholder(tf.int32,   name='labels', shape=(FLAGS.batch_size) )    
        keep_prob = tf.placeholder(tf.float32, name='keep_probability' )

        # Build the model with inference()
        logits = model_deep.inference(images_placeholder, keep_prob, FLAGS.image_size, FLAGS.pool_size)
        
        # Calculate the loss with loss()
        loss_value = model_deep.loss(logits, labels_placeholder)
        
        # Train and adjust the model parameters with training()
        train_op = model_deep.training(loss_value, FLAGS.learning_rate)
        
        #Accuracy calculation
        eval_correct, accuracy = model_deep.evaluation(logits, labels_placeholder, FLAGS.batch_size)
        
        #Output the contents so far to TensorBoard
        summary = tf.summary.merge_all()

        #Ready to save
        saver = tf.train.Saver()
                
        #Creating a Session
        with tf.Session() as sess:

            #Preparing to write to TensorBoard
            summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
            
            #Variable initialization
            sess.run(tf.global_variables_initializer())

            #Image data loop
            for step in range(len(FLAGS.train_image) // FLAGS.batch_size):
                
                # Record the start time
                start_time = time.time()
                
                # Start index of the batch_size images to train on
                train_batch = FLAGS.batch_size * step
                
                #Training execution
                feed_dict = {
                    images_placeholder: FLAGS.train_image[train_batch:train_batch + FLAGS.batch_size],
                    labels_placeholder: FLAGS.train_label[train_batch:train_batch+FLAGS.batch_size],
                    keep_prob: 0.5
                    }

                # The result of train_op is discarded, but the model does not learn unless it is run
                _, loss_val, accuracy_val = sess.run([train_op, loss_value, accuracy] , feed_dict=feed_dict)
                
                # Processing time calculation (per step)
                duration = time.time() - start_time

                # Every 5 steps, print progress and add a summary to the TensorBoard writer
                if step % 5 == 0:
                    # Output the result
                    print('Step %d: loss = %.2f, accuracy = %.3f (%.4f sec)' % (step, loss_val, accuracy_val, duration))

                    #Execute session and get summary of TensorBoard
                    summary_str = sess.run(summary, feed_dict=feed_dict)
                    
                    #Added summary to TensorBoard
                    summary_writer.add_summary(summary_str, step)
                    summary_writer.flush()

                # Save the model and evaluate on the final step
                if (step + 1) == len(FLAGS.train_image)//FLAGS.batch_size:
                    saver.save(sess, os.path.join(FLAGS.log_dir, 'model.ckpt'), global_step=step)

                    print('Training Data Eval:')                
                    #Training data evaluation
                    do_eval(sess, eval_correct, images_placeholder, labels_placeholder, FLAGS.train_image, FLAGS.train_label, keep_prob)

                    #Test data evaluation
                    print('Test Data Eval:')
                    do_eval(sess, eval_correct, images_placeholder, labels_placeholder, FLAGS.test_image, FLAGS.test_label, keep_prob)

            #TensorBoard write close
            summary_writer.close()

#Read image list files and convert individual image files and labels to TensorFlow format
def read_images(file_image_list):
    
    #Array to put data
    image_list = []
    label_list = []

    #Open file in read mode
    with open(file_image_list) as file:
        file_data = file.readlines()
    
    #Random shuffle order
    random.shuffle(file_data)
    
    for line in file_data:
        # Strip the line break, then split on tabs
        line      = line.rstrip()     # Remove trailing whitespace, including the line break
        file_name = line.split('\t')  # Split on the tab delimiter
        # Read the image data and resize it to FLAGS.image_size on each side
        img = cv2.imread(file_name[0])
        img = cv2.resize(img, (FLAGS.image_size, FLAGS.image_size))

        # Convert to float values in the range 0-1
        image_list.append(img.astype(np.float32)/255.0)
        
        #Add to the end of the label array
        label_list.append(int(file_name[1]))

    #Convert to numpy format and return
    return np.asarray(image_list), np.asarray(label_list)

#Main processing
def main(_):

    #If the TensorBoard save directory exists, delete it and recreate it.
    if tf.gfile.Exists(FLAGS.log_dir):
        tf.gfile.DeleteRecursively(FLAGS.log_dir)
    tf.gfile.MakeDirs(FLAGS.log_dir)

    #Training and test data Image file reading
    print('Start reading images')
    FLAGS.train_image, FLAGS.train_label = read_images(FLAGS.input_train_data)
    FLAGS.test_image,  FLAGS.test_label  = read_images(FLAGS.input_test_data)

    #Start training
    print('Start training')
    run_training()

# Defined at module level so that it can be used even when imported
parser = argparse.ArgumentParser()

#Input parameter definition
parser.add_argument(
    '--learning_rate',
    type=float,
    default=1e-4,
    help='Initial learning rate.'
)
parser.add_argument(
    '--batch_size',
    type=int,
    default=20,
    help='Batch size.  Must divide evenly into the dataset sizes.'
)
parser.add_argument(
    '--input_train_data',
    type=str,
    default='./inputs/train.txt',
    help='File list data to put the input train data.'
)
parser.add_argument(
    '--input_test_data',
    type=str,
    default='./inputs/test.txt',
    help='File list data to put the input test data.'
)
parser.add_argument(
    '--log_dir',
    type=str,
    default='/tmp/tensorflow/kashiwagi/logs',
    help='Directory to put the log data.'
)
parser.add_argument(
    '--image_size',
    type=int,
    default=81,
    help='Input image size'
)
parser.add_argument(
    '--pool_size',
    type=int,
    default=3,
    help='MAX pooling size'
)

FLAGS, unparsed = parser.parse_known_args()

if __name__ == '__main__':
    #main function start
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
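Both run_training and do_eval in the program above step through the data in fixed-size batches, because the placeholders are declared with a fixed batch_size. The sketch below shows that slicing arithmetic on its own. The function name and the count of 47 items are hypothetical and chosen only to make the dropped remainder visible.

```python
def batch_slices(num_items, batch_size):
    """Return (start, stop) index pairs for full batches only."""
    steps = num_items // batch_size  # truncating division, as in the program
    return [(step * batch_size, (step + 1) * batch_size) for step in range(steps)]

num_items = 47  # hypothetical number of images
slices = batch_slices(num_items, batch_size=20)
used = slices[-1][1] if slices else 0

print(slices)            # [(0, 20), (20, 40)]
print(num_items - used)  # 7 items are never fed to the model
```

Two consequences follow from this arithmetic: items left over after the last full batch are skipped in both training and evaluation, and the training loop makes a single pass over the data. With more data, looping over several epochs and reshuffling between them would be one way to adapt the code.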
