I examined Yuki Kashiwagi's facial features to understand TensorFlow and CNNs (Convolutional Neural Networks). This is a sequel to the article "Yuki Kashiwagi's facial features were examined to understand TensorFlow [Part 1]". This time, I focus on the training part of the overall process. I apologize that there are still many parts I don't fully understand myself, even as I write this commentary :bow_tone1: Part 2, which focuses on the judgment part, continues from here.
As I explained in Part 1, I am training with TensorFlow, using almost the same model as the official tutorial for TensorFlow experts, Deep MNIST for Experts. For an explanation of that model, please refer to the article ["[Explanation for beginners] TensorFlow Tutorial Deep MNIST"](http://qiita.com/FukuharaYohei/items/0aef74a0411273705512).
Although it is not shown in the "Learning process overview" figure, images are resized to a uniform size with OpenCV before being processed by TensorFlow. The 28-pixel square used in "[Explanation for beginners] TensorFlow tutorial Deep MNIST" is too small for faces, so this time I use an 81-pixel square. The image below is for reference; for some reason the colors look strange when viewing the resized images in TensorBoard (they look normal with imshow...).
In Python, the resizing looks like this (the full code is posted later):

```python
img = cv2.imread(file_name[0])
img = cv2.resize(img, (FLAGS.image_size, FLAGS.image_size))
```
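A likely cause of the strange TensorBoard colors is channel order: cv2.imread returns images in BGR order, while tf.summary.image interprets them as RGB. This fix is not in the original code; a minimal sketch would be:

```python
import cv2

img = cv2.imread(file_name[0])                               # OpenCV loads images as BGR
img = cv2.resize(img, (FLAGS.image_size, FLAGS.image_size))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)                   # reorder channels so TensorBoard shows natural colors
```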
In the 1st-layer convolution, the image is convolved with 32 types of 5-pixel-square filters. This is the same as "[Explanation for beginners] TensorFlow Tutorial Deep MNIST", and the activation function remains ReLU as well. For an explanation of convolution, please refer to the article "[Explanation for beginners] Introduction to convolution processing (explained in TensorFlow)". Only some of the images are shown below. They illustrate what convolution extracts: in the rightmost image, the nose and mouth have disappeared, leaving the characteristics of the eyes. The filters themselves don't really make visual sense.
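To make the tensor shapes concrete, here is a minimal standalone sketch of the 1st-layer convolution (shapes only; the real model is in model_deep.py below):

```python
import tensorflow as tf

# 32 filters of 5x5 applied to a 3-channel 81x81 image
images = tf.placeholder(tf.float32, shape=(None, 81, 81, 3))
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
h_conv1 = tf.nn.relu(tf.nn.conv2d(images, W_conv1, [1, 1, 1, 1], 'SAME'))
print(h_conv1.shape)  # (?, 81, 81, 32): SAME padding keeps the 81x81 size
```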
In the 1st-layer pooling, the output of the 1st convolution is max pooled to 1/3 of its size. Since the initial image size is as large as 81 pixels, I changed the ratio from the 1/2 of Deep MNIST to 1/3. For more information on pooling, please refer to the article "[Explanation for beginners] Introduction to pooling (explained in TensorFlow)". Looking at the images, you can see how max pooling coarsens them.
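As a shape check, pooling to 1/3 uses a 3x3 window with stride 3, so each side shrinks from 81 to 27 (a sketch under the same assumptions as above):

```python
import tensorflow as tf

# Output of the 1st convolution: 81x81 with 32 channels
x = tf.placeholder(tf.float32, shape=(None, 81, 81, 32))
pooled = tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1], padding='SAME')
print(pooled.shape)  # (?, 27, 27, 32): 81 / 3 = 27
```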
In the 2nd-layer convolution, the image is convolved with 4 types of 8-pixel-square filters. I chose 8 pixels because I thought it was about the size of a nose, and I tried just 4 filters to narrow down the features (though the result didn't make sense). Looking at the images, you can see that things have already failed at this point: far from capturing features, most of the content has disappeared... Maybe the images I picked are bad?
In the 2nd-layer pooling, the output of the 2nd convolution is max pooled to 1/3 of its size. Looking at the images, features seem to be picked up at the edges. Is it looking at the hairstyle?
The rest is the fully connected layers. These are unchanged from Deep MNIST, including dropout. A shape sketch follows below.
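After two 1/3 poolings, the 81-pixel image is down to 9 pixels square (81 → 27 → 9) with 4 channels, so the fully connected layer receives a 9 × 9 × 4 = 324-element vector. A minimal sketch consistent with the model code below:

```python
import tensorflow as tf

# Output of the 2nd pooling: 9x9 with 4 channels, flattened to 324 values
h_pool2 = tf.placeholder(tf.float32, shape=(None, 9, 9, 4))
keep_prob = tf.placeholder(tf.float32)
h_pool2_flat = tf.reshape(h_pool2, [-1, 9 * 9 * 4])
W_fc1 = tf.Variable(tf.truncated_normal([9 * 9 * 4, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)  # dropout, as in Deep MNIST
```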
The folder structure looks like this: tab-delimited text files, test.txt (test data) and train.txt (training data), are placed in the "inputs" folder. The folder/file structure can be changed with runtime parameters.
```
inputs
│  test.txt
│  train.txt
```
As shown in the image, the first column of each line is the image file path and the second column is 0 or 1 (0 means Yuki Kashiwagi). The character encoding is Shift-JIS and the line endings are CR-LF.
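For illustration, the contents might look like this (the file paths are hypothetical; the two columns are separated by a tab):

```
C:\images\kashiwagi_001.jpg	0
C:\images\kashiwagi_002.jpg	0
C:\images\other_001.jpg	1
```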
```
python fully_connected_feed.py
```
When executed with the above command, the progress and result will be output as shown in the figure below.
The following runtime parameters are provided. They are implemented with the argparse library, which is explained briefly in the article "Specify parameters for face detection in OpenCV to quickly improve detection accuracy".
Parameter | Contents | Initial value | Remarks |
---|---|---|---|
learning_rate | Learning rate | 1e-4 | Learning rate for AdamOptimizer |
batch_size | Batch size | 20 | Training data is processed this many images at a time. The initial value is kept small to match the small amount of training data |
input_train_data | Training data | ./inputs/train.txt | Change this value to specify a different training data folder/file |
input_test_data | Test data | ./inputs/test.txt | Change this value to specify a different test data folder/file |
log_dir | Log storage directory | /tmp/tensorflow/kashiwagi/logs | Directory for storing learned parameters and TensorBoard logs |
image_size | Image size | 81 | Image size used when resizing |
pool_size | Pooling size | 3 | Max pooling size |
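For example, a hypothetical invocation that overrides some of these defaults (the flags are defined in the argparse section of fully_connected_feed.py below):

```
python fully_connected_feed.py --learning_rate 1e-3 --batch_size 10
```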
The entire process, called a Computational Graph in TensorBoard (see ["[Introduction] TensorFlow Basic Syntax and Concept"](http://qiita.com/FukuharaYohei/items/0825c3518d8596c09396#Computational-graph)), is output as shown in the figure below. This processing flow was made by referring to the official tutorial TensorFlow Mechanics 101. The figure below shows the main inference part expanded.
This is the learning model part, created by referring to mnist.py from TensorFlow Mechanics 101 (/tutorials/mnist/mnist.py).
```python
import tensorflow as tf

# Image tags output to TensorBoard
IMAGE_SOURCE = 'source'
IMAGE_FILTER = 'filter'
IMAGE_CONV = 'conv'
IMAGE_POOL = 'pool'

# Number of identification labels (this time Yuki Kashiwagi: 0, others: 1)
NUM_CLASSES = 2

# Number of images output to TensorBoard
NUM_OUTPUT_IMAGES = 64

# Number of filters in the 1st and 2nd convolution layers
NUM_FILTER1 = 32
NUM_FILTER2 = 4

# Filter sizes in the 1st and 2nd convolution layers
SIZE_FILTER1 = 5
SIZE_FILTER2 = 8

# Number of units in the fully connected layer
NUM_FC = 1024


def inference(images, keep_prob, image_size, pool_size):
    with tf.name_scope('inference'):

        # Weights defined as truncated normal random values with standard deviation 0.1
        def weight_variable(shape):
            return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

        # Biases defined as the constant 0.1
        def bias_variable(shape):
            return tf.Variable(tf.constant(0.1, shape=shape))

        # Convolution layer definition
        def conv2d(x, W):
            return tf.nn.conv2d(x, W, [1, 1, 1, 1], 'SAME')

        # Pooling layer definition
        def max_pool(x):
            return tf.nn.max_pool(x, ksize=[1, pool_size, pool_size, 1],
                                  strides=[1, pool_size, pool_size, 1], padding='SAME')

        # Input images
        with tf.name_scope('input'):
            tf.summary.image(IMAGE_SOURCE, images, NUM_OUTPUT_IMAGES, family=IMAGE_SOURCE)

        # 1st layer
        with tf.name_scope('1st_layer'):

            # 1st convolution layer
            with tf.name_scope('conv1_layer') as scope:
                W_conv1 = weight_variable([SIZE_FILTER1, SIZE_FILTER1, 3, NUM_FILTER1])
                b_conv1 = bias_variable([NUM_FILTER1])
                h_conv1 = tf.nn.relu(conv2d(images, W_conv1) + b_conv1)

                # Transpose the tensor from [height, width, 3, filters] to [filters, height, width, 3] and output the filter images
                tf.summary.image(IMAGE_FILTER, tf.transpose(W_conv1, perm=[3, 0, 1, 2]), 4, family=IMAGE_FILTER)

                # Transpose the tensor from [-1, height, width, filters] to [-1, filters, height, width], merge the first two dimensions, and output the convolved images
                tf.summary.image(IMAGE_CONV, tf.reshape(tf.transpose(h_conv1, perm=[0, 3, 1, 2]), [-1, image_size, image_size, 1]), 4, family=IMAGE_CONV)

            # 1st pooling layer
            with tf.name_scope('pool1_layer') as scope:

                # Image size after pooling
                image_size1 = int(image_size / pool_size)
                h_pool1 = max_pool(h_conv1)

                # Transpose the tensor from [-1, height, width, filters] to [-1, filters, height, width], merge the first two dimensions, and output the pooled images
                tf.summary.image(IMAGE_POOL, tf.reshape(tf.transpose(h_pool1, perm=[0, 3, 1, 2]), [-1, image_size1, image_size1, 1]),
                                 NUM_OUTPUT_IMAGES, family=IMAGE_POOL)

        # 2nd layer
        with tf.name_scope('2nd_layer'):

            # 2nd convolution layer
            with tf.name_scope('conv2_layer') as scope:
                W_conv2 = weight_variable([SIZE_FILTER2, SIZE_FILTER2, NUM_FILTER1, NUM_FILTER2])
                b_conv2 = bias_variable([NUM_FILTER2])
                h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

                # Transpose the tensor from [height, width, filters1, filters2] to [filters1 * filters2, height, width, 1] and output the filter images
                tf.summary.image(IMAGE_FILTER, tf.reshape(tf.transpose(W_conv2, perm=[2, 3, 0, 1]), [-1, SIZE_FILTER2, SIZE_FILTER2, 1]), 4, family=IMAGE_FILTER)

                # Transpose the tensor from [-1, height, width, filters] to [-1, filters, height, width], merge the first two dimensions, and output the convolved images
                tf.summary.image(IMAGE_CONV, tf.reshape(tf.transpose(h_conv2, perm=[0, 3, 1, 2]), [-1, image_size1, image_size1, 1]), 4, family=IMAGE_CONV)

            # 2nd pooling layer
            with tf.name_scope('pool2_layer') as scope:

                # Image size after pooling
                image_size2 = int(image_size1 / pool_size)
                h_pool2 = max_pool(h_conv2)

                # Transpose the tensor from [-1, height, width, filters] to [-1, filters, height, width], merge the first two dimensions, and output the pooled images
                tf.summary.image(IMAGE_POOL, tf.reshape(tf.transpose(h_pool2, perm=[0, 3, 1, 2]), [-1, image_size2, image_size2, 1]),
                                 NUM_OUTPUT_IMAGES, family=IMAGE_POOL)

        # Fully connected layer 1
        with tf.name_scope('fc1_layer') as scope:
            W_fc1 = weight_variable([image_size2 ** 2 * NUM_FILTER2, NUM_FC])
            b_fc1 = bias_variable([NUM_FC])
            h_pool2_flat = tf.reshape(h_pool2, [-1, image_size2 ** 2 * NUM_FILTER2])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

        # Fully connected layer 2 (readout layer)
        with tf.name_scope('fc2_layer') as scope:
            W_fc2 = weight_variable([NUM_FC, NUM_CLASSES])
            b_fc2 = bias_variable([NUM_CLASSES])
            logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

        return logits


# Calculate the loss between the logits and the labels
def loss(logits, labels):
    with tf.name_scope('loss'):
        # Cross-entropy calculation
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
        # Return the mean cross-entropy as the loss value
        return tf.reduce_mean(cross_entropy)


# Train the model using backpropagation based on the loss
def training(loss, learning_rate):
    with tf.name_scope('training'):
        # Output the loss as a scalar to TensorBoard
        tf.summary.scalar('loss', loss)
        # Optimize with Adam
        train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return train_op


# Count the number of correct predictions produced by the model
def evaluation(logits, labels, batch_size):
    with tf.name_scope('evaluation'):
        # Number of correct answers
        correct = tf.reduce_sum(tf.cast(tf.nn.in_top_k(logits, labels, 1), tf.int32))
        # Accuracy calculation and scalar output to TensorBoard
        accuracy = correct / batch_size
        tf.summary.scalar('accuracy', accuracy)
    return correct, accuracy
```
This is the program that calls the model for training, created by referring to fully_connected_feed.py from TensorFlow Mechanics 101 (/examples/tutorials/mnist/fully_connected_feed.py). The code is written for a small amount of training data, so please adapt it if you have a lot.
```python
import argparse
import cv2
import os
import time
import numpy as np
import random
import sys
import tensorflow as tf
import model_deep

# Basic model parameters
FLAGS = None


# Evaluate all the data
def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            images_data, labels_data,
            keep_prob):
    true_count = 0  # Number of correct answers

    # Calculate the total number of evaluated examples
    steps_per_epoch = len(images_data) // FLAGS.batch_size  # Floor division
    num_examples = steps_per_epoch * FLAGS.batch_size  # num_examples excludes the truncated remainder

    # Evaluate every batch
    for step in range(steps_per_epoch):
        # Receive the number of correct answers and accumulate it
        true_count += sess.run(eval_correct,
                               feed_dict={images_placeholder: images_data[step * FLAGS.batch_size: step * FLAGS.batch_size + FLAGS.batch_size],
                                          labels_placeholder: labels_data[step * FLAGS.batch_size: step * FLAGS.batch_size + FLAGS.batch_size],
                                          keep_prob: 1.0
                                          })

    # Calculate and display the precision
    print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' % (num_examples, true_count, (float(true_count) / num_examples)))


def run_training():
    # Specify the scope output to the TensorBoard graph
    with tf.Graph().as_default():

        # Placeholder definitions
        images_placeholder = tf.placeholder(tf.float32, name='images', shape=(FLAGS.batch_size, FLAGS.image_size, FLAGS.image_size, 3))
        labels_placeholder = tf.placeholder(tf.int32, name='labels', shape=(FLAGS.batch_size))
        keep_prob = tf.placeholder(tf.float32, name='keep_probability')

        # Create the model with inference()
        logits = model_deep.inference(images_placeholder, keep_prob, FLAGS.image_size, FLAGS.pool_size)

        # Calculate the loss with loss()
        loss_value = model_deep.loss(logits, labels_placeholder)

        # Train and adjust the model parameters with training()
        train_op = model_deep.training(loss_value, FLAGS.learning_rate)

        # Accuracy calculation
        eval_correct, accuracy = model_deep.evaluation(logits, labels_placeholder, FLAGS.batch_size)

        # Merge everything defined so far for TensorBoard output
        summary = tf.summary.merge_all()

        # Prepare to save learned parameters
        saver = tf.train.Saver()

        # Create a session
        with tf.Session() as sess:

            # Prepare to write to TensorBoard
            summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

            # Variable initialization
            sess.run(tf.global_variables_initializer())

            # Loop over the image data
            for step in range(len(FLAGS.train_image) // FLAGS.batch_size):

                # Save the start time
                start_time = time.time()

                # Train on batch_size images at a time
                train_batch = FLAGS.batch_size * step

                # Run a training step
                feed_dict = {
                    images_placeholder: FLAGS.train_image[train_batch:train_batch + FLAGS.batch_size],
                    labels_placeholder: FLAGS.train_label[train_batch:train_batch + FLAGS.batch_size],
                    keep_prob: 0.5
                }

                # The value of train_op is thrown away, but the model does not learn unless it is passed to sess.run
                _, loss_val, accuracy_val = sess.run([train_op, loss_value, accuracy], feed_dict=feed_dict)

                # Processing time (per step) calculation
                duration = time.time() - start_time

                # Every 5 steps, get the summary (for TensorBoard) and add it to the writer
                if step % 5 == 0:
                    # Output intermediate results
                    print('Step %d: loss = %.2f, accuracy = %.3f (%.4f sec)' % (step, loss_val, accuracy_val, duration))

                    # Run the session and get the TensorBoard summary
                    summary_str = sess.run(summary, feed_dict=feed_dict)

                    # Add the summary to TensorBoard
                    summary_writer.add_summary(summary_str, step)
                    summary_writer.flush()

                # Evaluate during the final loop
                if (step + 1) == len(FLAGS.train_image) // FLAGS.batch_size:
                    saver.save(sess, os.path.join(FLAGS.log_dir, 'model.ckpt'), global_step=step)

                    # Training data evaluation
                    print('Training Data Eval:')
                    do_eval(sess, eval_correct, images_placeholder, labels_placeholder, FLAGS.train_image, FLAGS.train_label, keep_prob)

                    # Test data evaluation
                    print('Test Data Eval:')
                    do_eval(sess, eval_correct, images_placeholder, labels_placeholder, FLAGS.test_image, FLAGS.test_label, keep_prob)

            # Close the TensorBoard writer
            summary_writer.close()


# Read the image list file and convert the individual image files and labels to TensorFlow format
def read_images(file_image_list):
    # Arrays to hold the data
    image_list = []
    label_list = []

    # Open the file in read mode
    with open(file_image_list) as file:
        file_data = file.readlines()

        # Shuffle the order randomly
        random.shuffle(file_data)

        for line in file_data:
            # Split each line on tabs, excluding the line break
            line = line.rstrip()  # Remove trailing whitespace
            file_name = line.split('\t')  # Split on the tab delimiter

            # Read the image data and resize it to FLAGS.image_size on each side
            img = cv2.imread(file_name[0])
            img = cv2.resize(img, (FLAGS.image_size, FLAGS.image_size))

            # Convert to float values between 0 and 1
            image_list.append(img.astype(np.float32) / 255.0)

            # Append to the end of the label array
            label_list.append(int(file_name[1]))

    # Convert to numpy format and return
    return np.asarray(image_list), np.asarray(label_list)


# Main processing
def main(_):
    # If the TensorBoard save directory exists, delete it and recreate it
    if tf.gfile.Exists(FLAGS.log_dir):
        tf.gfile.DeleteRecursively(FLAGS.log_dir)
    tf.gfile.MakeDirs(FLAGS.log_dir)

    # Read the training and test image files
    print('Start reading images')
    FLAGS.train_image, FLAGS.train_label = read_images(FLAGS.input_train_data)
    FLAGS.test_image, FLAGS.test_label = read_images(FLAGS.input_test_data)

    # Start training
    print('Start training')
    run_training()


# Parse arguments at module level so the flags are available even when imported
parser = argparse.ArgumentParser()

# Input parameter definitions
parser.add_argument(
    '--learning_rate',
    type=float,
    default=1e-4,
    help='Initial learning rate.'
)
parser.add_argument(
    '--batch_size',
    type=int,
    default=20,
    help='Batch size. Must divide evenly into the dataset sizes.'
)
parser.add_argument(
    '--input_train_data',
    type=str,
    default='./inputs/train.txt',
    help='File list data to put the input train data.'
)
parser.add_argument(
    '--input_test_data',
    type=str,
    default='./inputs/test.txt',
    help='File list data to put the input test data.'
)
parser.add_argument(
    '--log_dir',
    type=str,
    default='/tmp/tensorflow/kashiwagi/logs',
    help='Directory to put the log data.'
)
parser.add_argument(
    '--image_size',
    type=int,
    default=81,
    help='Input image size'
)
parser.add_argument(
    '--pool_size',
    type=int,
    default=3,
    help='MAX pooling size'
)
FLAGS, unparsed = parser.parse_known_args()

if __name__ == '__main__':
    # Start the main function
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
```