Deep learning needs a large number of images to classify images accurately. However, preparing and tagging that many images by hand is difficult, so the usual approach is to increase (inflate) the number of images by processing the ones that are already tagged.
This time, I'd like to learn from TensorFlow's code how that inflation is done.
Specifically, we will look at the CIFAR-10 code: cifar10/cifar10_input.py
In the actual code, the images are inflated by combining several processing steps, as shown below.
# Image processing for training the network. Note the many random
# distortions applied to the image.
# Randomly crop a [height, width] section of the image.
distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
# Randomly flip the image horizontally.
distorted_image = tf.image.random_flip_left_right(distorted_image)
# Because these operations are not commutative, consider randomizing
# the order of their operation.
distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
# Subtract off the mean and divide by the standard deviation of the pixels.
float_image = tf.image.per_image_whitening(distorted_image)
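Looking at each step, the CIFAR-10 code applies five operations: random cropping, random horizontal flipping, random brightness adjustment, random contrast adjustment, and whitening.
Let's understand **visually** what each one does.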
tf.random_crop(value, size, seed=None, name=None)
The random_crop function randomly crops an image to a given size. (Later TensorFlow releases expose this function as tf.image.random_crop.) The image below is the result of actually cropping a 256x170 image with size = 100x100:
The crop position changes depending on the value given to seed. If the seed value is the same, the same image is generated no matter how many times it is executed.
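The relationship between the seed and the crop position can be sketched in plain Python. This is only an illustration on a tiny nested-list image, not TensorFlow's implementation; the helper name, its arguments, and the sample image are assumptions made for this example.

```python
import random


def random_crop(image, crop_h, crop_w, seed=None):
    """Return a crop_h x crop_w window of a nested-list image at a random offset.

    Illustrative helper only; it is not the TensorFlow function.
    """
    rng = random.Random(seed)  # private generator, so the seed fully decides the offset
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size must not exceed the image size")
    top = rng.randint(0, height - crop_h)   # randint includes both end points
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


# Hypothetical 4x6 grayscale image whose pixel value encodes its position.
image = [[10 * r + c for c in range(6)] for r in range(4)]

crop_a = random_crop(image, 2, 3, seed=42)
crop_b = random_crop(image, 2, 3, seed=42)  # same seed, same window
print(crop_a == crop_b)
```

Passing seed=None in this sketch lets the generator seed itself, so repeated calls can return different windows.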
tf.image.random_flip_left_right(image, seed=None)
The random_flip_left_right function randomly flips an image horizontally. The image below is the result of actually applying the random_flip_left_right function:
Because the flip is applied with a probability of 1/2, the image may come out unflipped depending on the seed value.
There is also a similar function called random_flip_up_down. While random_flip_left_right flips horizontally, random_flip_up_down flips vertically.
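To make the difference between the two flips concrete, here is a minimal pure-Python sketch on a nested-list image. The helper names mirror the TensorFlow functions but are written for this example only, and the coin toss is an illustrative stand-in for TensorFlow's random op.

```python
import random


def flip_left_right(image):
    """Reverse every row: mirror the image horizontally."""
    return [row[::-1] for row in image]


def flip_up_down(image):
    """Reverse the order of the rows: mirror the image vertically."""
    return [row[:] for row in image[::-1]]


def random_flip_left_right(image, seed=None):
    """Flip horizontally with probability 1/2, otherwise return a copy."""
    rng = random.Random(seed)
    if rng.random() < 0.5:
        return flip_left_right(image)
    return [row[:] for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]

print(flip_left_right(image))  # [[3, 2, 1], [6, 5, 4]]
print(flip_up_down(image))     # [[4, 5, 6], [1, 2, 3]]
maybe_flipped = random_flip_left_right(image, seed=0)
```

Whether maybe_flipped is mirrored depends only on the seed in this sketch, which is why repeated runs with the same seed give the same result.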
tf.image.random_brightness(image, max_delta, seed=None)
The random_brightness function adjusts the brightness of an image by a random factor. The image below is the result of actually applying the random_brightness function:
The brightness shift is drawn at random from the range [-max_delta, max_delta), so the result changes depending on the values of max_delta and seed.
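As a rough illustration of what a brightness adjustment does to the pixel values, the sketch below adds one random offset to every pixel of a small float image. It is a simplified stand-in written for this article, not TensorFlow's code; the helper name and the sample values are assumptions.

```python
import random


def random_brightness(image, max_delta, seed=None):
    """Add a single random offset in [-max_delta, max_delta] to every pixel."""
    if max_delta < 0:
        raise ValueError("max_delta must be non-negative")
    rng = random.Random(seed)
    delta = rng.uniform(-max_delta, max_delta)
    return [[pixel + delta for pixel in row] for row in image]


# Hypothetical 2x2 grayscale image with float pixel values.
image = [[0.0, 64.0],
         [128.0, 255.0]]

brighter = random_brightness(image, max_delta=63, seed=1)
shift = brighter[0][0] - image[0][0]  # the offset that was applied to every pixel
print(shift)
```

The sketch does not clip the result, so values can leave the 0 to 255 range; whether clipping is needed depends on the later processing steps.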
tf.image.random_contrast(image, lower, upper, seed=None)
The random_contrast function adjusts the contrast of an image by a random factor. The image below is the result of actually applying the random_contrast function:
You can see that Contrast1 has reduced contrast and Contrast2 has enhanced contrast. The contrast factor is drawn at random between lower and upper, so these two arguments set the lower and upper limits of the strength.
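One common way to change contrast is to scale each pixel's distance from the image mean, and the sketch below assumes that definition for a single-channel image. The helper names and sample values are made up for illustration and are not taken from the TensorFlow source.

```python
import random


def adjust_contrast(image, factor):
    """Scale every pixel's distance from the image mean by `factor`."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[(p - mean) * factor + mean for p in row] for row in image]


def random_contrast(image, lower, upper, seed=None):
    """Pick a factor uniformly between lower and upper, then adjust the contrast."""
    if not 0 <= lower <= upper:
        raise ValueError("expected 0 <= lower <= upper")
    rng = random.Random(seed)
    return adjust_contrast(image, rng.uniform(lower, upper))


# Hypothetical 2x2 grayscale image with mean 125.
image = [[50.0, 100.0],
         [150.0, 200.0]]

low = adjust_contrast(image, 0.2)    # factor < 1 pulls pixels toward the mean
high = adjust_contrast(image, 1.8)   # factor > 1 pushes pixels away from the mean
randomized = random_contrast(image, lower=0.2, upper=1.8, seed=3)
```

A factor below 1 corresponds to the reduced-contrast case and a factor above 1 to the enhanced-contrast case. Note that the enhanced result can fall outside the 0 to 255 range because this sketch does not clip.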
tf.image.per_image_whitening(image)
The per_image_whitening function whitens an image so that its mean is 0. (Later TensorFlow releases rename this function to tf.image.per_image_standardization.) The image below is the result of actually applying the per_image_whitening function:
More precisely, each pixel value is computed as (x - mean) / adjusted_stddev. Here mean is the average of all pixel values in the image, and adjusted_stddev is defined as adjusted_stddev = max(stddev, 1.0 / sqrt(image.NumElements())), where stddev is the standard deviation of all pixel values in the image.
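The formula can be checked by hand on a tiny image. The following pure-Python sketch applies (x - mean) / adjusted_stddev to a nested-list image; it assumes the population standard deviation (dividing by the number of elements), and the helper name and sample images are illustrative only.

```python
import math


def per_image_whitening(image):
    """Apply (x - mean) / adjusted_stddev to every pixel of a nested-list image."""
    pixels = [float(p) for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    stddev = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    # The floor 1/sqrt(n) avoids dividing by zero on a uniform image.
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(n))
    return [[(p - mean) / adjusted_stddev for p in row] for row in image]


image = [[0, 50],
         [100, 250]]
whitened = per_image_whitening(image)

# A uniform image has stddev 0, so the floor 1/sqrt(4) = 0.5 is used instead.
flat_image = [[7, 7],
              [7, 7]]
whitened_flat = per_image_whitening(flat_image)
```

After whitening, the pixel values of the first image average to about 0 with a standard deviation of about 1, while the uniform image simply becomes all zeros.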
There are other functions that could be used for inflating images, though they are not used in TensorFlow's CIFAR-10 example. I will introduce about five of them.
tf.image.transpose_image(image)
The transpose_image function transposes an image. The image below is the result of actually applying the transpose_image function:
Since the image is only transposed, the result is the same no matter how many times the function is executed. Transposing the transposed image again returns the original image.
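The round-trip property is easy to confirm on a small example. This is a plain-Python sketch on a nested-list image, with a helper name chosen for illustration rather than the TensorFlow function itself.

```python
def transpose_image(image):
    """Swap rows and columns of a nested-list image."""
    return [list(column) for column in zip(*image)]


image = [[1, 2, 3],
         [4, 5, 6]]

transposed = transpose_image(image)
print(transposed)              # [[1, 4], [2, 5], [3, 6]]

restored = transpose_image(transposed)
print(restored == image)       # True
```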
tf.image.rot90(image, k=1)
The rot90 function rotates the image counterclockwise in 90-degree steps. The image below is the result of actually applying the rot90 function:
The value of k specifies how many times the image is rotated by 90 degrees.
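A counterclockwise quarter turn can be built from a transpose followed by a vertical flip, which gives a compact way to see the effect of k. The helper below is a pure-Python sketch on a nested-list image and is not the TensorFlow implementation.

```python
def rot90(image, k=1):
    """Rotate a nested-list image counterclockwise by 90 degrees, k times."""
    rotated = [list(row) for row in image]
    for _ in range(k % 4):  # four quarter turns return to the start
        # Transpose, then reverse the row order: one counterclockwise quarter turn.
        rotated = [list(column) for column in zip(*rotated)][::-1]
    return rotated


image = [[1, 2, 3],
         [4, 5, 6]]

print(rot90(image))        # [[3, 6], [2, 5], [1, 4]]
print(rot90(image, k=2))   # [[6, 5, 4], [3, 2, 1]]
print(rot90(image, k=4))   # [[1, 2, 3], [4, 5, 6]]
```

Because only four distinct orientations exist, rotation on its own multiplies the data by at most four.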
tf.image.random_hue(image, max_delta, seed=None)
The random_hue function adjusts the hue of an RGB image by a random factor. The image below is the result of actually applying the random_hue function:
max_delta must be in the range [0, 0.5].
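A hue shift can be pictured as rotating each pixel's colour around the colour wheel. The sketch below uses Python's standard colorsys module on RGB tuples in the range 0 to 1; the helper names are assumptions for this example, and the code illustrates the idea rather than reproducing TensorFlow's implementation.

```python
import colorsys
import random


def adjust_hue(pixel, delta):
    """Rotate the hue of one RGB pixel (channels in [0, 1]) by `delta` turns."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)


def random_hue(image, max_delta, seed=None):
    """Shift the hue of every pixel by one random delta in [-max_delta, max_delta]."""
    if not 0 <= max_delta <= 0.5:
        raise ValueError("max_delta must be in the range [0, 0.5]")
    rng = random.Random(seed)
    delta = rng.uniform(-max_delta, max_delta)
    return [[adjust_hue(pixel, delta) for pixel in row] for row in image]


red = (1.0, 0.0, 0.0)
gray = (0.5, 0.5, 0.5)

shifted = adjust_hue(red, 1 / 3)   # a third of a turn moves red to roughly pure green
result = random_hue([[red, gray]], max_delta=0.5, seed=7)
```

A delta of 0.5 is half a turn, so the range [-0.5, 0.5] already covers the whole colour wheel. Gray pixels have no saturation, so a hue shift leaves them unchanged.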
tf.image.random_saturation(image, lower, upper, seed=None)
The random_saturation function adjusts the saturation of an RGB image by a random factor. The image below is the result of actually applying the random_saturation function:
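Saturation adjustment can be sketched as multiplying the S channel in HSV space by a factor drawn between lower and upper. The example below uses the standard colorsys module on RGB tuples in the range 0 to 1; the helper names, and the choice to clip the saturation to [0, 1], are assumptions of this sketch.

```python
import colorsys
import random


def adjust_saturation(pixel, factor):
    """Multiply the saturation of one RGB pixel (channels in [0, 1]) by `factor`."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    s = min(max(s * factor, 0.0), 1.0)  # assumption: keep saturation inside [0, 1]
    return colorsys.hsv_to_rgb(h, s, v)


def random_saturation(image, lower, upper, seed=None):
    """Scale the saturation of every pixel by one random factor in [lower, upper]."""
    if not 0 <= lower <= upper:
        raise ValueError("expected 0 <= lower <= upper")
    rng = random.Random(seed)
    factor = rng.uniform(lower, upper)
    return [[adjust_saturation(pixel, factor) for pixel in row] for row in image]


orange = (1.0, 0.5, 0.0)

washed_out = adjust_saturation(orange, 0.5)   # paler orange
colorless = adjust_saturation(orange, 0.0)    # all three channels become equal
result = random_saturation([[orange]], lower=0.5, upper=1.5, seed=11)
```

A factor of 0 removes the colour entirely, a factor of 1 leaves the pixel unchanged, and larger factors make colours more vivid until the saturation reaches its maximum.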
Please refer to the following for scaling.