Once you start doing deep learning, you soon want to load not just the sample datasets but a dataset you prepared yourself. Collecting images is manageable, but what comes after that? So I wrote the procedure down as a note, in an order that I hope is easy to follow.
Use a TensorFlow function to read the file: tf.read_file().
It is used like this.
image_r = tf.read_file(fname)
fname is the name of the file to read. If the name contains Japanese characters, an error occurs unless it is encoded as UTF-8. (Maybe the character encoding can be changed ^^;)
TensorFlow functions are also used to decode images. For the time being, PNG (tf.image.decode_png()) and JPEG (tf.image.decode_jpeg()) are supported.
In addition, there is a convenient function, tf.image.decode_image(), that can read either of these two formats.
However, I could not read BMP or TIF.
It is used like this.
image_r = tf.read_file(fname)
image = tf.image.decode_image(image_r, channels=3)
The flow is to first read the file and then decode it according to its format. The result is an array of shape (vertical, horizontal, channel). Specifically, a 2x2 RGB image comes out in this order:
[[[R, G, B],    # (x, y) = (0,0) upper left
  [R, G, B]],   # (x, y) = (1,0) upper right
 [[R, G, B],    # (x, y) = (0,1) bottom left
  [R, G, B]]]   # (x, y) = (1,1) bottom right
[Digression] GIF can actually be read too, but the decoded result has the shape (frame, vertical, horizontal, channel) so that it can hold animated GIFs. Because that layout differs slightly from the others, I will simply say here that GIF cannot be read (lol).
When you actually read a dataset, you will probably either crawl a directory or read a definition file. For now, this note covers reading a definition file.
The definition file is assumed to be a text file (such as CSV). For example:
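To make the (vertical, horizontal, channel) order concrete, here is a minimal sketch in plain Python. It does not use TensorFlow; the nested list and the helper name pixel are made up for illustration, and the pixel values are arbitrary.

```python
# A hypothetical 2x2 RGB image written as nested lists, in the same
# (vertical, horizontal, channel) order as the decoded result.
image = [
    [[255, 0, 0], [0, 255, 0]],      # top row:    (x, y) = (0,0), (1,0)
    [[0, 0, 255], [255, 255, 255]],  # bottom row: (x, y) = (0,1), (1,1)
]

def pixel(img, x, y):
    """Return the [R, G, B] list at column x, row y."""
    return img[y][x]  # the vertical (row) index comes first

height = len(image)
width = len(image[0])
channels = len(image[0][0])

print((height, width, channels))
print(pixel(image, 1, 0))  # upper-right pixel
```

The point to notice is that the row index y comes before the column index x, which is the reverse of the usual (x, y) way of writing coordinates.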
c:\work\image\image1.png
c:\work\image\image2.png
c:\work\image\image3.png
Since this is for deep learning, adding a label number makes it look like this.
c:\work\image0\image1.png, 0
c:\work\image0\image2.png, 0
c:\work\image1\image3.png, 1
To read this CSV, we again use TensorFlow functions. (It really is handy for everything!)
First, prepare a queue that holds the names of the CSV files to read, using tf.train.string_input_producer(). Next, create a tf.TextLineReader, a class that reads a text file line by line; passing the queue to its read() function reads one line of data. After that, the line is parsed as CSV, which here splits it into a file name and a label.
The code looks like this.
fname_queue = tf.train.string_input_producer([csvfile])
reader = tf.TextLineReader()
key, val = reader.read(fname_queue)
fname, label = tf.decode_csv(val, [["aa"], [1]])
csvfile is the name of the CSV file to read, key is the CSV file name plus the line number, val is the text of that line, fname is the first column (file name), and label is the second column (label number). The second argument of tf.decode_csv() gives a default value for each column, which also fixes the column's type (a string and an integer here). The image data is read using this fname.
Queues are hard to grasp, but it helps to picture the CSV being read one line per run rather than all at once. (It seems that each time key or val is evaluated, the reader advances to the next line.)
Thanks to this CSV, the image files can be handled as they are, no matter what directory structure they are stored in. However, creating the CSV file by hand is tedious, so I would like to write code that crawls the directory and generates the CSV.
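The "one line per read" picture can be imitated in plain Python with a generator. This is only an analogy for the queue and reader, not how TensorFlow implements them; the function name line_producer and the file paths are hypothetical.

```python
import csv
import itertools
import os
import tempfile

def line_producer(csv_path):
    """Yield (fname, label) one CSV line at a time, restarting at the end of the file."""
    while True:
        produced = False
        with open(csv_path, newline="", encoding="utf-8") as f:
            # skipinitialspace drops the blank after the comma in "path, 0"
            for row in csv.reader(f, skipinitialspace=True):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                produced = True
                yield row[0], int(row[1])
        if not produced:
            return  # avoid looping forever on an empty file

# Demo with a throwaway definition file (the image paths are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "images.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("image0/image1.png, 0\nimage0/image2.png, 0\nimage1/image3.png, 1\n")
    producer = line_producer(csv_path)
    first_four = list(itertools.islice(producer, 4))
    producer.close()  # release the open file before the directory is removed

print(first_four)
```

Each item taken from the generator corresponds to one more line of the file, and the fourth item wraps around to the first line again, much like a queue that keeps cycling over its input.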
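As a rough sketch of that directory-crawling idea, the following uses only the Python standard library. It assumes, purely for illustration, that each subdirectory directly under the root holds one class and that labels are numbered in sorted directory-name order; the names make_csv and IMAGE_EXTS are hypothetical.

```python
import csv
import os
import tempfile

IMAGE_EXTS = (".png", ".jpg", ".jpeg")  # assumed set of extensions to collect

def make_csv(root_dir, csv_path):
    """Write one "path,label" row per image under root_dir; return the row count."""
    class_dirs = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Assumption: one subdirectory per class, labelled 0, 1, 2, ...
        for label, class_dir in enumerate(class_dirs):
            for dirpath, _, filenames in os.walk(os.path.join(root_dir, class_dir)):
                for name in sorted(filenames):
                    if name.lower().endswith(IMAGE_EXTS):
                        writer.writerow([os.path.join(dirpath, name), label])
                        count += 1
    return count

# Demo on a temporary directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for sub, name in [("image0", "a.png"), ("image0", "b.jpg"),
                      ("image1", "c.png"), ("image1", "notes.txt")]:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, name), "w").close()
    csv_path = os.path.join(root, "images.csv")
    n_rows = make_csv(root, csv_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        labels = [int(row[1]) for row in csv.reader(f)]

print(n_rows, labels)
```

Files with other extensions (notes.txt in the demo) are skipped, so the resulting CSV lists only images.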
In fact, the code so far does not read any file yet. The point of TensorFlow is that the graph is built first and executed afterwards. In other words, everything up to here was the graph-construction part, and the execution part has not been written.
To actually run it, the following code is required.
sess = tf.Session()
init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
sess.run(init)
tf.train.start_queue_runners(sess)
x = sess.run(image)
Create a session, initialize the variables, start the queue runners, and run. Note that if you forget tf.train.start_queue_runners(), the queue never moves and no file is read. (Not only that, the program hangs and cannot be stopped ...) The result x is the decoded image data for one line of the CSV; each further sess.run(image) reads the next line.
Here is all the code I actually tried, put together.
import sys
import tensorflow as tf

def read_csv(csvfile):
    # Queue of CSV file names; the reader pulls one line per run.
    fname_queue = tf.train.string_input_producer([csvfile])
    reader = tf.TextLineReader()
    key, val = reader.read(fname_queue)
    # The defaults also set the column types: string file name, integer label.
    fname, label = tf.decode_csv(val, [["aa"], [1]])
    return read_img(fname)

def read_img(fname):
    img_r = tf.read_file(fname)
    return tf.image.decode_image(img_r, channels=3)

def main():
    argv = sys.argv
    if len(argv) < 2:
        print('Usage: python %s csvfile' % argv[0])
        sys.exit(1)
    image = read_csv(argv[1])
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)
    # Without this call the queue never fills and sess.run() blocks.
    tf.train.start_queue_runners(sess)
    x = sess.run(image)
    print(x)

if __name__ == '__main__':
    main()
The CSV file name is passed as a command-line argument.
For deep learning, the files listed in the CSV need to be fed in small groups rather than all together, so I think the code has to be modified to fetch several files at a time. It is also a problem if they always arrive in the order written in the CSV, so a shuffling mechanism is needed. (tf.train.string_input_producer() seems to have one.) On top of that, I would like to increase the number of images by scaling, rotating, or shifting the ones I load.
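The "shuffle, then take a few files at a time" idea can be sketched in plain Python as below. This is not the TensorFlow mechanism, just an illustration of the behaviour I am after; the function name minibatches and the sample file names are hypothetical.

```python
import random

def minibatches(samples, batch_size, seed=None):
    """Shuffle (fname, label) pairs and yield them in lists of at most batch_size."""
    order = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(order)  # a seed makes the order reproducible
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

# Five placeholder samples with alternating labels.
samples = [("image%d.png" % i, i % 2) for i in range(5)]
batches = list(minibatches(samples, batch_size=2, seed=0))

print([len(b) for b in batches])
```

Every sample appears exactly once across the batches, and the last batch is simply shorter when the total is not a multiple of the batch size.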
I would like to investigate this area next.