[PYTHON] TensorFlow Tutorial-Image Recognition (Translation)

This is a translation of the TensorFlow Tutorial (Image Recognition) at https://www.tensorflow.org/versions/master/tutorials/image_recognition. We welcome having any translation errors pointed out.


Our brains make vision seem easy. It takes no effort for a human to tell a lion from a jaguar, read a sign, or recognize a face. But these are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images.

Over the last few years, the field of machine learning has made tremendous progress on these difficult problems. In particular, a kind of model called a deep convolutional neural network can achieve reasonable performance on hard visual recognition tasks. It turns out that in some domains it can match or even exceed human performance.

Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. Successive models continue to show improvements, each achieving a new state-of-the-art result: QuocNet, [AlexNet](http://www.cs.toronto.edu/%7Efritz/absps/imagenet.pdf), Inception (GoogLeNet), BN-Inception-v2, and so on. Researchers both inside and outside of Google have published papers describing these models, but the results are still hard to reproduce. We are taking the next step by releasing the code that performs image recognition with our latest model, Inception-v3 (http://arxiv.org/abs/1512.00567).

Inception-v3 was trained for the ImageNet Large Scale Visual Recognition Challenge using data from 2012. This is a standard task in computer vision, where the model attempts to classify an entire image into [1000 classes](http://image-net.org/challenges/LSVRC/2014/browse-synsets) such as "zebra", "Dalmatian", and "dishwasher". For example, AlexNet classifies some images as follows:

[Figure: example classifications by AlexNet]

To compare models, we examine how often the model fails to include the correct answer among its top five predictions (called the "top-5 error rate"). AlexNet achieved a top-5 error rate of 15.3% on the 2012 validation data set; BN-Inception-v2 reached 6.66%, and Inception-v3 reached 3.46%.
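
As a rough illustration of how this metric is computed (a small sketch of our own with made-up predictions, not part of the original tutorial):

#include <algorithm>
#include <cstdio>
#include <vector>

// Each evaluation example pairs the model's five highest-scoring class IDs
// with the single ground-truth class ID.
struct Example {
  std::vector<int> top5;  // the model's five best guesses, best first
  int truth;              // the correct class ID
};

// Fraction of examples whose ground-truth label is NOT among the top 5 guesses.
double Top5ErrorRate(const std::vector<Example>& examples) {
  int misses = 0;
  for (const Example& e : examples) {
    if (std::find(e.top5.begin(), e.top5.end(), e.truth) == e.top5.end()) {
      ++misses;
    }
  }
  return static_cast<double>(misses) / examples.size();
}

int main() {
  // Two made-up examples: the first is a top-5 hit, the second a miss,
  // so the error rate printed is 0.50.
  std::vector<Example> examples = {
      {{340, 12, 7, 99, 513}, 12},
      {{340, 12, 7, 99, 513}, 1},
  };
  std::printf("top-5 error rate: %.2f\n", Top5ErrorRate(examples));
  return 0;
}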

How well do humans do on the ImageNet challenge? There is a [blog post](http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/) by Andrej Karpathy, who attempted to measure his own performance. He reached a top-5 error rate of 5.1%.

This tutorial will teach you how to use Inception-v3: you will learn how to classify images into 1000 classes in Python or C++. It also describes how to extract higher-level features from this model that can be reused for other vision tasks.

We are excited to see what the community will do with this model.

Usage in the Python API

The first time you run the classify_image.py program, the trained model will be downloaded from tensorflow.org. You will need about 200MB of free space available on your hard disk.

The following steps assume that you have installed TensorFlow from the pip package and that your terminal is in the TensorFlow root directory.

cd tensorflow/models/image/imagenet
python classify_image.py

The above command will classify a supplied image of a panda.

[Figure: panda]

If the model runs correctly, the script will produce output similar to the following:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)

You can supply other JPEG images by editing the --image_file argument.

If you download the model data to a different directory, you will need to point --model_dir at the directory used.
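
For example (the paths here are placeholders; substitute your own):

python classify_image.py --model_dir=/tmp/imagenet --image_file=/path/to/my_image.jpg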

Usage in the C++ API

The same Inception-v3 model can also be run in C++ for use in production environments. You can download an archive containing the GraphDef that defines the model like this (running from the root directory of the TensorFlow repository):

wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip -O tensorflow/examples/label_image/data/inception_dec_2015.zip

unzip tensorflow/examples/label_image/data/inception_dec_2015.zip -d tensorflow/examples/label_image/data/

Next, we need to compile the C++ binary that includes the code to load and run the graph. If you have followed the instructions to [install TensorFlow from source](http://www.tensorflow.org/versions/master/get_started/os_setup.html#source) for your platform, you should be able to build the example by running this command from your shell terminal:

bazel build tensorflow/examples/label_image/...

That should create a binary executable that you can run like this:

bazel-bin/tensorflow/examples/label_image/label_image

This uses the default example image that ships with the framework, and should output something similar to this:

I tensorflow/examples/label_image/main.cc:200] military uniform (866): 0.647296
I tensorflow/examples/label_image/main.cc:200] suit (794): 0.0477196
I tensorflow/examples/label_image/main.cc:200] academic gown (896): 0.0232411
I tensorflow/examples/label_image/main.cc:200] bow tie (817): 0.0157356
I tensorflow/examples/label_image/main.cc:200] bolo tie (940): 0.0145024

Here, using the default image of Admiral Grace Hopper, you can see that the network correctly identifies that she is wearing a military uniform, with a high score of 0.6.

[Figure: Admiral Grace Hopper]

Next, try it out on your own images by supplying the --image= argument, for example:

bazel-bin/tensorflow/examples/label_image/label_image --image=my_image.png

If you look inside the tensorflow/examples/label_image/main.cc file, you can find out how it works. Let's step through the main function in detail, in the hope that this code will help you integrate TensorFlow into your own applications:

The command line flags control where the files are loaded from and the properties of the input images. The model expects square 299x299 RGB images as input, so those are the input_width and input_height flags. We also need to scale the pixel values from integers between 0 and 255 to the floating point values the graph operates on. We control the scaling with the input_mean and input_std flags: we first subtract input_mean from each pixel value, then divide it by input_std.

These values probably look somewhat magical, but they are defined by the original model author based on what was used as training input images. If you use a graph that you have trained yourself, you will need to adjust the values to match whatever you used during your training process.
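
As a concrete illustration of the arithmetic (a standalone sketch of our own, not code from main.cc; the values input_mean = 128 and input_std = 128 are our understanding of the defaults for this graph, so verify them against the flag definitions in main.cc):

#include <cstdio>

// Per-pixel normalization as described above: subtract the mean, then
// divide by the standard deviation used during training.
float NormalizePixel(int pixel_value, float input_mean, float input_std) {
  return (static_cast<float>(pixel_value) - input_mean) / input_std;
}

int main() {
  const float kInputMean = 128.0f;  // assumed default, see main.cc
  const float kInputStd = 128.0f;   // assumed default, see main.cc
  // 0 maps to -1.0, 128 maps to 0.0, and 255 maps to just under 1.0.
  std::printf("%f %f %f\n",
              NormalizePixel(0, kInputMean, kInputStd),
              NormalizePixel(128, kInputMean, kInputStd),
              NormalizePixel(255, kInputMean, kInputStd));
  return 0;
}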

You can see how they are applied to an image in the ReadTensorFromImageFile() function (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc#L88).

// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;

We start by creating a GraphDefBuilder, an object we can use to specify a model to run or load.

  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));

We then start creating nodes for the small model we want to run, to load, resize, and scale the pixel values to get the result the main model expects as its input. The first node we create is just a Const op that holds a tensor with the file name of the image we want to load. That is then passed as the first input to the ReadFile op. You might notice we are passing b.opts() as the last argument to all the op creation functions. This argument ensures that the node is added to the model definition held in the GraphDefBuilder. We also name the ReadFile operator by making the WithName() call after b.opts(). This gives a name to the node. It is not strictly necessary, since an automatic name will be assigned if you do not do this, but it does make debugging a bit easier.

  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));

We then keep adding more nodes: to decode the file data as an image, to cast the integers into floating point values, to resize the image, and then finally to run the subtraction and division operations on the pixel values.

  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));

At the end of this, we have a model definition stored in the b variable, which we turn into a full graph definition with the ToGraphDef() function.

  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();

Then we create a Session object, which is the interface for actually running the graph, and run it, specifying which node we want the output from and where to put the output data.

This gives us a vector of Tensor objects, which in this case we know will be just a single object long. You can think of a Tensor as a multi-dimensional array in this context; it holds a 299-pixel-high, 299-pixel-wide, 3-channel image as float values. If your product already has its own image-processing framework, you should be able to use it instead, as long as you apply the same transformations before feeding images into the main graph.
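
As a sketch of what that might look like (a hypothetical helper of our own, not code from main.cc), if your framework hands you already-normalized float RGB pixels, you could fill a tensor of the expected shape directly:

#include <algorithm>
#include <vector>

#include "tensorflow/core/framework/tensor.h"

// Hypothetical helper (not in main.cc): build the [1, height, width, 3]
// float input tensor directly from pixel data produced by your own
// image-processing framework. Assumes `pixels` already holds
// height * width * 3 floats, normalized as described above.
tensorflow::Tensor MakeInputTensor(const std::vector<float>& pixels,
                                   int height, int width) {
  tensorflow::Tensor input(
      tensorflow::DT_FLOAT,
      tensorflow::TensorShape({1, height, width, 3}));
  // Copy the flat pixel buffer into the tensor's backing storage.
  auto flat = input.flat<float>();
  std::copy(pixels.begin(), pixels.end(), flat.data());
  return input;
}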

This is a simple example of creating a small TensorFlow graph dynamically in C++, but for the pre-trained Inception model we want to load a much larger definition from a file. You can see how we do that in the LoadGraph() function.

// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }

If you have looked through the image loading code, a lot of the terms should seem familiar. Rather than using a GraphDefBuilder to produce a GraphDef object, we load a protobuf file that directly contains the GraphDef.

  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}

It then creates a Session object from that GraphDef and passes it to the caller for later execution.

The GetTopLabels() function is a lot like the image loading, except that in this case we want to take the results of running the main graph and turn them into a sorted list of the highest-scoring labels. Just like the image loader, it creates a GraphDefBuilder, adds a couple of nodes to it, and then runs the short graph to get a pair of output tensors. In this case they represent the sorted scores and the index positions of the highest results.

// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();

The PrintTopLabels() function takes those sorted results and prints them out in a friendly way. The CheckTopLabel() function is very similar, but just makes sure that the top label is the one we expect, for debugging purposes.
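
A simplified sketch of roughly what PrintTopLabels() does (our own illustration; the real implementation in main.cc differs and includes error handling): read the labels file into a vector of strings, then look up and log each returned index with its score.

#include <fstream>
#include <string>
#include <vector>

#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/platform/logging.h"

// Simplified sketch: print the top labels and scores produced by
// GetTopLabels(). Error handling is omitted for brevity.
void PrintTopLabelsSketch(const tensorflow::Tensor& indices,
                          const tensorflow::Tensor& scores,
                          const std::string& labels_file_name) {
  // One label per line, in the same order as the model's class indices.
  std::vector<std::string> labels;
  std::ifstream labels_file(labels_file_name);
  for (std::string line; std::getline(labels_file, line);) {
    labels.push_back(line);
  }
  auto indices_flat = indices.flat<tensorflow::int32>();
  auto scores_flat = scores.flat<float>();
  for (int pos = 0; pos < indices_flat.size(); ++pos) {
    const int label_index = indices_flat(pos);
    LOG(INFO) << labels[label_index] << " (" << label_index
              << "): " << scores_flat(pos);
  }
}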

Finally, main() ties all of these calls together.

int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }

Load the main graph.

  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];

Load, resize, and process the input image.

  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }

Here we run the loaded graph, using an image as input.

  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }

For testing purposes, you can check here to make sure you are getting the output you expect.

  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);

Finally, it prints the label found.

  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }

The error handling here uses TensorFlow's Status object, which is very convenient because it lets you know whether any error has occurred with the ok() checker, and can then be printed out to give a readable error message.

In this case we are demonstrating object recognition, but you should be able to use very similar code with other models you have found or trained yourself, across all sorts of domains. Hopefully this small example gives you some ideas on how to use TensorFlow within your own products.

Exercise: Transfer learning is the idea that, if you know how to solve a task well, you should be able to transfer some of that understanding to solving related problems. One way to perform transfer learning is to remove the final classification layer of the network and extract the next-to-last layer of the CNN, in this case a 2048-dimensional vector. You can specify this by setting --output_layer=pool_3 and changing the handling of the output tensor in the C++ API example above. Try extracting this feature from a collection of images and see whether you can predict new categories that are not in ImageNet.
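
A minimal sketch of what that change might look like (our own illustration, with error handling omitted; it assumes `session` and `resized_tensor` are set up as in main() above, and that "pool_3" is the layer name from the exercise): fetch "pool_3" as the output instead of the classification layer, and read back the 2048-dimensional feature vector.

  // Sketch: run the graph but fetch the penultimate layer instead of the
  // final class scores.
  std::vector<tensorflow::Tensor> feature_outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {"pool_3"}, {}, &feature_outputs);
  if (run_status.ok()) {
    // The result holds 2048 float features describing the image; these can
    // be fed into your own classifier for new categories.
    auto features = feature_outputs[0].flat<float>();
    LOG(INFO) << "feature vector size: " << features.size();
  }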

Resources to learn more

To learn about neural networks in general, Michael Nielsen's [free online book](http://neuralnetworksanddeeplearning.com/chap1.html) is an excellent resource. For convolutional neural networks in particular, Chris Olah has some [nice blog posts](http://colah.github.io/posts/2014-07-Conv-Nets-Modular/), and Michael Nielsen's book has a great chapter covering them.

To find out more about implementing convolutional neural networks, you can jump to TensorFlow's deep convolutional networks tutorial (http://www.tensorflow.org/tutorials/deep_cnn/index.html), or start a bit more gently with the ML beginner or ML expert MNIST starter tutorials. Finally, if you want to get up to speed on research in this area, you can read the recent work of the papers referenced in this tutorial.
