[PYTHON] [Introduction to Tensorflow] Understand Tensorflow properly and try to make a model

This article is a translation of my blog, Data Science Struggle.

Purpose

Get a rough idea of what Tensorflow is and create a simple model.

Overview of Tensorflow

Let's take a brief look at what Tensorflow, a name that comes up in various places these days, actually is and how it is positioned in machine learning.

  1. What is Tensorflow? First of all, Tensorflow can be thought of as a calculator that, when actually used, automatically performs the calculations needed to determine the parameters of a neural network. The word calculator here literally means something that does calculation, such as answering 1 + 1.
  2. Where does it fit in machine learning? There are three main ways of working with machine learning. The first is a pattern in which you write the model yourself in your language of choice after properly understanding the algorithms and the mathematics. The second is a pattern in which you build a model in a few lines using a library such as sklearn in Python, or the library appropriate to the method in R. The third is a pattern in which you build the model yourself on top of a computation library suited to creating machine learning models.

Tensorflow belongs to the third pattern.

Here is a brief comment on the reality behind these three patterns. If you have actually studied the mathematics of machine learning, it is worth building a model once without relying on a library, both to confirm the mathematics and the mechanism and to allow more flexible rewriting. However, building every model that way for every situation is usually impractical, and reinventing the wheel is wasteful. So in practice you learn how to use a machine learning library and tackle the problem with it. Most libraries are very convenient, and if all you want is a model, you can reach your goal in just a few lines. Still, there are situations such libraries handle poorly. Neural networks are one example: not only the number of intermediate layers, but also how layers are joined and how data flows between them, can be varied in almost any way as long as certain rules are followed. It is hard to build such things with an off-the-shelf machine learning library, yet writing everything yourself raises too many concerns.


Computational libraries for machine learning, such as Tensorflow, resolve this situation. They are not machine learning libraries that produce a model in one line; they are libraries that help users build their own models. In other words, unlike a machine learning library that can be used without knowing the mechanism at all, they are difficult to use without at least some knowledge of how to build a model.

How Tensorflow calculates

Let's take a look at how Tensorflow actually does its calculations. As a concrete first example, let Tensorflow perform the following calculation.

5 + 3

If you solve this with Tensorflow, it looks like this.

add.py


import tensorflow as tf
a = tf.constant(5)
b = tf.constant(3)
added = tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(added))

Here, if you try to print added directly,

Tensor("Add_1:0", shape=(), dtype=int32)

is output. Interpreted properly, added is not a concrete number but the form information of the calculation, namely the sum of a and b; the actual calculation is performed only when it is run inside a Session. In Tensorflow, this form information is called a graph.

Tensorflow variables

We saw a simple Tensorflow calculation above, but there is one more thing to pay attention to: variables. Since this is a computation library used in machine learning, it is essential to handle many dimensions at once, and, as mentioned above, the actual calculation is performed by run after the graph has been created, so it is convenient to be able to specify the values given to variables later. Tensorflow does in fact have that feature. In summary, Tensorflow has constants, variables, and placeholders (temporary variables that accept values assigned later), all of which can be multidimensional. They can be defined as follows.

#constant
a = tf.constant(3)

#variable
b = tf.Variable(0)

#Placeholder
c = tf.placeholder(tf.float32)
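
Of the three, tf.Variable holds state that can be updated across run calls, which is why the training code later has to initialize its variables first. A minimal sketch of my own (not from the original article) to illustrate:

import tensorflow as tf

b = tf.Variable(0)
increment = tf.assign(b, b + 1)  # an op that overwrites b with b + 1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # variables must be initialized before use
    for _ in range(3):
        print(sess.run(increment))  # 1, 2, 3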

Specifically, if you try a calculation using placeholders:

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
added = tf.add(a, b)
with tf.Session() as sess:
    print(sess.run(added, feed_dict = {a: 3.0, b: 5.0}))

In the above example, everything up to added is the graph, and concrete values are given via feed_dict in the run phase to perform the calculation.
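
Since placeholders are multidimensional, the same graph also accepts whole arrays. A small sketch of my own to illustrate:

import tensorflow as tf

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
added = tf.add(a, b)

with tf.Session() as sess:
    # Feeding vectors instead of scalars works with the same graph.
    print(sess.run(added, feed_dict = {a: [1.0, 2.0], b: [3.0, 4.0]}))  # [4. 6.]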

Classification with Tensorflow

Let's actually classify the iris data using Tensorflow. From this point onward, the whole process is divided into two parts: blueprint creation and build. Blueprint creation is the part where you decide what kind of model, that is, what graph, to create. Build refers to determining the parameters in the blueprint by feeding the data into it. Below is a code example of the iris classification. The data used is the file downloaded from https://dl.dropboxusercontent.com/u/432512/20120210/data/iris.txt.

get_data.py


import pandas as pd
import tensorflow as tf
from sklearn import cross_validation

data = pd.read_csv('https://dl.dropboxusercontent.com/u/432512/20120210/data/iris.txt', sep = "\t")
data = data.ix[:, 1:]  # drop the leading index column
train_data, test_data, train_target, test_target = cross_validation.train_test_split(
    data.ix[:, ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']],
    data.ix[:, ['Species']],
    test_size = 0.4, random_state = 0)

With the above code, the iris data is acquired and split into training data and test data. From here on, we create the model using the paired train_data and train_target; the answer when train_data is given is train_target. Of course, since the data is split into train and test, evaluation should be done on the test data, but it is omitted this time (a sketch is given at the end of the article). From here, we actually create the blueprint and build based on it.
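
As a quick sanity check of the split (my own addition; the shapes assume the 150-row iris file and test_size = 0.4):

print(train_data.shape)                  # (90, 4): 60% of the 150 rows, 4 features
print(test_data.shape)                   # (60, 4)
print(train_target['Species'].unique())  # the 3 iris species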

classify.py


#Blueprint creation
#Placeholder settings
X = tf.placeholder(tf.float32, shape = [None, 4])
Y = tf.placeholder(tf.float32, shape = [None, 3])

#Parameter setting
W = tf.Variable(tf.random_normal([4, 3], stddev=0.35))

#Activation function
y_ = tf.nn.softmax(tf.matmul(X, W))

#Build
#Loss function
cross_entropy = -tf.reduce_sum(Y * tf.log(y_))

#Learning
optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(cross_entropy)

#Run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        x = train_data
        y = pd.get_dummies(train_target)
        print(sess.run(W))

        sess.run(train, feed_dict = {X: x, Y: y})

    test = sess.run(W)

Let's look at each part in turn. First, create the blueprint. If the matrix of explanatory variables is X, the matrix of explained variables is Y, and the weight is W, this blueprint can be expressed as follows. No bias term is used this time.

X: [None, 4]
W: [4, 3]
Y: [None, 3]
[None, 4] x [4, 3] = [None, 3]

The [None, 3] in the last line matches Y. Now let's move on to the actual code.

placeholder.py


#Placeholder settings
X = tf.placeholder(tf.float32, shape = [None, 4])
Y = tf.placeholder(tf.float32, shape = [None, 3])

Set the placeholders. X and Y set here correspond to the explanatory variables and the explained variable in the data, respectively, and the data will be fed into them later at run time. The iris data has four explanatory variables and three classes to classify into. Since the number of rows given when feeding the data (the number of rows in the data frame) is not known in advance, None is put there.

parameter.py


#Parameter setting
W = tf.Variable(tf.random_normal([4, 3], stddev=0.35))

Set the parameter part, that is, the weight. In the build part, this parameter W is updated and determined as data is fed in. tf.random_normal with stddev gives initial values drawn at random from the specified normal distribution. There are several ways to give initial values, and each is worth a look; a few alternatives are sketched below.
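
For reference, a few alternative initializers (a sketch of my own; all are standard Tensorflow 1.x functions):

W_zeros   = tf.Variable(tf.zeros([4, 3]))                           # all zeros
W_uniform = tf.Variable(tf.random_uniform([4, 3], -1.0, 1.0))       # uniform on [-1, 1]
W_trunc   = tf.Variable(tf.truncated_normal([4, 3], stddev = 0.1))  # normal, re-drawn beyond 2 sigma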

activate.py


#Activation function
y_ = tf.nn.softmax(tf.matmul(X, W))

Determine the activation function. An appropriate function has to be chosen according to the shape of the output and whether the layer is an intermediate layer or the output layer. In this case, y_ is the final output for the input; a couple of common alternatives are sketched below.
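
For comparison, two other common activation functions (a sketch of my own; W1 and W2 are hypothetical weight variables, not part of the model above):

# Hypothetical weights, defined only for this illustration.
W1 = tf.Variable(tf.random_normal([4, 5], stddev = 0.35))
W2 = tf.Variable(tf.random_normal([4, 1], stddev = 0.35))

hidden = tf.nn.relu(tf.matmul(X, W1))  # typical choice for intermediate layers
binary = tf.sigmoid(tf.matmul(X, W2))  # probability output, e.g. for a two-class problem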

Beyond this is the build part. Tensorflow handles the tricky calculations for us, but you still need to understand the structure of what it is doing. Described simply, the build part defines a loss function and updates the parameters so that the loss becomes smaller. The accuracy of the prediction depends on the value of the parameter W defined above, and we change the parameters so that the prediction accuracy becomes as high as possible. At that point, rather than focusing on how accurate the model is, we focus on how few mistakes it makes, and keep updating the parameters so that the mistakes decrease.

lost.py


#Loss function
cross_entropy = -tf.reduce_sum(Y * tf.log(y_))

This loss function is the "how few mistakes" part mentioned above. The loss function differs depending on whether the model to be created does classification or regression. Tensorflow provides multiple loss functions, so this is worth investigating; a regression example is sketched below.
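
For example, if this were a regression model rather than a classification one, a squared-error loss would be typical (a sketch of my own, reusing the Y and y_ names from above purely for illustration):

mse = tf.reduce_mean(tf.square(Y - y_))  # mean squared error for regression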

train.py


#Learning
optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(cross_entropy)

The optimizer defines how the parameters are actually updated based on the data. train makes the loss function smaller (that is, updates the parameters) using the update method specified by the optimizer; another optimizer can be swapped in the same way, as sketched below.
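
For example (a sketch of my own, using another standard Tensorflow 1.x optimizer):

optimizer = tf.train.AdamOptimizer(0.001)  # adaptive learning-rate alternative to plain gradient descent
train = optimizer.minimize(cross_entropy)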

execute.py


#Run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        x = train_data
        y = pd.get_dummies(train_target)
        print(sess.run(W))

        sess.run(train, feed_dict = {X: x, Y: y})

This is where the concrete build is done. This time, keeping things simple, the whole dataset is fed as one unit and this is repeated 1000 times. In the sess.run() part, concrete data is given to the placeholders via feed_dict. In this way, parameters that locally minimize the loss function are searched for, and W is updated. In practice, the value of the loss is usually also written down as training progresses; a sketch of that, together with the test evaluation omitted earlier, follows.
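
A sketch of my own showing both points: writing down the loss during training, and the test evaluation omitted earlier. It reuses X, Y, y_, cross_entropy and train from classify.py and assumes numpy is available.

import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = train_data
    y = pd.get_dummies(train_target)
    for i in range(1000):
        # Run one update step and fetch the current loss at the same time.
        _, loss = sess.run([train, cross_entropy], feed_dict = {X: x, Y: y})
        if i % 100 == 0:
            print(i, loss)  # write down how the loss decreases

    # Evaluate on the held-out test data.
    pred = sess.run(y_, feed_dict = {X: test_data})
    truth = pd.get_dummies(test_target).values
    print(np.mean(np.argmax(pred, axis = 1) == np.argmax(truth, axis = 1)))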
