[PYTHON] I touched TensorFlow's Tensorboard

A quick aside: Earth Defense Force 4.1, which I bought on a whim, is so much fun it makes me want to shout "Thunder!". The ants just keep pouring in, but please stop biting me.

Putting out machine learning frameworks seems to be in fashion these days; Microsoft has released DMLT for much the same purpose. TensorFlow has taken the lead, but I can't tell how it will shake out, or whether it will even stick, until I actually touch it. Machine learning systems tend to stay opaque until you get hands-on.

Installation itself is straightforward, but see the official page for the details. The official recommendation is to use virtualenv. One thing that tripped me up later (I'm not much of a Pythonista): TensorFlow currently supports only Python 2.7. Support for the 3.x series is tracked as an issue and seems to be in progress, so it should arrive before long.

http://tensorflow.org/get_started/os_setup.md
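
Once it was installed in a virtualenv, I did a quick sanity check from the interactive shell. This is only a minimal sketch along the lines of the official getting-started snippet; the exact output depends on the version pip pulled in.

# Quick check that the install works: build a few constant nodes and run them in a session.
import tensorflow as tf

hello = tf.constant("Hello, TensorFlow!")
sess = tf.Session()
print(sess.run(hello))                               # Hello, TensorFlow!
print(sess.run(tf.constant(10) + tf.constant(32)))   # 42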

Tensorflow tutorial

I think TensorFlow became a hot topic partly because of its extensive tutorials. Officially there is a general machine learning tutorial (Beginner) and one that trains a convolutional network (Deep).

The content is all in English, but if you look up the math terms as you go you won't have much trouble. Below are the Beginner and Deep versions I worked through, with my own comments added.

Beginner

# coding: utf-8
import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

import tensorflow as tf

# Placeholder: a node for feeding values in. The first dimension of the 2-D shape is None
# because the number of input rows is arbitrary; the second is 784, the number of pixels in an MNIST image.
x = tf.placeholder("float", [None, 784])

# W is the weight matrix. Its first dimension is 784 (the pixel count) because MNIST rows are laid
# out index -> pixels; if W did not come second in the multiplication, the result would not be a 10-column matrix.
W = tf.Variable(tf.zeros([784, 10]))
# b is a one-dimensional vector of length 10. Whether you call it a row or a column doesn't matter; it's just indexed.
b = tf.Variable(tf.zeros([10]))

# Both W and b are Variables. During the gradient descent step below, backpropagation updates them
# automatically on every iteration. Without backpropagation nothing assigns to them,
# so they would simply keep their initial values.

# matmul is matrix multiplication; the operands are ordered x then W because of the matrix shapes.
# The one-dimensional bias b can be added element-wise as-is, and softmax is applied to the sum.
# The output of softmax is again a [None, 10] tensor because softmax is computed over each row of the result.
y = tf.nn.softmax(tf.matmul(x, W) + b)

# The correct labels for y. Each label is a one-hot vector with 10 elements.
y_ = tf.placeholder("float", [None, 10])

# Multiply each label row element-wise by the log of the corresponding prediction row, sum over
# all rows and elements, and negate the result: this is the cross-entropy loss.
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# Optimize the graph with gradient descent via the backpropagation algorithm. Pretty much a black box.
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Initialize all variables. A Session is a context-like object for actually executing a TensorFlow graph.
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

writer = tf.train.SummaryWriter("./log", sess.graph_def)

# Randomly take 100 examples from the MNIST training set and repeat the training step on them.
# Values are fed only into x and y_ because those are the placeholders whose None dimension
# has to be fixed before the graph can be evaluated.
# So here a [100, 784] matrix and a [100, 10] matrix are supplied, respectively.
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)

    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# For the prediction y and the label y_, compare the index of the maximum value in each vector.
# In a label, the 1 marks the correct digit and is always the maximum; in a prediction, the
# largest value marks the digit the model considers most likely. So the prediction is judged
# correct exactly when the two indices match.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

# Cast true/false to 1.0/0.0 and take the mean to get the accuracy.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
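
To double-check my own comments about softmax and the cross-entropy sum above, here is a tiny NumPy-only sketch. It is not part of the tutorial, and it uses 3 classes instead of 10 just to keep the numbers readable.

import numpy as np

logits = np.array([2.0, 1.0, 0.1])           # one row of xW + b, with only 3 classes
y = np.exp(logits) / np.sum(np.exp(logits))  # softmax: all positive, sums to 1
label = np.array([1.0, 0.0, 0.0])            # one-hot label: the correct class is index 0
cross_entropy = -np.sum(label * np.log(y))   # only the log-probability of the true class survives
print(y)              # roughly [0.659, 0.242, 0.099]
print(cross_entropy)  # roughly 0.417, i.e. -log(0.659)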

Deep

# coding: utf-8
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])

#Create a Weight Variable with the specified Shape.
def weight_variable(shape):
    # stddev is the standard deviation. truncated_normal samples from a normal distribution with the
    # given mean (default 0) and the passed standard deviation, re-drawing any value that falls more
    # than two standard deviations from the mean.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

# Create a bias Variable of the given shape, initialized to a small positive constant.
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# Create a compute node for a two-dimensional convolution.
# Being two-dimensional, it convolves the input over the width and height dimensions.
# tf.nn.conv2d takes a 4-D input tensor and a 4-D filter tensor: it extracts patches of the
# input with the width and height given by the filter and applies the filter to each patch,
# reducing the patch to one value per output channel.
# For strides, the first and last elements must always be 1; the second and third are the
# ones actually used and give the step of the sliding window over height and width.
# Here both strides are 1, so the window moves one pixel at a time and, together with
# 'SAME' padding, the output has the same width and height as the input.
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

# Max-pool the input over 2x2 regions (I think).
# Since both the kernel size and the strides are 2x2, only the largest value in each 2x2
# region survives, so passing data through this exactly halves the width and height.
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1],
                          strides=[1,2,2,1], padding='SAME')

# First layer.
# The weight_variable shape is [5,5,1,32]: the first two dimensions are the patch size used
# by the convolution, the third is the number of input channels, and the fourth is the number
# of output channels.
W_conv1 = weight_variable([5,5,1,32])
#The number of output channels is 32.
b_conv1 = bias_variable([32])

# Reshape x into the 4-D shape that conv2d expects.
# The first dimension is -1 (inferred), so all the examples in x are laid out along it.
# The second and third dimensions are the image width and height, and the fourth is the
# number of color channels.
# x itself is a 2-D array in which each 28x28 image is flattened into a 784-element row;
# this turns the 784 part back into 28x28 with a single channel per pixel.
x_image = tf.reshape(x, [-1,28,28,1])

# Apply the convolution of x_image with the weights plus the bias to the ReLU function,
# then max-pool the result. The pooling halves the width and height of each feature map.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second layer. The patch size stays the same, but there are now 32 input channels and
# twice as many (64) output channels.
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])

# Note that the first argument of conv2d is the pooled output of the first layer.
# This connects each value in the first layer's pooled convolution results to the
# corresponding input channels and convolves the already-convolved result once more.
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
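
# (Shape check, added for my own understanding: each image goes 28x28x1 -> conv1 -> 28x28x32
#  -> pool -> 14x14x32 -> conv2 -> 14x14x64 -> pool -> 7x7x64, because 'SAME' padding keeps
#  the width and height through the convolutions and each pooling step halves them.
#  That is where the 7 * 7 * 64 below comes from.)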

# The data in h_pool2 is 28/2/2 = 7 in both width and height.
# The second layer has 64 output channels, so the flattened size 7*7*64 becomes the first
# dimension; the second dimension, 1024, is the number of neurons in this fully connected layer.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
# The bias gets one element per neuron, matching the weights.
b_fc1 = bias_variable([1024])

# At the h_pool2 stage the data is laid out as [-1,7,7,64], so reshape it into a
# two-dimensional array with one flattened row of 7*7*64 values per example.
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
# Multiplying the two matrices gives a [-1, 1024] two-dimensional array, which is
# effectively the output of the fully connected layer.
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

#Create a dropout layer to reduce overfitting.
keep_prob = tf.placeholder("float")
# Each element is independently kept with probability keep_prob or otherwise dropped out at random.
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# From here on the values used differ, but the structure is basically the same as above.
# In layer terms this is the readout layer: it extracts the final values from the convolutional network.
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# From here it's almost the same as the single-layer model, except that the optimizer is
# Adam. I'm not really sure what Adam is.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

# The iteration count has gone up to 20000, presumably because roughly half of the units
# are dropped out along the way.
# When I ran it, for some reason only 4 threads were used even though 8 should be available;
# maybe it matches the number of physical CPUs (there are 4 in /proc/cpuinfo)?
# On a Core i7-4790 @ 3.60GHz this takes roughly 40 minutes(!).
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x:batch[0], y_:batch[1], keep_prob:0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={x:mnist.test.images, y_:mnist.test.labels, keep_prob: 1.0}))

writer = tf.train.SummaryWriter("./log", sess.graph_def)

Caution

The comments are a mix of what the documentation says and my own understanding, so please forgive any mistakes.

Execution speed

Even allowing for the different number of loop iterations, Beginner is overwhelmingly faster than Deep (the convolutional version). I didn't do anything special to either.

Running them on my machine (Core i7-4790 @ 3.60GHz, 16 GiB of memory) gave the results below. Incidentally, the number of CPU threads TensorFlow uses in parallel seems to be determined by the number of physical cores; Hyper-Threading does not appear to be taken into account.

Beginner        : about 5 seconds
Deep (CPU only) : about 40 minutes (!?)

A shocking result. I really wanted to try it with the GPU enabled, but the CUDA SDK version available on my Gentoo box is either too old or too new, so I would have to start by installing CUDA itself; I'll leave that for now. The Deep model probably won't reach practical speed without a GPU, or without the distributed setup TensorFlow talks about.
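
On the thread-count observation above: I haven't verified this on the exact version I used, but my understanding is that the number of threads can be requested explicitly through the session configuration. A minimal sketch, assuming tf.ConfigProto exposes the inter_op_parallelism_threads and intra_op_parallelism_threads fields in this release:

# Untested sketch: explicitly ask for 8 threads, in case the default really is
# the physical core count rather than the number of hardware threads.
config = tf.ConfigProto(inter_op_parallelism_threads=8,
                        intra_op_parallelism_threads=8)
sess = tf.Session(config=config)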

Precautions when using Tensorboard

So this is the tool that was shown being used so slickly in the announcement video, and of course you can use it yourself. I can't be the only one who mistakenly assumed it was a desktop application.

There are two things to keep in mind when using Tensorboard. First, if you just want to look at the graph of your run, you don't need the full setup from the official example; the single line

writer = tf.train.SummaryWriter("./log", sess.graph_def)

is enough. This creates a file named events.out.tfevents... in ./log every time the program runs. Be aware that these files can get strangely large.

Second, Tensorboard itself is the kind of application you start as a server from the command line and then access in a browser, but when you specify that log directory it has to be the **full path**. That one felt like a trap...
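
Incidentally, if you want the scalar charts (loss, accuracy) as well and not just the graph, my understanding is that a few extra lines on top of the Beginner script are enough. This is a sketch only, assuming the tf.scalar_summary / tf.merge_all_summaries / add_summary names of this 0.x-era API; cross_entropy, accuracy, x and y_ are the nodes defined in the Beginner script above.

# Sketch: record the loss and accuracy every step so Tensorboard can chart them.
tf.scalar_summary("cross_entropy", cross_entropy)
tf.scalar_summary("accuracy", accuracy)
merged = tf.merge_all_summaries()
writer = tf.train.SummaryWriter("./log", sess.graph_def)

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    summary_str, _ = sess.run([merged, train_step],
                              feed_dict={x: batch_xs, y_: batch_ys})
    writer.add_summary(summary_str, i)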

Tensorboard configuration

Whenever I see a tool like this these days I start wondering how it was built, so I poked around behind the scenes.

Looking at the libraries it uses, dagre was new to me; it appears to be a library for laying out directed graphs.

And what surprised me most personally was spotting something I remember seeing many times at work: the line

    <link rel="import" href="external/polymer/polymer.html">

**Polymer!** I blurted out when I saw it. That's Google for you...

And the body of Tensorboard held an even bigger surprise:

/// <reference path="../../../typings/tsd.d.ts" />

There it is...! So it's a combination of TypeScript + Polymer. Polymer certainly feels like it runs more crisply than React when you build something comparable, so it may be well suited to an application like this that could easily get heavy.

I haven't read the original source closely, but structurally it looks like the *.ts and *.html files are vulcanized together. Seeing Polymer used at this scale makes me think it holds up just fine. Note, however, that Tensorboard only worked in Chrome for me, so keep that in mind.

Summary (or rather, impressions)

This is the first time I've touched a framework like this, and I found it fairly intuitive. Being able to compute with matrices and vectors directly is the best part; for people who have to turn mathematical formulas into programs, being able to write the arithmetic almost as it appears on paper is very pleasant.

Personally, I was delighted by how smoothly polished Tensorboard feels. If things like this keep appearing as open source, I'm afraid expectations for UIs will ratchet up yet again.

I'd like to work through a few more of the tutorials and see how much faster it actually gets with a GPU. If I find the time, I'll also give DMLT a try.
