[Python] Basic notes on Theano

As a follow-up to my earlier blog post, I'd like to summarize a few more details about Theano that didn't fit on the slides.

First of all, the "Theano explanation" article is a concise description of Theano's features, so I recommend reading it.

As described there, Theano's main features include:

- computations are written as symbolic expressions instead of being executed immediately
- expressions are compiled into fast functions (on CPU or GPU)
- expressions can be differentiated automatically (T.grad)

And so on.

Theano's super-simplified tutorial

A rough summary of the official tutorial: http://deeplearning.net/software/theano/tutorial/index.html#tutorial

First, the three imports you always start with:

import numpy
import theano
import theano.tensor as T

These three lines are a standing convention.

The minimum you need to know

If you have a general understanding of the following topics, you should be able to read Deep Learning implementations and modify them.

Variables (symbols)

Variables in Theano are handled through the concept of a "tensor". The types and operations around tensors are mostly defined under theano.tensor (imported as T).

I don't fully understand tensors myself, but for now it's enough to know that **under T are the variable types and the major general-purpose mathematical functions (exp, log, sin, etc.)**.

Here, "** variable type **" is

there is. These are combined to represent the variable type (tensor type).

As a name,

And

And so on (there are others).

Combining these,

And so on. For example:

x = T.lscalar("x")   # an int64 scalar named "x"
m = T.dmatrix()      # a float64 matrix (the name is optional)

These x and m are "symbols" and do not have actual values. This is a little different from ordinary Python variables.

See below for more information on variable generation. http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-basic-tensor

The result of an operation on symbols is also a symbol

For example

x = T.lscalar("x")
y = x*2
z = T.exp(x)

Suppose we run this. Since x is a symbol with no value, y has no value either; y is itself a symbol meaning "x * 2". Likewise, z is a symbol for exp(x). (In reality they are Python objects.)

The calculations that make up a neural network are likewise treated as a mass of operations between these symbols (in short, an expression) until an actual value is supplied. Because everything remains an expression, it is easy for humans to follow, and I think this is what enables the automatic differentiation described later, as well as optimization when the function is built.
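By the way, you can inspect the expression a symbol stands for with theano.pp (pretty-print). A small sketch, with x and y as above; the exact output string may vary by version:

>>> print(theano.pp(y))
(x * TensorConstant{2})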

function: building functions

In order to actually perform the calculation, it is necessary to define a "function".

For example, if you want to build **f(x) = x*2**, you can write

f = theano.function([x], x*2)

or, equivalently,

y = x*2
f = theano.function([x], y)

Either way, f becomes a function; calling it gives:

>>> f(3)
array(6)


When making a function, theano.function(inputs, outputs, ...) is specified: the first argument is the list of input symbols, the second the output expression. It seems the function is compiled at this point, so even complex functions execute quickly.
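Incidentally, a function can take several input symbols and also return several outputs at once, since the outputs argument may be a list. A small sketch (names are mine):

a = T.dscalar("a")
b = T.dscalar("b")
f2 = theano.function([a, b], [a + b, a * b])   # list of inputs, list of outputs

>>> f2(2, 3)
[array(5.0), array(6.0)]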

theano.function also has the keyword argument **givens**. As the name implies, givens works like "replace a symbol in the expression with another symbol or value".

For example

>>> x = T.dscalar()
>>> y = T.dscalar()
>>> c = T.dscalar()
>>> ff = theano.function([c], x*2+y, givens=[(x, c*10), (y,5)])
>>> ff(2)
array(45)

The value being computed is "x * 2 + y", but the function itself takes only the symbol "c" as its argument. The expression cannot be evaluated unless x and y are given, so their values are supplied through the givens part instead. Later tutorials use this to feed functions with parts of the data in machine learning, as sketched below.
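For instance, that minibatch pattern can be sketched like this: the whole dataset sits in a shared variable (explained further down), and givens substitutes one slice of it per call. Note givens also accepts a dict instead of a list of pairs; names here are mine:

data = theano.shared(numpy.arange(10.0))   # a toy "dataset"
x = T.dvector("x")
i = T.lscalar("i")                         # minibatch index
batch_size = 2
fb = theano.function([i], T.sum(x),
                     givens={x: data[i*batch_size : (i+1)*batch_size]})

>>> fb(0)   # sum of data[0:2]
array(1.0)
>>> fb(1)   # sum of data[2:4]
array(5.0)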

T.grad: differentiation

One of Theano's main features is this differentiation capability: so-called automatic differentiation, which "analyzes an expression to derive the expression for its derivative".

For example

x, y = T.dscalars("x", "y")  # (*) how to declare several symbols at once
z = (x + 2*y)**2

Differentiating this expression with respect to x gives dz/dx = 2(x + 2y). You can obtain that derivative as an expression with

gx = T.grad(z, x)


Similarly, the derivative with respect to y is dz/dy = 4(x + 2y), which you can obtain with

gy = T.grad(z, y)


To actually compute a value, it still has to be turned into a function:

>>> fgy = theano.function([x,y], gy)
>>> fgy(1,2)
array(20.0)

And so on.
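The derivative with respect to x can be checked the same way; at (x, y) = (1, 2) we expect dz/dx = 2(1 + 2*2) = 10:

>>> fgx = theano.function([x,y], gx)
>>> fgx(1,2)
array(10.0)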

shared: shared variable

variable = theano.shared(object)

In the above form, you can declare shared data that the **function**s described earlier can reference. For example:

>>> x = T.dscalar("x")
>>> b = theano.shared(numpy.array([1,2,3,4,5]))
>>> f = theano.function([x], b * x)
>>> f(2)
array([  2.,   4.,   6.,   8.,  10.])

To read and set the value of a shared variable:

>>> b.get_value()
array([1,2,3,4,5])
>>> b.set_value([10,11,12])

Changes are reflected immediately in the function defined earlier: call **f(2)** again and you can see that the result has changed.

>>> f(2)
array([ 20.,  22.,  24.])
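A shared variable can appear in any expression without being listed among the inputs, which is how model parameters are usually held. A small sketch (names are mine):

w = theano.shared(numpy.zeros(3), name="w")   # parameter vector
v = T.dvector("v")
score = theano.function([v], T.dot(w, v))     # w is picked up implicitly

>>> score([1.0, 2.0, 3.0])
array(0.0)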

T.grad, shared variables, and updates: the typical gradient-descent implementation pattern

theano.function has a keyword argument called **updates**, which lets you update shared variables.

For example, to set c as a shared variable and increment it by 1 each time the function f is executed, write as follows.

c = theano.shared(0)
f = theano.function([], c, updates= {c: c+1})

The part **updates={c: c+1}** expresses the value update familiar from programming languages, **c = c + 1**. Note that the function returns the value of c from before the update. Running it gives:

>>> f()
array(0)
>>> f()
array(1)
>>> f()
array(2)

These pieces can be combined to implement gradient descent. For example, suppose that for the data **x = [1,2,3,4,5]** we want to find the **c** that minimizes **y = sum((x-c)^2)**. The code looks like this, for example:

x = T.dvector("x")   # input data
c = theano.shared(0.)  # the parameter we will update; initial value 0 for now
y = T.sum((x-c)**2)  # y = the value we want to minimize
gc = T.grad(y, c)  # partial derivative of y with respect to c
d2 = theano.function([x], y, updates={c: c - 0.05*gc})  # updates c on every call and returns the current y

Now, if you feed **[1,2,3,4,5]** to **d2()** several times:

>>> d2([1,2,3,4,5])
array(55.0)
>>> c.get_value()
1.5
>>> d2([1,2,3,4,5])
array(21.25)
>>> c.get_value()
2.25
>>> d2([1,2,3,4,5])
array(12.8125)
>>> c.get_value()
2.625

You can see that y gradually decreases and c gradually approaches 3, the mean of the data.
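To run it to convergence, just loop. A minimal sketch using the d2 and c defined above; after 100 steps c has effectively converged to 3 and the loss to its minimum, sum((x-3)^2) = 10:

>>> for step in range(100):
...     loss = d2([1, 2, 3, 4, 5])
...
>>> c.get_value()
3.0
>>> loss
array(10.0)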

Implementation of logistic regression

If you have understood everything up to here, you should be able to see how the logistic regression in the following tutorial works.

http://deeplearning.net/software/theano/tutorial/examples.html#a-real-example-logistic-regression

(Well, maybe you understood it from the start? ^^;)
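For reference, here is a condensed sketch of that tutorial's logistic regression, using only the pieces covered above (symbols, shared variables, T.grad, updates). It follows the tutorial's structure but is simplified, so treat it as a sketch rather than the exact tutorial code:

import numpy
import theano
import theano.tensor as T

rng = numpy.random
N, feats = 400, 784                            # examples, features
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))  # random toy data

x = T.dmatrix("x")                             # input matrix, one row per example
y = T.dvector("y")                             # target labels (0 or 1)
w = theano.shared(rng.randn(feats), name="w")  # weight vector (shared)
b = theano.shared(0., name="b")                # bias (shared)

p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))        # P(label = 1)
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1)      # cross-entropy per example
cost = xent.mean() + 0.01 * (w**2).sum()       # mean loss + L2 penalty
gw, gb = T.grad(cost, [w, b])                  # gradients w.r.t. both parameters

train = theano.function([x, y], cost,
                        updates={w: w - 0.1*gw, b: b - 0.1*gb})

for i in range(100):                           # each call does one gradient step
    print(train(D[0], D[1]))                   # cost should decrease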
