[PYTHON] Try MXNet Tutorial (2): Symbol-Neural Net Graph and Automatic Differentiation

A memo working through the MXNet Tutorial in order (hoping to make it all the way to the end ...)

This time, the second part: Symbol, covering neural network graphs and automatic differentiation.

Symbol

Since scientific computation is already possible with just the NDArray from the previous section, one might ask whether that is all you need.

MXNet provides the Symbol API for writing computations symbolically. In symbolic style, instead of executing calculations step by step, you first define a computation graph. The graph contains placeholders for inputs and outputs; it is then compiled and executed as a function that takes NDArrays and produces NDArrays. This style is similar to network configuration in Caffe and symbolic programming in Theano.

Symbolic is almost synonymous with declarative.

Another advantage of the symbolic approach is optimization. When writing imperatively, it is not known in advance which intermediate values will be needed later. In symbolic style, the outputs are declared up front, so intermediate memory can be recycled and operations can be performed in place. For the same network, the memory requirement is therefore smaller.
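
To make the contrast concrete, here is a minimal sketch (not from the tutorial) comparing the two styles on the same small computation:

import mxnet as mx

# imperative (NDArray): each statement is executed immediately
x = mx.nd.ones((2, 2))
y = x * 2 + 1              # computed right away

# symbolic (Symbol): declare the graph first, then bind data and run it
s = mx.sym.Variable('s')
t = s * 2 + 1              # nothing is computed yet
ex = t.bind(ctx=mx.cpu(), args={'s': mx.nd.ones((2, 2))})
print(ex.forward()[0].asnumpy())   # should print a 2x2 array of 3s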

Which style to use is discussed here.

For now, this chapter describes the Symbol API.

Building a basic Symbol

Basic calculation

How to express `a + b`: first, create placeholders with `mx.sym.Variable` (give each a name when creating it); then connect them with `+` to define `c`, which is named automatically.

import mxnet as mx
a = mx.sym.Variable('a')  # omitting the name ('a') raises an error
b = mx.sym.Variable('b')
c = a + b
(a, b, c)  # c is automatically named _plus0

OUT


    (<Symbol a>, <Symbol b>, <Symbol _plus0>)

Most NDArray operations can also be applied to Symbol.

# element-wise multiplication
d = a * b
# matrix product
e = mx.sym.dot(a, b)
# reshape
f = mx.sym.Reshape(d+e, shape=(1,4))
# broadcast
g = mx.sym.broadcast_to(f, shape=(2,4))
mx.viz.plot_network(symbol=g)  # network visualization

output_4_0.png

Give input with bind and evaluate (details later)
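
As a quick preview (a sketch; `bind` and `forward` are covered properly in the data binding section below), the graph `g` above can be evaluated by binding concrete 2x2 inputs:

ex = g.bind(ctx=mx.cpu(), args={'a': mx.nd.ones((2, 2)),
                                'b': mx.nd.ones((2, 2))})
# with 2x2 ones: a*b is all 1s, dot(a, b) is all 2s, so g should be a (2, 4) array of 3s
print(ex.forward()[0].asnumpy())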

Basic neural net

Symbols are also provided for neural network layers. Here is an example of a two-layer fully connected network.

# output may vary
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.sym.Activation(data=net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(data=net, name='fc2', num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, name='out')
mx.viz.plot_network(net, shape={'data':(100,200)})

output_7_0.png

Each Symbol has a unique name. Both NDArray and Symbol represent a single tensor, and operators represent calculations between tensors. An operator takes Symbols (or NDArrays) as input, in some cases also takes hyperparameters such as the number of hidden units (`num_hidden`) or the activation type (`act_type`), and produces an output.

A Symbol can also be viewed as a function with several arguments, and the list of those arguments can be obtained with the following call.

net.list_arguments()

OUT


    ['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'out_label']

mx.sym.Variable('data')

OUT


    <Symbol data>

These are the parameters and inputs required by this Symbol:

- `data`: the input data fed to the variable `data`
- `fc1_weight`, `fc1_bias`: weight and bias of the first fully connected layer fc1
- `fc2_weight`, `fc2_bias`: weight and bias of the second fully connected layer fc2
- `out_label`: the label required by the loss
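
As a quick check (a sketch, reusing the (100, 200) input shape from the plot above; `infer_shape` is covered in more detail later), the shapes of these arguments can be inferred:

arg_shapes, out_shapes, _ = net.infer_shape(data=(100, 200))
print(dict(zip(net.list_arguments(), arg_shapes)))
# fc1_weight should come out as (128, 200), fc1_bias as (128,),
# fc2_weight as (10, 128), fc2_bias as (10,), out_label as (100,)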

Variables can also be declared explicitly:

net = mx.symbol.Variable('data')
w = mx.symbol.Variable('myweight')
net = mx.symbol.FullyConnected(data=net, weight=w, name='fc1', num_hidden=128)
net.list_arguments()

OUT


    ['data', 'myweight', 'fc1_bias']

The tutorial says that in the example above, FullyConnected has three inputs: data, weight, and bias.

However, the bias does not appear explicitly in the code, and moreover the `sym` abbreviation is not used here ...

More complex construction

MXNet provides Symbols optimized for layers commonly used in deep learning, and new operators can also be defined in Python.

In the following example, two Symbols are added element-wise and the result is passed to a fully connected layer.

lhs = mx.symbol.Variable('data1')
rhs = mx.symbol.Variable('data2')
net = mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
"""Isn't this an ordinary operation?"""
net.list_arguments()

OUT


    ['data1', 'data2', 'fc1_weight', 'fc1_bias']

Not only one-directional (feed-forward) construction but also more flexible composition is possible.

data = mx.symbol.Variable('data')
net1 = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=10)
print(net1.list_arguments())
net2 = mx.symbol.Variable('data2')
net2 = mx.symbol.FullyConnected(data=net2, name='fc2', num_hidden=10)
composed = net2(data2=net1, name='composed')  # use net2 as a function
print(composed.list_arguments())

OUT


    ['data', 'fc1_weight', 'fc1_bias']
    ['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias']

In this example, net2 is applied as a function to the existing net1, and the resulting composed symbol has the arguments of both net1 and net2.

You can use `mx.name.Prefix` if you want to add a common prefix to symbol names.

data = mx.sym.Variable("data")
net = data
n_layer = 2
for i in range(n_layer):
    with mx.name.Prefix("layer%d_" % (i + 1)):  # apply the prefix
        net = mx.sym.FullyConnected(data=net, name="fc", num_hidden=100)
net.list_arguments()

OUT


    ['data',
     'layer1_fc_weight',
     'layer1_fc_bias',
     'layer2_fc_weight',
     'layer2_fc_bias']

Modular construction of deep neural networks

Writing deep networks like Google's Inception layer by layer is tedious, so common structures are modularized and reused.

The following example first defines a factory function for one unit (convolution → batch normalization → ReLU).

# Output may vary
def ConvFactory(data, num_filter, kernel, stride=(1,1), pad=(0, 0), name=None, suffix=''):
    conv = mx.symbol.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad, name='conv_%s%s' %(name, suffix))
    bn = mx.symbol.BatchNorm(data=conv, name='bn_%s%s' %(name, suffix))
    act = mx.symbol.Activation(data=bn, act_type='relu', name='relu_%s%s' %(name, suffix))
    return act
#Define one unit: convolution → batch norm (normalization for each batch) → activation with ReLU

prev = mx.symbol.Variable(name="Previos Output")
conv_comp = ConvFactory(data=prev, num_filter=64, kernel=(7,7), stride=(2, 2))  # slide a 7x7 filter with stride 2, no padding => 11x11 output
shape = {"Previos Output" : (128, 3, 28, 28)}
mx.viz.plot_network(symbol=conv_comp, shape=shape)

output_24_0.png

Use this to build Inception

# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
def InceptionFactoryA(data, num_1x1, num_3x3red, num_3x3, num_d3x3red, num_d3x3, pool, proj, name):
    # 1x1
    c1x1 = ConvFactory(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_1x1' % name))
    # 3x3 reduce + 3x3
    c3x3r = ConvFactory(data=data, num_filter=num_3x3red, kernel=(1, 1), name=('%s_3x3' % name), suffix='_reduce')
    c3x3 = ConvFactory(data=c3x3r, num_filter=num_3x3, kernel=(3, 3), pad=(1, 1), name=('%s_3x3' % name))
    # double 3x3 reduce + double 3x3
    cd3x3r = ConvFactory(data=data, num_filter=num_d3x3red, kernel=(1, 1), name=('%s_double_3x3' % name), suffix='_reduce')
    cd3x3 = ConvFactory(data=cd3x3r, num_filter=num_d3x3, kernel=(3, 3), pad=(1, 1), name=('%s_double_3x3_0' % name))
    cd3x3 = ConvFactory(data=cd3x3, num_filter=num_d3x3, kernel=(3, 3), pad=(1, 1), name=('%s_double_3x3_1' % name))
    # pool + proj
    pooling = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name)))
    cproj = ConvFactory(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_proj' %  name))
    # concat
    concat = mx.symbol.Concat(*[c1x1, c3x3, cd3x3, cproj], name='ch_concat_%s_chconcat' % name)
    return concat
prev = mx.symbol.Variable(name="Previos Output")
in3a = InceptionFactoryA(prev, 64, 64, 64, 64, 96, "avg", 32, name="in3a")
mx.viz.plot_network(symbol=in3a, shape=shape)

output_26_0.png

A complete example can be found here.

Grouping of multiple Symbols

When building a neural network with multiple loss layers, you can group them with `mx.sym.Group`.

net = mx.sym.Variable('data')
fc1 = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")
out1 = mx.sym.SoftmaxOutput(data=net, name='softmax')
out2 = mx.sym.LinearRegressionOutput(data=net, name='regression')
group = mx.sym.Group([out1, out2])
group.list_outputs()

OUT


    ['softmax_output', 'regression_output']

Relationship with NDArray

NDArray provides an imperative interface where calculations are evaluated statement by statement. Symbol is closer to declarative programming: first declare the computation structure, then evaluate it on data. This style is close to regular expressions and SQL.

Advantages of NDArray

- Simple
- Easy to use programming-language features such as for and if-else, and libraries such as NumPy
- Easy to debug step by step

Benefits of Symbol

- Provides almost all the functionality of NDArray (+, *, sin, reshape, etc.)
- Easy to save, load, and visualize
- Easy to optimize computation and memory usage

Operation of Symbol

The differences between Symbol and NDArray are as described above, but Symbols can also be manipulated directly. Keep in mind, however, that this is mostly wrapped by the `module` package.

Shape inference

Arguments, auxiliary states, and outputs can be queried for each Symbol. The output shape and type can be inferred from the input shapes and argument types, which makes memory allocation easier.

# recall that c = a + b from earlier

arg_name = c.list_arguments()  #Input name
out_name = c.list_outputs()    #Name of output
#Estimate the shape of the output from the input
arg_shape, out_shape, _ = c.infer_shape(a=(2,3), b=(2,3))
#Estimate output type from input
arg_type, out_type, _ = c.infer_type(a='float32', b='float32')
print({'input' : dict(zip(arg_name, arg_shape)),
 'output' : dict(zip(out_name, out_shape))})
print({'input' : dict(zip(arg_name, arg_type)),
 'output' : dict(zip(out_name, out_type))})

OUT


    {'output': {'_plus0_output': (2, 3)}, 'input': {'b': (2, 3), 'a': (2, 3)}}
    {'output': {'_plus0_output': <class 'numpy.float32'>}, 'input': {'b': <class 'numpy.float32'>, 'a': <class 'numpy.float32'>}}

Data binding and evaluation

To evaluate the symbol `c`, you need to feed it data. This is done with the `bind` method, which takes a context and a dictionary mapping the free variable names to NDArrays, and returns an executor. The executor's `forward` method runs the evaluation, and the results can be fetched from its `outputs` attribute.

ex = c.bind(ctx=mx.cpu(), args={'a' : mx.nd.ones([2,3]), 
                                'b' : mx.nd.ones([2,3])})

ex.forward()
print('number of outputs = %d\nthe first output = \n%s' % (
           len(ex.outputs), ex.outputs[0].asnumpy()))

OUT


    number of outputs = 1
    the first output = 
    [[ 2.  2.  2.]
     [ 2.  2.  2.]]
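
The title of this part mentions automatic differentiation, which goes through the same executor API. A sketch (not in the original memo) that also binds gradient buffers and calls `backward`:

a_grad = mx.nd.zeros((2, 3))
b_grad = mx.nd.zeros((2, 3))
ex = c.bind(ctx=mx.cpu(),
            args={'a': mx.nd.ones((2, 3)), 'b': mx.nd.ones((2, 3))},
            args_grad={'a': a_grad, 'b': b_grad})
ex.forward(is_train=True)
ex.backward(out_grads=mx.nd.ones((2, 3)))  # seed gradient of the output
print(a_grad.asnumpy())  # d(a+b)/da = 1 everywhere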

The same Symbol can be evaluated on a different context (e.g. GPU) and with different data.

ex_gpu = c.bind(ctx=mx.gpu(), args={'a' : mx.nd.ones([3,4], mx.gpu())*2,
                                    'b' : mx.nd.ones([3,4], mx.gpu())*3})
ex_gpu.forward()
ex_gpu.outputs[0].asnumpy()

OUT


    array([[ 5.,  5.,  5.,  5.],
           [ 5.,  5.,  5.,  5.],
           [ 5.,  5.,  5.,  5.]], dtype=float32)

Evaluation with `eval` is also possible; it bundles `bind` and `forward`.

ex = c.eval(ctx = mx.cpu(), a = mx.nd.ones([2,3]), b = mx.nd.ones([2,3]))
print('number of outputs = %d\nthe first output = \n%s' % (
            len(ex), ex[0].asnumpy()))

OUT


    number of outputs = 1
    the first output = 
    [[ 2.  2.  2.]
     [ 2.  2.  2.]]

Load and save

Like NDArray, a Symbol can be pickled, or saved and loaded with `save` and `load`. However, a Symbol is a graph, and the graph consists of chained computations that are implicitly represented by the output Symbol, so it is the graph behind the output Symbol that gets serialized. Serializing to JSON improves readability; use `tojson` for that.

print(c.tojson())
c.save('symbol-c.json')
c2 = mx.symbol.load('symbol-c.json')
c.tojson() == c2.tojson()

OUT


    {
      "nodes": [
        {
          "op": "null", 
          "name": "a", 
          "inputs": []
        }, 
        {
          "op": "null", 
          "name": "b", 
          "inputs": []
        }, 
        {
          "op": "elemwise_add", 
          "name": "_plus0", 
          "inputs": [[0, 0, 0], [1, 0, 0]]
        }
      ], 
      "arg_nodes": [0, 1], 
      "node_row_ptr": [0, 1, 2, 3], 
      "heads": [[2, 0, 0]], 
      "attrs": {"mxnet_version": ["int", 1000]}
    }

    True
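
The pickle route mentioned above is not shown in the tutorial cells; a minimal sketch (assuming a Symbol pickles via its JSON representation, as my reading of the API suggests) would be:

import pickle

with open('symbol-c.pkl', 'wb') as f:
    pickle.dump(c, f)
with open('symbol-c.pkl', 'rb') as f:
    c3 = pickle.load(f)
print(c3.tojson() == c.tojson())  # expected: True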

Custom symbol

Operators like mx.sym.Convolution and mx.sym.Reshape are implemented in C++ for performance. MXNet also lets you define new operators in a frontend language such as Python; see here for more information.

The idea appears to be to subclass the custom-operator class and implement something like Softmax in the frontend.
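
For reference, a minimal sketch of that idea, based on my reading of the mx.operator.CustomOp API (the numerically stabilized softmax below is an illustration, not the tutorial's exact code):

import numpy as np
import mxnet as mx

class Softmax(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        x = in_data[0].asnumpy()
        y = np.exp(x - x.max(axis=1, keepdims=True))      # stabilized exponentials
        y /= y.sum(axis=1, keepdims=True)
        self.assign(out_data[0], req[0], mx.nd.array(y))  # write the softmax output

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        label = in_data[1].asnumpy().ravel().astype(int)
        y = out_data[0].asnumpy()
        y[np.arange(label.shape[0]), label] -= 1.0        # softmax + cross-entropy gradient
        self.assign(in_grad[0], req[0], mx.nd.array(y))

@mx.operator.register("my_softmax")
class SoftmaxProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(SoftmaxProp, self).__init__(need_top_grad=False)

    def list_arguments(self):
        return ['data', 'label']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        data_shape = in_shape[0]
        label_shape = (in_shape[0][0],)
        return [data_shape, label_shape], [data_shape], []

    def create_operator(self, ctx, shapes, dtypes):
        return Softmax()

# usage sketch: out = mx.sym.Custom(data=net, name='softmax', op_type='my_softmax')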

Advanced usage

Type cast

By default MXNet uses 32-bit floats, but lower-precision types can be used for speed. Convert types with `mx.sym.Cast`.

a = mx.sym.Variable('data')
b = mx.sym.Cast(data=a, dtype='float16')
arg, out, _ = b.infer_type(data='float32')
print({'input':arg, 'output':out})

c = mx.sym.Cast(data=a, dtype='uint8')
arg, out, _ = c.infer_type(data='int32')
print({'input':arg, 'output':out})

OUT


    {'output': [<class 'numpy.float16'>], 'input': [<class 'numpy.float32'>]}
    {'output': [<class 'numpy.uint8'>], 'input': [<class 'numpy.int32'>]}

Variable sharing

Contents can be shared between Symbols by binding them to the same NDArray.

a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = mx.sym.Variable('c')
d = a + b * c

data = mx.nd.ones((2,3))*2
ex = d.bind(ctx=mx.cpu(), args={'a':data, 'b':data, 'c':data})  # bind the same NDArray to a, b, and c
ex.forward()
ex.outputs[0].asnumpy()

OUT


    array([[ 6.,  6.,  6.],
           [ 6.,  6.,  6.]], dtype=float32)

- Running in a Python 3 + Ubuntu + GPU environment
- Not a full translation, just a memo.
- Since the output was only lightly edited from Jupyter, the layout may be broken ...

Has the naming convention been settled along the way? I get the impression that the tutorial itself has not yet decided whether to call it sym or symbol.

Next up is Module (planned).
