[PYTHON] Select the required variables in TensorFlow and save/restore

When you split the training process in TensorFlow (for example, train now and resume or test later), you need to save and restore variables; this is supported by the tf.train.Saver class. If the model is small you can simply save/restore every variable it uses, but for a large model you will want to save/restore only the variables you really need.

In this article, we confirm how to save/restore variables, using handwritten-digit classification on MNIST as an example. (The environment is Python 2.7.11 and TensorFlow 0.8.0.)

Add trainable=True to the required variables

What counts as "required" depends on the content of the program. The simplest approach is to save all of the variables used (i.e., every instance of the tf.Variable class).


chkpt_file = '../MNIST_data/mnist_cnn.ckpt'

# Create the model
def inference(x, y_, keep_prob, phase_train):
    # (omitted: network model construction, etc.)

    return loss, accuracy, y_pred

if __name__ == '__main__':
    # (omitted)
    loss, accuracy, y_pred = inference(x, y_, 
                                         keep_prob, phase_train)
    #
    # Define the Saver op before entering the Session;
    # with no arguments, it covers all variables.
    #
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(init) 

        if restore_call:
            # Restore variables from disk.
            saver.restore(sess, chkpt_file) 

        if TASK == 'train':
            print('\n Training...')
            for i in range(5001):
                # (omitted: learning process)
                pass

        # Finally, save the variables to disk.
        if TASK == 'train':
            save_path = saver.save(sess, chkpt_file)
            print("Model saved in file: %s" % save_path)

As shown above, you can define the Saver op with tf.train.Saver() and no arguments, and then save every tf.Variable with its save() method. (Note: objects created with tf.placeholder are not included; they hold no state.)
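
A minimal sketch (TensorFlow 0.x API, as used in this article; the variable names and checkpoint path are made up for illustration) of how a no-argument Saver covers every tf.Variable while placeholders are excluded:

import tensorflow as tf

w = tf.Variable(tf.zeros([2, 2]), name='w')   # has state -> saved
x = tf.placeholder(tf.float32, [None, 2])     # no state -> not saved

saver = tf.train.Saver()   # no arguments: covers all tf.Variables

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    save_path = saver.save(sess, '/tmp/minimal.ckpt')
    print('Model saved in file: %s' % save_path)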

However, for a simple neural-network model, the **weights w** and **biases b** of each layer are usually all you need. The approach recommended here is to add the trainable flag when defining those variables.

The following shows the class definition of the convolution layer and that of the fully-connected layer.

class Convolution2D(object):
    '''
      constructor's args:
          input     : input image (2D matrix)
          input_siz : input image size
          in_ch     : number of incoming image channel
          out_ch    : number of outgoing image channel
          patch_siz : filter(patch) size
          weights   : (if given) (weights, bias)
    '''
    def __init__(self, input, input_siz, in_ch, out_ch, patch_siz, activation='relu'):
        self.input = input      
        self.rows = input_siz[0]
        self.cols = input_siz[1]
        self.in_ch = in_ch
        self.activation = activation
        
        wshape = [patch_siz[0], patch_siz[1], in_ch, out_ch]
        
        w_cv = tf.Variable(tf.truncated_normal(wshape, stddev=0.1), 
                            trainable=True)
        b_cv = tf.Variable(tf.constant(0.1, shape=[out_ch]), 
                            trainable=True)
        
        self.w = w_cv
        self.b = b_cv
        self.params = [self.w, self.b]
        
    # (omitted)

# Full-connected Layer   
class FullConnected(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
    
        w_h = tf.Variable(tf.truncated_normal([n_in,n_out],
                          mean=0.0, stddev=0.05), trainable=True)
        b_h = tf.Variable(tf.zeros([n_out]), trainable=True)
     
        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]
    
    # (omitted)

When declaring the variables (w_cv, b_cv) and (w_h, b_h) that correspond to the weights and biases, trainable=True is passed. (This is in fact the default for tf.Variable, but writing it explicitly documents the intent.) With this small effort, you can later collect only the trainable variables.
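
A minimal sketch (with made-up variable names) of what the flag does: a variable with trainable=False drops out of tf.trainable_variables() but still appears in tf.all_variables():

import tensorflow as tf

v1 = tf.Variable(0.0, trainable=True,  name='v1')
v2 = tf.Variable(0.0, trainable=False, name='v2')

print([v.name for v in tf.trainable_variables()])  # only 'v1:0'
print([v.name for v in tf.all_variables()])        # 'v1:0' and 'v2:0'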

if __name__ == '__main__':
    # (omitted)
    vars_to_train = tf.trainable_variables()

    if not os.path.exists(chkpt_file):
        restore_call = False
        init = tf.initialize_all_variables()
    else:
        restore_call = True
        vars_all = tf.all_variables()
        vars_to_init = list(set(vars_all) - set(vars_to_train))
        init = tf.initialize_variables(vars_to_init)

    saver = tf.train.Saver(vars_to_train)

    with tf.Session() as sess:
        # (omitted)
    

The point of the code above is that **tf.trainable_variables()** collects only the variables declared with trainable=True, while **tf.all_variables()** collects every declared variable. The relationship between these sets of variables is shown below.

(Figure: tf_vars_1.png — the sets of variables)

In the first run, only the trainable variables are saved; in the second and subsequent runs, these saved variables are restored and reused. Variables that are not saved/restored are simply re-initialized on every run, including the second and later ones.
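
As a sketch of the flow inside the session for a second or later run (assuming the init and saver ops defined in the code above), the re-initialization happens first and the saved values are then restored on top:

with tf.Session() as sess:
    sess.run(init)                   # re-initializes only vars_to_init here
    saver.restore(sess, chkpt_file)  # restores the saved vars_to_train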

Here, compare the sizes of the saved files.

-rw-rw-r-- 1 52404005 May 31 09:54 mnist_cnn.all_vars
-rw-rw-r-- 1 13100491 May 22 09:15 mnist_cnn.trainable

This is just one example, but the saved file shrank to about a quarter of the size. (The files above were renamed from mnist_cnn.ckpt.)

How to collect variables using namespaces

Next, I will introduce another method: collecting the variables to save/restore via their namespaces. In TensorFlow you often build the graph while defining namespaces so that the graph (model structure) can be visualized; these same namespaces can be used to collect the variables defined inside them.

The following example collects the variables used by the batch normalization added to the convolution layers.

def batch_norm(x, n_out, phase_train):
    with tf.variable_scope('bn'):
        # (omitted: the batch-normalization operations)
        pass

    return normed

# Create the model; this construction part calls batch_norm() above
def inference(x, y_, keep_prob, phase_train):
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    
    with tf.variable_scope('conv_1'):
        conv1 = Convolution2D(x, (28, 28), 1, 32, (5, 5), activation='none')
        conv1_bn = batch_norm(conv1.output(), 32, phase_train)
        conv1_out = tf.nn.relu(conv1_bn)
           
        pool1 = MaxPooling2D(conv1_out)
        pool1_out = pool1.output()
    
    with tf.variable_scope('conv_2'):
        conv2 = Convolution2D(pool1_out, (28, 28), 32, 64, (5, 5), 
                                                          activation='none')
        conv2_bn = batch_norm(conv2.output(), 64, phase_train)
        conv2_out = tf.nn.relu(conv2_bn)
           
        pool2 = MaxPooling2D(conv2_out)
        pool2_out = pool2.output()    
        pool2_flat = tf.reshape(pool2_out, [-1, 7*7*64])
    
    # (omitted: other layers)
    
    return loss, accuracy, y_pred
 
#Main processing
if __name__ == '__main__':
    TASK = 'train'    # 'train' or 'test'
    
    # Variables
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    keep_prob = tf.placeholder(tf.float32)
    phase_train = tf.placeholder(tf.bool, name='phase_train')
    
    loss, accuracy, y_pred = inference(x, y_, 
                                         keep_prob, phase_train)

    # Train
    lr = 0.01
    train_step = tf.train.AdagradOptimizer(lr).minimize(loss)
    vars_to_train = tf.trainable_variables()    # option-1
    vars_for_bn1 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_1/bn')
    vars_for_bn2 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_2/bn')
    vars_to_train = list(set(vars_to_train).union(set(vars_for_bn1)))
    vars_to_train = list(set(vars_to_train).union(set(vars_for_bn2)))
    
    if TASK == 'train':
        restore_call = False
        init = tf.initialize_all_variables()
    elif TASK == 'test':
        restore_call = True
        vars_all = tf.all_variables()
        vars_to_init = list(set(vars_all) - set(vars_to_train))
        init = tf.initialize_variables(vars_to_init)  # option-1
        # init = tf.initialize_all_variables()  # option-2
    else:
        print('Check task switch.')
          
    saver = tf.train.Saver(vars_to_train) 

    with tf.Session() as sess:
        # (hereafter, the contents of the TensorFlow session)

Here there are convolution layer 1 (conv_1) and convolution layer 2 (conv_2), and batch normalization is applied in each. The variables used there are collected with **tf.get_collection()** as follows.

vars_for_bn1 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_1/bn')
vars_for_bn2 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_2/bn')

In this example, inference() defines the namespaces **conv_1** and **conv_2**, and inside them batch_norm(), which opens the namespace **bn**, is called; this produces the nested namespaces conv_1/bn and conv_2/bn used above.
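
A minimal sketch of how such nested scope names arise and are matched (a simplification of the code above; 'beta' is just an illustrative variable name):

import tensorflow as tf

with tf.variable_scope('conv_1'):       # opened in inference()
    with tf.variable_scope('bn'):       # opened inside batch_norm()
        beta = tf.Variable(tf.zeros([32]), name='beta')

# beta's full name is 'conv_1/bn/beta:0', so the scope-prefix filter
# below picks it up:
print(tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_1/bn'))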

After that, the set of variables is organized and split into "those to be saved" and "those not to be saved (and therefore re-initialized on later runs)". (Since the code is hard to read, a figure is attached to aid understanding.)

(Figure: tf_vars_2.png — the sets of variables to save and to initialize)

The necessary variables are saved by passing the colored part of the figure above as vars_to_train to tf.train.Saver(). When batch normalization was first introduced "as a black box", a bug occurred because its variables were not being saved; the approach above fixed it.
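
To recap the bookkeeping in set form (a sketch equivalent to the option-1 code above):

vars_to_save = set(tf.trainable_variables()) \
             | set(vars_for_bn1) | set(vars_for_bn2)
vars_to_init = set(tf.all_variables()) - vars_to_save

saver = tf.train.Saver(list(vars_to_save))
init = tf.initialize_variables(list(vars_to_init))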

Finally, check the file size again.

-rw-rw-r-- 1 52404005 May 31 09:54 mnist_cnn.all_vars
-rw-rw-r-- 1 13105573 May 31 10:05 mnist_cnn.ckpt
-rw-rw-r-- 1 13100491 May 22 09:15 mnist_cnn.trainable

The first file holds all the variables (about 52 MB). The second uses the trainable flag and the namespaces together (about 13 MB), and the third uses only the trainable flag (the run that contained the bug), also about 13 MB. Reducing the file size is probably more valuable for cutting disk I/O time than for saving disk space.

(The final code has been uploaded to Gist.)
