[PYTHON] Select the required variables in TensorFlow and save/restore

When you split the training process in TensorFlow (for example, train now and resume or test later), you need to save and restore variables; this is supported by the tf.train.Saver class. If the model is small you can simply save/restore every variable it uses, but for a large model you will want to save/restore only the variables you really need.

In this article, we confirm how to save/restore variables, using handwritten-digit classification on MNIST as an example. (The environment is Python 2.7.11 and TensorFlow 0.8.0.)

Add trainable=True to the required variables

What counts as "required" depends on the content of the program. The simplest approach is to save all of the variables used (i.e., every instance of the tf.Variable class).


chkpt_file = '../MNIST_data/mnist_cnn.ckpt'

# Create the model
def inference(x, y_, keep_prob, phase_train):
    # (omitted: network model construction, etc.)

    return loss, accuracy, y_pred

if __name__ == '__main__':
    # (omitted)
    loss, accuracy, y_pred = inference(x, y_, 
                                         keep_prob, phase_train)
    #
    # Define the Saver op before entering the Session;
    # with no arguments, it covers all variables.
    #
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(init) 

        if restore_call:
            # Restore variables from disk.
            saver.restore(sess, chkpt_file) 

        if TASK == 'train':
            print('\n Training...')
            for i in range(5001):
                # (omitted: learning process)
                pass

        # Finally, save the variables to disk.
        if TASK == 'train':
            save_path = saver.save(sess, chkpt_file)
            print("Model saved in file: %s" % save_path)

As shown above, you can define the Saver op with tf.train.Saver() and no arguments, and then save every tf.Variable with its save() method. (Note: objects created with tf.placeholder are not included; they hold no state.)
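
A minimal sketch (TensorFlow 0.x API, as used in this article; the variable names and checkpoint path are made up for illustration) of how a no-argument Saver covers every tf.Variable while placeholders are excluded:

import tensorflow as tf

w = tf.Variable(tf.zeros([2, 2]), name='w')   # has state -> saved
x = tf.placeholder(tf.float32, [None, 2])     # no state -> not saved

saver = tf.train.Saver()   # no arguments: covers all tf.Variables

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    save_path = saver.save(sess, '/tmp/minimal.ckpt')
    print('Model saved in file: %s' % save_path)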

However, for a simple neural-network model, the **weights w** and **biases b** of each layer are usually all you need. The approach recommended here is to add the trainable flag when defining those variables.

The following shows the class definition of the convolution layer and that of the fully-connected layer.

class Convolution2D(object):
    '''
      constructor's args:
          input     : input image (2D matrix)
          input_siz : input image size
          in_ch     : number of incoming image channel
          out_ch    : number of outgoing image channel
          patch_siz : filter(patch) size
          weights   : (if given) (weights, bias)
    '''
    def __init__(self, input, input_siz, in_ch, out_ch, patch_siz, activation='relu'):
        self.input = input      
        self.rows = input_siz[0]
        self.cols = input_siz[1]
        self.in_ch = in_ch
        self.activation = activation
        
        wshape = [patch_siz[0], patch_siz[1], in_ch, out_ch]
        
        w_cv = tf.Variable(tf.truncated_normal(wshape, stddev=0.1), 
                            trainable=True)
        b_cv = tf.Variable(tf.constant(0.1, shape=[out_ch]), 
                            trainable=True)
        
        self.w = w_cv
        self.b = b_cv
        self.params = [self.w, self.b]
        
    # (omitted)

# Full-connected Layer   
class FullConnected(object):
    def __init__(self, input, n_in, n_out):
        self.input = input
    
        w_h = tf.Variable(tf.truncated_normal([n_in,n_out],
                          mean=0.0, stddev=0.05), trainable=True)
        b_h = tf.Variable(tf.zeros([n_out]), trainable=True)
     
        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]
    
    # (omitted)

When declaring the variables (w_cv, b_cv) and (w_h, b_h) that correspond to the weights and biases, trainable=True is passed. (This is in fact the default for tf.Variable, but writing it explicitly documents the intent.) With this small effort, you can later collect only the trainable variables.
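
A minimal sketch (with made-up variable names) of what the flag does: a variable with trainable=False drops out of tf.trainable_variables() but still appears in tf.all_variables():

import tensorflow as tf

v1 = tf.Variable(0.0, trainable=True,  name='v1')
v2 = tf.Variable(0.0, trainable=False, name='v2')

print([v.name for v in tf.trainable_variables()])  # only 'v1:0'
print([v.name for v in tf.all_variables()])        # 'v1:0' and 'v2:0'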

if __name__ == '__main__':
    # (omitted)
    vars_to_train = tf.trainable_variables()

    if not os.path.exists(chkpt_file):
        restore_call = False
        init = tf.initialize_all_variables()
    else:
        restore_call = True
        vars_all = tf.all_variables()
        vars_to_init = list(set(vars_all) - set(vars_to_train))
        init = tf.initialize_variables(vars_to_init)

    saver = tf.train.Saver(vars_to_train)

    with tf.Session() as sess:
        # (omitted)
    

The point of the code above is that **tf.trainable_variables()** collects only the variables declared with trainable=True, while **tf.all_variables()** collects every declared variable. The relationship between these sets of variables is shown below.

(Figure: tf_vars_1.png — the sets of variables)

In the first run, only the trainable variables are saved; in the second and subsequent runs, these saved variables are restored and reused. Variables that are not saved/restored are simply re-initialized on every run, including the second and later ones.
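
As a sketch of the flow inside the session for a second or later run (assuming the init and saver ops defined in the code above), the re-initialization happens first and the saved values are then restored on top:

with tf.Session() as sess:
    sess.run(init)                   # re-initializes only vars_to_init here
    saver.restore(sess, chkpt_file)  # restores the saved vars_to_train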

Here, compare the sizes of the saved files.

-rw-rw-r-- 1 52404005 May 31 09:54 mnist_cnn.all_vars
-rw-rw-r-- 1 13100491 May 22 09:15 mnist_cnn.trainable

This is just one example, but the saved file shrank to about a quarter of the size. (The files above were renamed from mnist_cnn.ckpt.)

How to collect variables using namespaces

Next, I will introduce another method: collecting the variables to save/restore via their namespaces. In TensorFlow you often build the graph while defining namespaces so that the graph (model structure) can be visualized; these same namespaces can be used to collect the variables defined inside them.

The following example collects the variables used by the batch normalization added to the convolution layers.

def batch_norm(x, n_out, phase_train):
    with tf.variable_scope('bn'):
        # (omitted: the batch-normalization operations)
        pass

    return normed

# Create the model; this construction part calls batch_norm() above
def inference(x, y_, keep_prob, phase_train):
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    
    with tf.variable_scope('conv_1'):
        conv1 = Convolution2D(x, (28, 28), 1, 32, (5, 5), activation='none')
        conv1_bn = batch_norm(conv1.output(), 32, phase_train)
        conv1_out = tf.nn.relu(conv1_bn)
           
        pool1 = MaxPooling2D(conv1_out)
        pool1_out = pool1.output()
    
    with tf.variable_scope('conv_2'):
        conv2 = Convolution2D(pool1_out, (28, 28), 32, 64, (5, 5), 
                                                          activation='none')
        conv2_bn = batch_norm(conv2.output(), 64, phase_train)
        conv2_out = tf.nn.relu(conv2_bn)
           
        pool2 = MaxPooling2D(conv2_out)
        pool2_out = pool2.output()    
        pool2_flat = tf.reshape(pool2_out, [-1, 7*7*64])
    
    # (omitted: other layers)
    
    return loss, accuracy, y_pred
 
#Main processing
if __name__ == '__main__':
    TASK = 'train'    # 'train' or 'test'
    
    # Variables
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    keep_prob = tf.placeholder(tf.float32)
    phase_train = tf.placeholder(tf.bool, name='phase_train')
    
    loss, accuracy, y_pred = inference(x, y_, 
                                         keep_prob, phase_train)

    # Train
    lr = 0.01
    train_step = tf.train.AdagradOptimizer(lr).minimize(loss)
    vars_to_train = tf.trainable_variables()    # option-1
    vars_for_bn1 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_1/bn')
    vars_for_bn2 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_2/bn')
    vars_to_train = list(set(vars_to_train).union(set(vars_for_bn1)))
    vars_to_train = list(set(vars_to_train).union(set(vars_for_bn2)))
    
    if TASK == 'train':
        restore_call = False
        init = tf.initialize_all_variables()
    elif TASK == 'test':
        restore_call = True
        vars_all = tf.all_variables()
        vars_to_init = list(set(vars_all) - set(vars_to_train))
        init = tf.initialize_variables(vars_to_init)  # option-1
        # init = tf.initialize_all_variables()  # option-2
    else:
        print('Check task switch.')
          
    saver = tf.train.Saver(vars_to_train) 

    with tf.Session() as sess:
        # (hereafter, the contents of the TensorFlow session)

Here there are convolution layer 1 (conv_1) and convolution layer 2 (conv_2), and batch normalization is applied in each. The variables used there are collected with **tf.get_collection()** as follows.

vars_for_bn1 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_1/bn')
vars_for_bn2 = tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_2/bn')

In this example, inference() defines the namespaces **conv_1** and **conv_2**, and inside them batch_norm(), which opens the namespace **bn**, is called; this produces the nested namespaces conv_1/bn and conv_2/bn used above.
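
A minimal sketch of how such nested scope names arise and are matched (a simplification of the code above; 'beta' is just an illustrative variable name):

import tensorflow as tf

with tf.variable_scope('conv_1'):       # opened in inference()
    with tf.variable_scope('bn'):       # opened inside batch_norm()
        beta = tf.Variable(tf.zeros([32]), name='beta')

# beta's full name is 'conv_1/bn/beta:0', so the scope-prefix filter
# below picks it up:
print(tf.get_collection(tf.GraphKeys.VARIABLES, scope='conv_1/bn'))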

After that, the set of variables is organized and split into "those to be saved" and "those not to be saved (and therefore re-initialized on later runs)". (Since the code is hard to read, a figure is attached to aid understanding.)

(Figure: tf_vars_2.png — the sets of variables to save and to initialize)

The necessary variables are saved by passing the colored part of the figure above as vars_to_train to tf.train.Saver(). When batch normalization was first introduced "as a black box", a bug occurred because its variables were not being saved; the approach above fixed it.
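
To recap the bookkeeping in set form (a sketch equivalent to the option-1 code above):

vars_to_save = set(tf.trainable_variables()) \
             | set(vars_for_bn1) | set(vars_for_bn2)
vars_to_init = set(tf.all_variables()) - vars_to_save

saver = tf.train.Saver(list(vars_to_save))
init = tf.initialize_variables(list(vars_to_init))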

Finally, check the file size again.

-rw-rw-r-- 1 52404005 May 31 09:54 mnist_cnn.all_vars
-rw-rw-r-- 1 13105573 May 31 10:05 mnist_cnn.ckpt
-rw-rw-r-- 1 13100491 May 22 09:15 mnist_cnn.trainable

The first file holds all the variables (about 52 MB). The second uses the trainable flag and the namespaces together (about 13 MB), and the third uses only the trainable flag (the run that contained the bug), also about 13 MB. Reducing the file size is probably more valuable for cutting disk I/O time than for saving disk space.

(The final code has been uploaded to Gist.)
