Introduction

In TensorFlow, you can improve reproducibility by fixing the random seed, as described in the articles below. However, in the GPU environment of TensorFlow 2.3, fixing the seed alone did not give the same result every time. (I think that in TensorFlow 2.1 or 2.0 the same result was obtained just by fixing the seed ...)

- [TensorFlow 2.x (tf.keras)] Fixed random number seed to improve reproducibility
- Keras guarantees reproducibility (as of September 22, 2020)
- Note on how to ensure the reproducibility of GPU calculation with tensorflow (tf-keras)
- The same code in tensorflow, but the result is different. I want to do machine learning with reproducibility.

With the settings described here, I ran cross-validation several times and got the same result each time (that is, I was able to confirm reproducibility), so I am summarizing it as a memo. The source code used for verification is published on GitHub, so please point out any mistakes or needed corrections.

Conclusion

I will start with the conclusion: as described in the following article, you should set tf.config.threading and the environment variables TF_DETERMINISTIC_OPS and TF_CUDNN_DETERMINISTIC.

Realizing reproducible Machine Learning - with Tensorflow

In TensorFlow 2.3, reproducibility appears to be maintained by defining a function like the one below, which initializes the random number generators, and calling it at the beginning of the program.


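```python
import os
import random

import numpy as np
import tensorflow as tf

def setSeed(seed):
    os.environ['PYTHONHASHSEED'] = '0'

    # Fix the seeds of TensorFlow, NumPy, and Python's random module
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

    # Ask TensorFlow and cuDNN to use deterministic GPU kernels
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'

    # Run ops on a single thread so that execution order does not vary
    tf.config.threading.set_inter_op_parallelism_threads(1)
    tf.config.threading.set_intra_op_parallelism_threads(1)
```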

According to the official documentation, performance degrades because parallelism is sacrificed. I could have written only the conclusion, but I wondered whether it was really true, so I ran cross-validation and checked whether reproducibility was actually maintained.
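One caveat about the `PYTHONHASHSEED` line in setSeed: Python reads this variable only when the interpreter starts, so assigning it inside an already running program does not change string hashing in that process. The sketch below uses only the standard library and a hypothetical helper named `hash_in_child` (not part of the verification code). It launches child interpreters with the variable set in their environment and checks that the hash of a string is then the same in every run.

```python
import os
import subprocess
import sys


def hash_in_child(text, hashseed):
    """Return hash(text) as computed by a fresh interpreter started with PYTHONHASHSEED=hashseed."""
    # Copy the current environment and set the variable for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(hashseed))
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    first = hash_in_child("tensorflow", 0)
    second = hash_in_child("tensorflow", 0)
    # Both children were started with the same PYTHONHASHSEED.
    print(first == second)
```

If hash-dependent ordering matters for your code, this suggests exporting `PYTHONHASHSEED=0` in the shell before launching the training script, rather than relying on the assignment inside setSeed.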

Source code used for verification

I will summarize only the main points of the code placed on GitHub.

- main.py: Runs 10-fold cross-validation (n = 10) on the MNIST training images and records the loss and accuracy of each of the 10 training runs.
  - The network configuration is as follows. The MNIST image size of 28 is passed as both width and height.

    ```python
    def loadModel(width, height):

        layer_input = Input(shape=(width, height))
        layers_hidden = Reshape(target_shape=(width, height, 1))(layer_input)
        layers_hidden = Conv2D(filters=8, kernel_size=(3, 3), strides=1, activation="relu")(layers_hidden)
        layers_hidden = MaxPooling2D(pool_size=(2,2), strides=(2,2))(layers_hidden)
        layers_hidden = Conv2D(filters=8, kernel_size=(3, 3), strides=1, activation="relu")(layers_hidden)
        layers_hidden = MaxPooling2D(pool_size=(2,2), strides=(2,2))(layers_hidden)
        layers_hidden = Flatten()(layers_hidden) 
        layers_hidden = Dense(128, activation="relu")(layers_hidden)
        layers_hidden = Dropout(0.2)(layers_hidden)
        layer_output = Dense(10, activation="softmax")(layers_hidden)

        return Model(inputs=layer_input, outputs=layer_output)
    ```

  - When I experimented with a shallow network whose only hidden layer was a fully connected layer (128 dimensions), there was no big difference in processing time, so I added convolutional layers.
  - KFold from scikit-learn is used for cross-validation.
  - I only wanted to see whether the results were reproducible, so only the training images are used for both training and evaluation.
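To make the network in loadModel easier to picture, the following sketch traces the feature-map sizes by hand. It assumes the Keras defaults that loadModel relies on (Conv2D and MaxPooling2D with `padding="valid"`). The helper names `conv_out`, `pool_out`, and `trace_shapes` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, pool, stride):
    """Output length of a 'valid' max pooling along one axis."""
    return (size - pool) // stride + 1


def trace_shapes(width=28, height=28, filters=8):
    """Follow a width x height input through the layers of loadModel."""
    w, h = width, height
    shapes = [("Reshape", (w, h, 1))]
    for i in (1, 2):
        # 3x3 convolution with stride 1, then 2x2 pooling with stride 2
        w, h = conv_out(w, 3), conv_out(h, 3)
        shapes.append((f"Conv2D #{i}", (w, h, filters)))
        w, h = pool_out(w, 2, 2), pool_out(h, 2, 2)
        shapes.append((f"MaxPooling2D #{i}", (w, h, filters)))
    shapes.append(("Flatten", (w * h * filters,)))
    shapes.append(("Dense", (128,)))
    shapes.append(("Dropout", (128,)))  # Dropout does not change the shape
    shapes.append(("Dense (softmax)", (10,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:16s} {shape}")
```

For a 28 x 28 input this gives 26, 13, 11, and 5 along each axis, so the Flatten layer feeds 5 x 5 x 8 = 200 values into the 128-unit Dense layer.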

- eval.py: Prints the average variance and the processing time computed from the saved text file.
  - Cross-validation is run 10 times, so if the variance of each fold's results across the runs is 0, the results are exactly the same, that is, reproducibility is confirmed.
  - main.py generates the text files, and eval.py loads them.
  - The text file stores the first save time, the results of the first 10-fold cross-validation, the second save time, the second results, and so on (see the results below). As each line is read, the times and the results are put into separate lists, which are then used to display the variance and the processing time.

```python
import os
from datetime import datetime

import numpy as np


def getResults(path_result, n_splits, num_train):
    with open(path_result, "r") as f:
        lines = f.readlines()

    results = []
    dates = []

    for line in lines:
        fields = line.split()

        if len(fields) != 1:
            # Result line: fold index, loss, accuracy
            results.append(fields)
        else:
            # Timestamp line saved before each cross-validation run
            dates.append(datetime.strptime(fields[0], "%Y%m%d-%H%M%S"))

    results = np.array(results).astype(np.float64)
    results = np.reshape(results, (num_train, n_splits, 3))

    # Variance across runs for each fold's loss and accuracy, then averaged
    var = np.mean(np.var(results[:,:,1:], axis=0))
    # Sum of the intervals between consecutive timestamps
    diff_time_total = np.sum([dates[i + 1] - dates[i] for i in range(0, num_train-1)])

    print(f"{path_result}: {var:.4}, {diff_time_total}")
    return

getResults(path_result=os.path.join("results", "with_tf_option.txt"), n_splits=10, num_train=10)
```

  - (For the variance, loss and accuracy are averaged together with np.mean, but it would have been better to keep them separate.)
  - __Note: this is not the exact training time, because it is the total time from when one result is saved to the .txt file until the next result is saved.__
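As noted above, loss and accuracy could be checked separately rather than averaged together. Here is a minimal sketch of that idea using only the standard library. `statistics.pvariance` is the population variance, which is also what `np.var` computes by default. The function name `variancePerMetric` and the nested-list layout are assumptions for this example, and the sample values are folds 0 and 1 of the first two runs listed in the raw results of this article.

```python
from statistics import mean, pvariance


def variancePerMetric(runs):
    """runs[r][f] = (loss, accuracy) of fold f in run r.

    Returns (mean variance of loss, mean variance of accuracy). Each variance
    is taken across runs for one fold and then averaged over the folds.
    """
    n_folds = len(runs[0])
    loss_var = mean(pvariance([run[f][0] for run in runs]) for f in range(n_folds))
    acc_var = mean(pvariance([run[f][1] for run in runs]) for f in range(n_folds))
    return loss_var, acc_var


# Folds 0 and 1 of the first two runs with the options enabled.
with_option = [
    [(0.041036274284124374, 0.987500011920929), (0.0490814633667469, 0.9860000014305115)],
    [(0.041036274284124374, 0.987500011920929), (0.0490814633667469, 0.9860000014305115)],
]
# Folds 0 and 1 of the first two runs without the options.
without_option = [
    [(0.039828840643167496, 0.9888333082199097), (0.049601536244153976, 0.9866666793823242)],
    [(0.04064466059207916, 0.987666666507721), (0.04839828982949257, 0.9851666688919067)],
]

if __name__ == "__main__":
    print("with options:   ", variancePerMetric(with_option))
    print("without options:", variancePerMetric(without_option))
```

Reporting the two numbers separately makes it easier to see whether a nonzero variance comes from the loss, the accuracy, or both.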

- ___run_all.sh (for Ubuntu) or run_all.ps1 (for Windows)___: A script that runs main.py 10 times and then runs eval.py.
  - Results with the reproducibility settings are saved in with_tf_option.txt, and results without them are saved in without_tf_option.txt.
  - Because main.py appends to the result file, existing results are deleted when the script is run.

- docker-compose.yml, env/Dockerfile: Files for Docker (for Linux).
  - If you can use docker-compose, running `docker-compose up` prints the average variance of each cross-validation and the processing time.
  - To keep the generated files from being owned by root, the container runs as the current user, so please run `export UID` before execution.

Verification results

This time, I ran it on Windows 10 with TensorFlow 2.3 and Python 3.7.7. The average of the variances of each cross-validation (Variance) and the total time between saved results (Time_diff_total) are as follows.

                                Variance, Time_diff_total
   results\with_tf_option.txt:       0.0,         0:17:11    #1037 seconds
results\without_tf_option.txt: 6.694e-07,         0:13:45    #825 seconds

With the options, the variance is 0, so the 10 runs of 10-fold cross-validation produced exactly the same results. However, ensuring reproducibility costs time. Time_diff_total covers the 9 intervals between the 10 saved timestamps, so it takes about 115 seconds until the next result with the options and about 92 seconds without them. Since nothing other than training takes significant time, this matches the official documentation's statement that performance decreases when reproducibility is ensured.

For reference, the loss and accuracy of the first and second runs, with and without the options, are as follows. (Since the variance is 0, there is no real need to show two runs with the options, but I am posting the raw results just in case.)

# With options (fold index, loss, accuracy)
20201002-230020
0 0.041036274284124374 0.987500011920929
1 0.0490814633667469 0.9860000014305115
2 0.05664192885160446 0.9831666946411133
3 0.05320063605904579 0.9833333492279053
4 0.04623125120997429 0.9850000143051147
5 0.047372110188007355 0.984333336353302
6 0.05214701220393181 0.9850000143051147
7 0.03892550244927406 0.9858333468437195
8 0.047721363604068756 0.9851666688919067
9 0.05081837624311447 0.984499990940094
20201002-230216
0 0.041036274284124374 0.987500011920929
1 0.0490814633667469 0.9860000014305115
2 0.05664192885160446 0.9831666946411133
3 0.05320063605904579 0.9833333492279053
4 0.04623125120997429 0.9850000143051147
5 0.047372110188007355 0.984333336353302
6 0.05214701220393181 0.9850000143051147
7 0.03892550244927406 0.9858333468437195
8 0.047721363604068756 0.9851666688919067
9 0.05081837624311447 0.984499990940094

# Without options
20201002-231902
0 0.039828840643167496 0.9888333082199097
1 0.049601536244153976 0.9866666793823242
2 0.05240510031580925 0.9850000143051147
3 0.05293559655547142 0.9850000143051147
4 0.04633906111121178 0.9850000143051147
5 0.04794950410723686 0.9850000143051147
6 0.053883280605077744 0.9838333129882812
7 0.03880513831973076 0.987333357334137
8 0.04899284988641739 0.9850000143051147
9 0.0499492883682251 0.9851666688919067
20201002-232034
0 0.04064466059207916 0.987666666507721
1 0.04839828982949257 0.9851666688919067
2 0.055755823850631714 0.984499990940094
3 0.05341317877173424 0.9829999804496765
4 0.046509750187397 0.9856666922569275
5 0.04652954265475273 0.9853333234786987
6 0.05223047733306885 0.984666645526886
7 0.039393164217472076 0.9869999885559082
8 0.048276182264089584 0.9854999780654907
9 0.04923468455672264 0.9848333597183228
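The timestamps in the raw results above give a quick cross-check of the time per run. This small sketch parses them with the same format string that getResults uses. The helper name `secondsBetween` is made up for this example, and only the first interval of each condition is compared.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d-%H%M%S"  # same format that getResults parses


def secondsBetween(first, second):
    """Elapsed seconds between two timestamp lines of the result file."""
    start = datetime.strptime(first, TIMESTAMP_FORMAT)
    end = datetime.strptime(second, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # First and second timestamps of each condition, copied from the raw results.
    with_option = secondsBetween("20201002-230020", "20201002-230216")
    without_option = secondsBetween("20201002-231902", "20201002-232034")
    print(f"with options:    {with_option:.0f} s")
    print(f"without options: {without_option:.0f} s")
    print(f"ratio:           {with_option / without_option:.2f}")
```

The first interval is 116 seconds with the options and 92 seconds without them, which is in line with the totals in the table once they are divided by the 9 intervals between 10 timestamps.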

Impressions

When I moved to a newer version of TensorFlow, I was in trouble because I could no longer confirm reproducibility, so I am glad it could be solved just by adding environment variables and option settings. I experimented with a simple neural network here, but networks with more layers are likely to be used in practice, so I wonder how much the settings changed this time affect performance there. (If you already have results, please let me know.)

History

2020/10/03 Published; added an explanation of the released code