[PYTHON] Try using tf.metrics

Motivation

When I tried to use tf.metrics.accuracy, I was puzzled that it returns two values (accuracy, update_op) and that the value it reports is not the ordinary per-batch accuracy. The same is true of tf.metrics.recall and tf.metrics.precision. There seem to be almost no Japanese articles about this at the moment, so I am leaving a note here for now.

Behavior of tf.metrics

As the name suggests, tf.metrics computes various metrics, including accuracy.

However, going by the name alone, given code like this:

# labels: 1-D tensor of ground-truth labels
# predictions: 1-D tensor of predicted labels

accuracy, update_op = tf.metrics.accuracy(labels, predictions)
# Use tf.equal for an element-wise comparison; in TF 1.x, `==` on tensors is not element-wise
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))

you would expect these two accuracy values to be the same. And what is update_op for?

In short, tf.metrics.accuracy behaves as if it remembers every value it has seen so far. (Internally it keeps only two numbers, the running total of correct predictions and the running count of examples, and returns total ÷ count.)

That is, if every example is predicted correctly in the first epoch and incorrectly in the second (and each epoch contains the same number of examples), the first accuracy is 1.00 and the second is 0.50. If every example is predicted correctly again in the third epoch, the third accuracy is about 0.67.
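The running total and count described above can be modeled in a few lines of plain Python. This is only a sketch of the behavior, not TensorFlow's actual implementation; the class name StreamingAccuracy and its methods are made up for illustration.

```python
class StreamingAccuracy:
    """Plain-Python model of a running accuracy (hypothetical, not TensorFlow code)."""

    def __init__(self):
        self.total = 0.0  # number of correct predictions seen so far
        self.count = 0.0  # number of examples seen so far

    def update(self, labels, predictions):
        """Accumulate one batch and return the running accuracy (the role of update_op)."""
        self.total += sum(1.0 for l, p in zip(labels, predictions) if l == p)
        self.count += len(labels)
        return self.result()

    def result(self):
        """Return total / count, or 0.0 before any update (the role of accuracy)."""
        return self.total / self.count if self.count else 0.0


metric = StreamingAccuracy()
history = [
    metric.update([1, 1, 1], [1, 1, 1]),  # all correct
    metric.update([0, 0, 0], [1, 1, 1]),  # all wrong
    metric.update([1, 1, 1], [1, 1, 1]),  # all correct
]
print(history)
```

Because the ratio is taken over all examples seen so far, batches of different sizes contribute in proportion to their size, so the result is generally not the plain average of the per-batch accuracies.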

Judging from the TensorFlow issues, many people are confused by this behavior. There are comments such as "It's non-intuitive" and "I think tf.metrics.streaming_accuracy would be a better name for this function".

Incidentally, one respondent gave the following reasons:

The functions are not named "streaming_◯◯" because everything in tf.metrics is streaming anyway.
Non-streaming metrics are not provided because they are easy to implement yourself.
Per-batch accuracy is rarely what people care about, so there is usually little interest in non-streaming metrics.

I see, that makes sense. It certainly seems convenient.

How to use tf.metrics

Each tf.metrics function returns two values: accuracy and update_op.

Running update_op folds the current batch into the running totals and so updates the accuracy. Evaluating accuracy returns the value computed from those totals (0 before the first update).

In short, it works like this.

import numpy as np
import tensorflow as tf

labels = tf.placeholder(tf.float32, [None])
predictions = tf.placeholder(tf.float32, [None])
accuracy, update_op = tf.metrics.accuracy(labels, predictions)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print(sess.run(accuracy))  # Initial value: 0

    # First update (all predictions correct)
    sess.run(update_op, feed_dict={
        labels: np.array([1, 1, 1]),
        predictions: np.array([1, 1, 1])
    })
    print(sess.run(accuracy))  # 3 / 3 = 1

    # Second update (all predictions wrong)
    sess.run(update_op, feed_dict={
        labels: np.array([0, 0, 0]),
        predictions: np.array([1, 1, 1])
    })
    print(sess.run(accuracy))  # 3 / 6 = 0.5

    # Third update (all predictions correct)
    sess.run(update_op, feed_dict={
        labels: np.array([1, 1, 1]),
        predictions: np.array([1, 1, 1])
    })
    print(sess.run(accuracy))  # 6 / 9 = about 0.67

Implementation using tf.metrics

I am not sure this is the best approach, but here is one way to do it. Please let me know if there is a better one.

def create_metrics(labels, predictions, register_to_summary=True):
    update_op, metrics_op = {}, {}

    # Use tf.metrics to compute accuracy, recall, and precision
    for key, func in zip(('accuracy', 'recall', 'precision'),
                         (tf.metrics.accuracy, tf.metrics.recall, tf.metrics.precision)):
        metrics_op[key], update_op[key] = func(labels, predictions, name=key)

    # f1_score is computed by hand from precision and recall
    metrics_op['f1_score'] = tf.divide(
        2 * metrics_op['precision'] * metrics_op['recall'],
        metrics_op['precision'] + metrics_op['recall'] + 1e-8,
        name='f1_score'
    )  # 1e-8 guards against division by zero

    entire_update_op = tf.group(*update_op.values())

    if register_to_summary:  # So that tf.summary.merge_all() can pick them up later
        for k, v in metrics_op.items():
            tf.summary.scalar(k, v)

    return metrics_op, entire_update_op

metrics_op, entire_update_op = create_metrics(labels, predictions)
merged = tf.summary.merge_all()

In short, the points are:

When using several metrics together, it is easier to bundle their update_op values with tf.group.
If you want to log with tf.summary, register each metric with tf.summary.scalar and call tf.summary.merge_all() afterwards.
f1_score is not found in tf.metrics, so you may have to compute it yourself.
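To see what an F1 score built from streaming precision and recall amounts to, here is a plain-Python sketch. It assumes that streaming precision and recall are ratios of accumulated true-positive, false-positive, and false-negative counts; the function name f1_from_counts and the sample batches are made up for illustration, and the 1e-8 term mirrors the guard used in create_metrics.

```python
def f1_from_counts(true_positives, false_positives, false_negatives, eps=1e-8):
    """Compute F1 from accumulated counts (hypothetical helper, not a TensorFlow API)."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Treat an empty denominator as 0.0 rather than dividing by zero
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    # eps keeps the final division safe when precision and recall are both 0
    return 2 * precision * recall / (precision + recall + eps)


# Per-batch (true positives, false positives, false negatives); sample values only
batches = [(2, 0, 1), (0, 1, 0)]
tp = sum(b[0] for b in batches)
fp = sum(b[1] for b in batches)
fn = sum(b[2] for b in batches)
streaming_f1 = f1_from_counts(tp, fp, fn)
print(streaming_f1)
```

Under that assumption, the resulting F1 reflects the counts accumulated over all batches, not the mean of per-batch F1 scores.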

Remarks

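Note that the variables behind these metrics are local variables, not global variables. Initializing global variables alone is therefore not enough; you also need to run the local variable initializer:

local_init_op = tf.local_variables_initializer()
sess.run(local_init_op)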
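Running the initializer again sets the metric's variables back to zero, which is one way to get a fresh value for each epoch. The toy model below illustrates this in plain Python; LocalMetricState is a made-up stand-in for the metric's local variables, not a TensorFlow class, and the RuntimeError stands in for the error TensorFlow raises when an uninitialized variable is used.

```python
class LocalMetricState:
    """Toy stand-in for a metric's local variables (hypothetical, not a TensorFlow class)."""

    def __init__(self):
        self.values = None  # uninitialized until initialize() is called

    def initialize(self):
        """Plays the role of running tf.local_variables_initializer()."""
        self.values = {"total": 0.0, "count": 0.0}

    def update(self, correct, seen):
        """Add one batch to the totals and return the running accuracy."""
        if self.values is None:
            raise RuntimeError("metric variables used before initialization")
        self.values["total"] += correct
        self.values["count"] += seen
        count = self.values["count"]
        return self.values["total"] / count if count else 0.0


state = LocalMetricState()
per_epoch = []
for correct, seen in [(3, 3), (0, 3)]:  # sample (correct, seen) pairs, one per epoch
    state.initialize()  # re-initializing at the start of each epoch clears old totals
    per_epoch.append(state.update(correct, seen))
print(per_epoch)
```

Without the initialize() call inside the loop, the second value would be the running 0.5 instead of the per-epoch 0.0.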