[PYTHON] 100 amateur language processing knocks: 77

This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of past knocks is here (http://qiita.com/segavvy/items/fb50ba8097d59475f760).

Chapter 8: Machine Learning

In this chapter, we work on the task of classifying sentences as positive or negative (polarity analysis), using the sentence polarity dataset v1.0 from the Movie Review Data published by Bo Pang and Lillian Lee.

77. Measuring accuracy

Create a program that takes the output of problem 76 and calculates the accuracy of the predictions, and the precision, recall, and F1 score for the positive examples.

The finished code:

main.py


# coding: utf-8

fname_result = 'result.txt'


def score(fname):
	'''Compute scores from the result file.
	Read the result file and return accuracy, precision, recall and the F1 score.

	Return value:
	accuracy, precision, recall, F1 score
	'''
	# Read the results and tally them
	TP = 0		# True-Positive: predicted +1, correct label +1
	FP = 0		# False-Positive: predicted +1, correct label -1
	FN = 0		# False-Negative: predicted -1, correct label +1
	TN = 0		# True-Negative: predicted -1, correct label -1

	with open(fname) as data_file:
		for line in data_file:
			cols = line.split('\t')

			if len(cols) < 3:
				continue

			if cols[0] == '+1':			# correct label
				if cols[1] == '+1':		# predicted label
					TP += 1
				else:
					FN += 1
			else:
				if cols[1] == '+1':
					FP += 1
				else:
					TN += 1

	# Calculation
	accuracy = (TP + TN) / (TP + FP + FN + TN)		# accuracy
	precision = TP / (TP + FP)		# precision
	recall = TP / (TP + FN)		# recall
	f1 = (2 * recall * precision) / (recall + precision) 	# F1 score

	return accuracy, precision, recall, f1


#Score calculation
accuracy, precision, recall, f1 = score(fname_result)
print('Accuracy\t{}\nPrecision\t{}\nRecall\t{}\nF1 score\t{}'.format(
	accuracy, precision, recall, f1
))

Execution result:

Execution result


Accuracy	0.8660664040517726
Precision	0.8675833490299492
Recall	0.8640030013130745
F1 score	0.8657894736842107

Accuracy

Accuracy is the percentage of all reviews that were predicted correctly.

Precision for positive examples

Precision for positive examples is the percentage of reviews predicted to be positive that are actually positive.

For example, when picking edible mushrooms on a mushroom hunt (edible is the positive example here), mistakenly picking a poisonous mushroom (a negative example) is serious. On the other hand, mistaking an edible mushroom for a poisonous one and leaving it behind only reduces the haul and does little harm. Precision is the important metric in cases like this, where overlooking some positive examples is acceptable and you want to select only positive examples. It is strict about selecting the wrong thing and tolerant of missing the right thing.

As an extreme example, you can raise precision with a crude prediction rule that treats everything suspicious or unclear as negative.
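The effect of such a strict rule can be sketched with a small example. The labels and scores below are made-up illustration data, and `precision_recall_at` is a hypothetical helper that is not part of the program above. It assumes the classifier outputs a score and only reviews at or above a threshold are predicted positive.

```python
# Made-up (correct label, score) pairs; a higher score means "more likely positive".
samples = [
    (+1, 0.95), (+1, 0.90), (-1, 0.80), (+1, 0.70),
    (-1, 0.60), (+1, 0.55), (-1, 0.40), (-1, 0.20),
]


def precision_recall_at(threshold):
    """Precision and recall when only scores >= threshold are predicted positive."""
    predicted_positive = [gold for gold, score in samples if score >= threshold]
    true_positive = sum(1 for gold in predicted_positive if gold == +1)
    actual_positive = sum(1 for gold, _ in samples if gold == +1)
    # Precision is undefined when nothing is predicted positive; report 0.0 here.
    precision = true_positive / len(predicted_positive) if predicted_positive else 0.0
    recall = true_positive / actual_positive
    return precision, recall


# A stricter threshold treats more unclear cases as negative.
for threshold in (0.5, 0.65, 0.85):
    print(threshold, precision_recall_at(threshold))
```

With this data, precision rises from 4/6 to 3/4 to 1.0 as the threshold gets stricter, while recall falls from 1.0 to 0.75 to 0.5.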

Recall for positive examples

Recall for positive examples is the percentage of actually positive reviews that were predicted to be positive.

For example, when detecting defective products in a pre-shipment inspection (defective is the positive example here), overlooking a defective product and shipping it causes a big problem. In contrast, mistakenly judging a good product to be defective and holding it back does no real harm: if a later re-inspection shows it is good, it can still be shipped. Recall is the important metric in cases like this, where wrongly selecting a few negative examples is acceptable and you want to avoid overlooking positive examples. It is tolerant of selecting the wrong thing and strict about missing the right thing.

You can raise recall by predicting everything suspicious or unclear to be positive.
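Taken to the limit, this means predicting positive for every review. The following is a minimal sketch of what that does to the scores, using ten made-up labels. The function name `precision_recall` is an assumption for illustration and is not taken from the program above.

```python
def precision_recall(gold_labels, predicted_labels):
    """Precision and recall for the positive class (+1)."""
    pairs = list(zip(gold_labels, predicted_labels))
    tp = sum(1 for g, p in pairs if g == +1 and p == +1)
    fp = sum(1 for g, p in pairs if g == -1 and p == +1)
    fn = sum(1 for g, p in pairs if g == +1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data: five positive and five negative reviews.
gold = [+1] * 5 + [-1] * 5
always_positive = [+1] * len(gold)

print(precision_recall(gold, always_positive))  # (0.5, 1.0)
```

No positive example is missed, so recall is 1.0, but precision drops to the share of positive examples in the data.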

F1 score for positive examples

The F1 score (also called the F-score, F-measure, etc.) quantifies the balance between precision and recall and is calculated by the following formula.

$$
F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
$$
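As a quick check, the formula can be applied to the precision and recall printed in the execution result. This is only a sketch: `f1_score` is a name chosen here for illustration, and returning 0.0 when both inputs are zero is an assumption, not something the original program does.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the execution result above.
precision = 0.8675833490299492
recall = 0.8640030013130745
print(f1_score(precision, recall))

# An unbalanced pair is pulled toward the smaller value.
print(f1_score(1.0, 0.1))
```

The first value agrees with the F1 score of about 0.8658 in the execution result. The second shows that a high precision cannot make up for a low recall.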

Calculating the F1 score requires precision and recall. Counting the four values True Positive, False Positive, False Negative, and True Negative first makes that calculation easier.

| | Actually 1 (positive) | Actually 0 (negative) |
|---|---|---|
| Prediction is 1 (positive) | True Positive | False Positive |
| Prediction is 0 (negative) | False Negative | True Negative |

This code also counts these four values first and then calculates each score.
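One case the counting approach has to watch for is a zero denominator, for example when nothing is predicted positive. The following is a minimal sketch of the same tally written with `collections.Counter` and a guarded division. The helper names `confusion_counts` and `safe_div` and the sample pairs are assumptions for illustration, and returning 0.0 for an undefined score is one possible convention, not the only one.

```python
from collections import Counter


def confusion_counts(pairs):
    """Count TP, FP, FN, TN from (correct label, predicted label) pairs."""
    names = {
        ('+1', '+1'): 'TP',
        ('-1', '+1'): 'FP',
        ('+1', '-1'): 'FN',
        ('-1', '-1'): 'TN',
    }
    return Counter(names[pair] for pair in pairs)


def safe_div(numerator, denominator):
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Made-up (correct label, predicted label) pairs.
pairs = [('+1', '+1'), ('+1', '-1'), ('-1', '+1'), ('-1', '-1'), ('+1', '+1')]
counts = confusion_counts(pairs)

precision = safe_div(counts['TP'], counts['TP'] + counts['FP'])
recall = safe_div(counts['TP'], counts['TP'] + counts['FN'])
print(dict(counts), precision, recall)
```

A `Counter` returns 0 for a cell that never occurred, so an empty input gives 0.0 for both scores instead of a `ZeroDivisionError`.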

I will visualize the relationship between precision, recall, and the F1 score in problem 79.

Verification results with the training data

All of the values were around 0.86, so the prediction accuracy is about 86%. Since the data is half positive and half negative, even random guessing gets 50% right, which makes this seem a little low...
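The 50% baseline can be confirmed with a little arithmetic. The sketch below assumes a guesser that ignores the review text and predicts positive with a fixed probability. The function name `expected_accuracy` is hypothetical.

```python
from fractions import Fraction


def expected_accuracy(positive_share, p_guess_positive):
    """Expected accuracy of a guesser that ignores the input.

    positive_share:   fraction of examples that are actually positive
    p_guess_positive: probability that the guesser predicts positive
    """
    hit_on_positive = positive_share * p_guess_positive
    hit_on_negative = (1 - positive_share) * (1 - p_guess_positive)
    return hit_on_positive + hit_on_negative


half = Fraction(1, 2)  # the data is half positive and half negative
for p in (Fraction(0), Fraction(1, 4), half, Fraction(1)):
    print(p, expected_accuracy(half, p))
```

On balanced data the result is 1/2 whatever the guessing probability is, including always answering positive or always answering negative.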

That's all for the 78th knock. If you have any mistakes, I would appreciate it if you could point them out.

