This article is the entry for day 14 of the Python Advent Calendar 2015.
A method that returns a random value can still be tested, for example by failing the test when the results deviate too much from what is expected. I wrote a program to see whether such a method can be tested statistically.
The idea is to unit test a method that returns a random value using a chi-square test.
For example, a program that rolls a die and randomly returns a number from 1 to 6 looks like this:
dice.py
# -*- coding: utf-8 -*-
import random


class Dice(object):
    def throw(self):
        return random.randint(1, 6)
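As a quick sanity check before writing the test, the class can be exercised by hand. This is only a minimal sketch; the ten rolls and the printed output are arbitrary.

# -*- coding: utf-8 -*-
import collections

import dice

d = dice.Dice()
rolls = [d.throw() for _ in range(10)]
print(rolls)                       # e.g. [3, 1, 6, 2, 5, 5, 4, 1, 6, 2]
print(collections.Counter(rolls))  # how many times each face came up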
The code to test this looks like this:
test_dice.py
# -*- coding: utf-8 -*-
import collections
import unittest
import dice
from scipy import stats


class TestDice(unittest.TestCase):
    def setUp(self):
        self.__target = dice.Dice()

    def test_throw(self):
        # Roll the die 6000 times
        result = [self.__target.throw() for n in range(0, 6000)]
        # Tally the results
        counted = collections.Counter(result)
        # Check that no unexpected face values appeared
        # (in Python 3, assertItemsEqual was renamed assertCountEqual)
        self.assertItemsEqual([1, 2, 3, 4, 5, 6], counted.keys())
        # Chi-square test at the 1% significance level: if the null hypothesis
        # that the die is unbiased can be rejected, the test fails.
        # Each face should come up roughly 1000 times; a large deviation
        # from that makes the test fail.
        chi_square_value, p_value = stats.chisquare(
            [counted[1], counted[2], counted[3], counted[4], counted[5], counted[6]],
            f_exp=[1000, 1000, 1000, 1000, 1000, 1000]
        )
        self.assertLess(0.01, p_value)
The stats module of scipy is required, so if it is not installed, install it with pip.
$ pip install numpy
$ pip install scipy
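With scipy installed, the test can be run, for example, via unittest's command-line interface (assuming dice.py and test_dice.py are in the same directory):

$ python -m unittest test_dice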
Roll the die a fixed number of times (6000 in this case) and tally the results: for example, 1 came up 1007 times, 2 came up 1050 times, and so on.
The chi-square test then compares those observed counts with the theoretical value, 1000 occurrences per face in this case, and checks whether the null hypothesis that "every face comes up equally often" can be rejected.
If the null hypothesis can be rejected, the results are not what we expect and the test fails. Conversely, if it cannot be rejected, the test passes.
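To make the decision concrete, here is a small sketch using hypothetical counts. Only the 1007 and 1050 above come from the example; the remaining numbers are made up so that the total is 6000.

# -*- coding: utf-8 -*-
from scipy import stats

# Hypothetical observed counts for faces 1 to 6 (made-up numbers summing to 6000)
observed = [1007, 1050, 980, 1015, 962, 986]
expected = [1000] * 6

# The chi-square statistic is the sum of (observed - expected)^2 / expected
manual_statistic = sum((o - e) ** 2 / float(e) for o, e in zip(observed, expected))

chi_square_value, p_value = stats.chisquare(observed, f_exp=expected)
print(manual_statistic, chi_square_value)  # both are about 4.8 here

# Decision rule used in the test: reject "no bias" when p_value < 0.01
if p_value < 0.01:
    print("null hypothesis rejected -> the unit test would fail")
else:
    print("cannot reject -> the unit test would pass")

With these particular numbers the statistic is about 4.8 and the p-value is well above 0.01, so the test would pass.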
Even with a statistical test, the test will occasionally fail purely through bad luck, so this approach may be awkward to use in practice.
Lowering the significance level reduces the number of cases where the test fails even though the implementation is actually correct, but it also means that small deviations will not make the test fail, so a die whose 1 actually comes up slightly more often than the other faces may go undetected.
Conversely, raising the significance level increases the number of failures on a correct implementation, but even small deviations can be detected.
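To get a feel for this trade-off, one can simulate the test many times, once with a fair die and once with a die whose 1 is slightly favored, and count how often the p-value falls below each significance level. This is only a sketch: it assumes Python 3.6+ for random.choices, and the amount of bias (about 10%) and the number of repetitions (200) are arbitrary choices.

# -*- coding: utf-8 -*-
import collections
import random

from scipy import stats

FACES = [1, 2, 3, 4, 5, 6]


def p_value_of_one_run(weights):
    # Roll a (possibly biased) die 6000 times and apply the same chi-square
    # test as in test_dice.py, against the expectation of 1000 per face.
    rolls = random.choices(FACES, weights=weights, k=6000)
    counted = collections.Counter(rolls)
    observed = [counted[face] for face in FACES]
    chi2, p = stats.chisquare(observed, f_exp=[1000] * 6)
    return p


fair = [1, 1, 1, 1, 1, 1]
biased = [1.1, 1, 1, 1, 1, 1]  # 1 is about 10% more likely than the other faces

for name, weights in [("fair", fair), ("biased", biased)]:
    p_values = [p_value_of_one_run(weights) for _ in range(200)]
    for alpha in (0.01, 0.05):
        failures = sum(p < alpha for p in p_values)
        print("%s die, significance level %.2f: %d failures out of 200"
              % (name, alpha, failures))

The fair die should fail in roughly the fraction of runs given by the significance level, while the biased die fails more often at the 5% level than at the 1% level.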
A test that fails too often will eventually stop being looked at, so in practice I think it is better to run this kind of test with a low significance level.