Understanding probability and statistics for progress management with a Python program

Question 1

What is the total of the numbers rolled when you roll a die 100 times?

Average / expected value

The average, which we learn in elementary school, is the first step in statistics. First, use the average to derive the answer to Question 1 and compare it with the simulation results.

program

import numpy as np
import numpy.random as rd
import matplotlib.pyplot as plt

# Define the die and compute the mean of its faces
dice = [1, 2, 3, 4, 5, 6]
print("Sample mean:", np.mean(dice))

# Define the number of rolls
# Mean per roll x number of rolls = expected value of the total
trialsNum = 100
print("Total expected value:", np.mean(dice) * trialsNum)
input("Press Enter to continue. . .")

# Actually roll the die trialsNum times
# Draw a histogram to check the distribution of the rolls
resultList = [rd.choice(dice) for i in range(trialsNum)]
plt.hist(resultList, bins=6, rwidth=0.8, range=(0.5, 6.5))
plt.show()

print("total:", np.sum(resultList))

Execution result (example)

The result varies from run to run because random numbers are used.

Sample mean: 3.5
Total expected value: 350.0
Press Enter to continue. . .

[Figure: histogram of the 100 rolls (dice1.png)]

total: 355

Commentary

The average value of a single roll is 3.5, so the total of 100 rolls is about 350, which is 100 times that average. Using the average / expected value, you can manage progress as follows.

  1. From past results, obtain the average amount of tasks completed per resource unit (person-day, etc.), i.e. the pace of progress.
  2. Multiply the amount of resources still available before the due date by that pace to get the expected amount of tasks that can be completed.
  3. Compare the expected value with the remaining amount of tasks that must be completed by the due date to judge the progress.
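
The three steps above can be sketched as follows. This is only a minimal illustration, not a prescribed method: the function name `expected_progress` and all the figures (past throughput, remaining person-days, remaining tasks) are made-up assumptions.

```python
import numpy as np


def expected_progress(past_throughput, remaining_resources):
    """Return (pace, expected amount of tasks that can still be completed)."""
    pace = float(np.mean(past_throughput))   # step 1: average pace per resource unit
    expected = pace * remaining_resources    # step 2: pace x remaining resources
    return pace, expected


# Hypothetical figures, for illustration only
past_throughput = [4, 6, 5, 5, 3, 7]   # tasks completed per person-day so far
remaining_person_days = 10             # resources left before the due date
remaining_tasks = 55                   # tasks that must be done by the due date

pace, expected = expected_progress(past_throughput, remaining_person_days)
on_track = expected >= remaining_tasks  # step 3: compare expectation with need
print("Pace:", pace)
print("Expected tasks completed:", expected)
print("On track:", on_track)
```

With these assumed figures the expectation (50 tasks) falls short of the 55 remaining tasks, so the comparison in step 3 reports that the work is not on track.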

But is it acceptable to leave that comparison and judgment to human experience and intuition? A burndown chart lets you judge from the trend over time, but that still relies on experience and intuition.

Variance / standard deviation

With the average alone, the answer to Question 1 is "about 350".

But how approximate is "about 350"? The variance / standard deviation expresses exactly this. First, predict the standard deviation of the total from the die itself, then measure the standard deviation of the simulation results and compare the two.

program

import numpy as np
import numpy.random as rd
import matplotlib.pyplot as plt

# Define the die
# Compute the mean and variance of its faces
dice = [1, 2, 3, 4, 5, 6]
print("Sample mean:", np.mean(dice))
print("Sample variance:", np.var(dice))

# Define the number of rolls
# Mean per roll x number of rolls = expected value of the total
# sqrt(variance per roll x number of rolls) = predicted standard deviation of the total
trialsNum = 100
print("Total expected value:", np.mean(dice) * trialsNum)
print("Total standard deviation (expected):", np.sqrt(np.var(dice) * trialsNum))
input("Press Enter to continue. . .")

# Actually try it: repeat the set of trialsNum rolls metaTrialsNum times
metaTrialsNum = 10000
resultList = [np.sum([rd.choice(dice) for i in range(trialsNum)])
              for i in range(metaTrialsNum)]
myMean = np.mean(resultList)
myStd = np.std(resultList)
print("Average of total:", myMean)
print("Total standard deviation (actual):", myStd)

# Check whether the 68–95–99.7 rule applies
win = [len([n for n in resultList if myMean - r * myStd <= n and n <= myMean + r * myStd]) /
       metaTrialsNum for r in range(1, 4)]
print(
    f'μ±σ : {myMean - 1 * myStd :.1f} ~ {myMean + 1 * myStd:.1f}: {win[0]:.2%}')
print(
    f'μ±2σ: {myMean - 2 * myStd :.1f} ~ {myMean + 2 * myStd:.1f}: {win[1]:.2%}')
print(
    f'μ±3σ: {myMean - 3 * myStd :.1f} ~ {myMean + 3 * myStd:.1f}: {win[2]:.2%}')

# Draw a histogram to see the distribution of the totals
plt.hist(resultList, bins=25)
plt.show()

Execution result (example)

Again, the result varies from run to run because random numbers are used.

Sample mean: 3.5
Sample variance: 2.9166666666666665
Total expected value: 350.0
Total standard deviation (expected): 17.078251276599328
Press Enter to continue. . .

[Figure: histogram of the 10,000 totals (Figure_1.png)]

Average of total: 349.9814
Total standard deviation (actual): 17.034108548438923
μ±σ : 332.9 ~ 367.0: 69.69%
μ±2σ: 315.9 ~ 384.0: 95.77%
μ±3σ: 298.9 ~ 401.1: 99.76%

Commentary

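The distribution of the totals fits the 68–95–99.7 rule nicely; it has become approximately a normal distribution. The standard deviation of about 17 tells us how approximate "about 350" is. Moreover, this standard deviation can be obtained from the die itself without any simulation: the variances of independent rolls add up, so the standard deviation of the total is √ (variance of one roll * number of rolls).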
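
The prediction without simulation can be checked in a few lines. The sketch below assumes a fair die and independent rolls, uses a hypothetical helper name `predict_total`, and compares the prediction with a vectorized simulation whose seed is an arbitrary choice.

```python
import numpy as np


def predict_total(faces, n_rolls):
    """Predict mean and standard deviation of the sum of n_rolls independent rolls."""
    mu = np.mean(faces) * n_rolls
    sigma = np.sqrt(np.var(faces) * n_rolls)  # variances add, so sigma grows with sqrt(n)
    return float(mu), float(sigma)


dice = [1, 2, 3, 4, 5, 6]
mu, sigma = predict_total(dice, 100)

# Simulate 10,000 sets of 100 rolls in one array (integers(1, 7) yields 1..6)
rng = np.random.default_rng(0)
totals = rng.integers(1, 7, size=(10_000, 100)).sum(axis=1)

print(f"predicted: mean={mu:.1f}, std={sigma:.2f}")
print(f"simulated: mean={totals.mean():.1f}, std={totals.std():.2f}")
```

Generating all rolls as a single array is only a speed-oriented design choice; it models the same experiment as the list comprehension in the program above.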

Error function erf

The 68–95–99.7 rule gives the probability that a trial result falls within μ ± xσ when x is 1, 2, or 3. But what is the probability when x is 1.5? And what is the probability that the total will be 370 or more? This is where the error function erf comes in. The Python program below plots how this function behaves.

program

import math
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-4.0, 4.1, 0.1)

leg1 = "μ-xσ ≦ a ≦ μ+xσ"
y1 = [math.erf(i/math.sqrt(2)) for i in x]
p1 = plt.plot(x, y1)

leg2 = "a ≦ μ+xσ"
y2 = [0.5 + 0.5 * math.erf(i/math.sqrt(2)) for i in x]
p2 = plt.plot(x, y2)

leg3 = "μ+xσ ≦ a"
y3 = [0.5 - 0.5 * math.erf(i/math.sqrt(2)) for i in x]
p3 = plt.plot(x, y3)

plt.legend((p1[0], p2[0], p3[0]),
           (leg1, leg2, leg3), loc=0)
plt.grid(True)
plt.show()

Execution result

[Figure: the three probability curves plotted against x (Figure_1.png)]

Commentary

For a normally distributed trial result a, the error function erf lets you calculate, for any x, the probability that a lies within μ ± xσ, and the probability that a is at most or at least μ + xσ.
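
The three curves can also be written as small functions, which makes it easy to evaluate a single x such as 1.5. This is a minimal sketch that assumes a normal distribution; the function names are hypothetical and simply mirror the three legend entries of the plot.

```python
import math


def prob_within(x):
    """P(mu - x*sigma <= a <= mu + x*sigma) for a normally distributed a."""
    return math.erf(x / math.sqrt(2))


def prob_at_most(x):
    """P(a <= mu + x*sigma)."""
    return 0.5 + 0.5 * math.erf(x / math.sqrt(2))


def prob_at_least(x):
    """P(mu + x*sigma <= a)."""
    return 0.5 - 0.5 * math.erf(x / math.sqrt(2))


for x in (1, 1.5, 2, 3):
    print(f"x={x}: within={prob_within(x):.2%}, "
          f"at most={prob_at_most(x):.2%}, at least={prob_at_least(x):.2%}")
```

For x = 1.5 the first function gives about 86.6%, which sits between the 68% and 95% of the rule as expected.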

So, what is the probability that the total will be 370 or more?

It can be found with the error function erf. First, find x by substituting the values of μ and σ into the following equation.

μ + xσ = 370

Total expected value: 350.0
Total standard deviation (expected): 17.078251276599328

350 + 17x = 370
17x = 20
x ≒ 1.18

Then substitute this x for i in the expression used in the program.

0.5 - 0.5 * math.erf(i/math.sqrt(2))

0.5 - 0.5 * erf(1.18/√2) ≒ 0.12 = 12%

When you roll a die 100 times, there is about a 12% chance that the total will be 370 or more. With the mean, the standard deviation, and the error function, Question 1 can be answered from several angles.
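
The same calculation can be wrapped in a function and compared with a simulation. This is a sketch under the normal approximation used in this article; `prob_total_at_least` is a hypothetical name, the seed is arbitrary, and because the real total only takes integer values, the simulated frequency can differ slightly from the approximation.

```python
import math

import numpy as np


def prob_total_at_least(target, mu, sigma):
    """Normal-approximation probability that the total is >= target."""
    x = (target - mu) / sigma  # solve mu + x*sigma = target for x
    return 0.5 - 0.5 * math.erf(x / math.sqrt(2))


dice = [1, 2, 3, 4, 5, 6]
n_rolls = 100
mu = np.mean(dice) * n_rolls
sigma = np.sqrt(np.var(dice) * n_rolls)
approx = prob_total_at_least(370, mu, sigma)

# Empirical check: 20,000 simulated totals of 100 rolls each
rng = np.random.default_rng(1)
totals = rng.integers(1, 7, size=(20_000, n_rolls)).sum(axis=1)
empirical = float(np.mean(totals >= 370))

print(f"erf approximation: {approx:.1%}")
print(f"simulation:        {empirical:.1%}")
```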

Question 2

How far will a team that has progressed at the pace shown in the table below have progressed when iteration 20 is completed?

Iteration  Velocity  Cumulative
1          7         7
2          3         10
3          3         13
4          6         19
5          6         25

Average / expected value

μ = 100, because the average velocity is 25 / 5 = 5 per iteration and 5 * 20 = 100.

Variance / standard deviation

Since the unbiased sample variance of the velocities up to iteration 5 is 3.5, the standard deviation of the total up to iteration 20 can be expected to be σ = √ (3.5 * 20) ≒ 8.4. The range of μ ± 3σ is approximately 75 to 125.

Error function erf

If you want to answer with about 84% confidence, say 91, which is μ-1σ rounded down. If the progress goal for iteration 20 is higher than that, you should negotiate to lower the goal to 91.

If you want to answer with more than 99% confidence, say 75, which is about μ-3σ. If the goal is 125 (μ+3σ), the chance of making it in time is less than 1%. Even if the goal is 100, the probability of making it in time is only 50%, a coin toss. Deciding that "if we keep going at the average pace, we'll make it!" is an easy way to get hurt.

Both μ and σ change as more results accumulate, so recalculate the probability of reaching the target with the error function erf whenever needed. Unlike a die, a project has no perfect sample to start from.
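
One way to recalculate at any time is to wrap the steps in a function that takes the velocities recorded so far. The sketch below assumes independent iterations, a normal approximation, and the unbiased sample variance (`ddof=1`), which is what yields 3.5 for the table in Question 2; the function name `prob_reach` and the list of targets are hypothetical.

```python
import math

import numpy as np


def prob_reach(velocities, total_iterations, target):
    """Estimate P(cumulative progress after total_iterations >= target)."""
    mu = float(np.mean(velocities)) * total_iterations
    sigma = math.sqrt(np.var(velocities, ddof=1) * total_iterations)
    x = (target - mu) / sigma
    return mu, sigma, 0.5 - 0.5 * math.erf(x / math.sqrt(2))


velocities = [7, 3, 3, 6, 6]   # iterations 1-5 from the table in Question 2

for target in (90, 100, 110):  # hypothetical goals, for illustration only
    mu, sigma, p = prob_reach(velocities, 20, target)
    print(f"target {target}: mu={mu:.0f}, sigma={sigma:.1f}, P(reach)={p:.1%}")
```

Appending each new iteration's velocity to the list and calling the function again updates μ, σ, and the probability together.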

The number of iterations is the simplest unit for the amount of resources, but for finer granularity you can also calculate in days or person-days.
