(Python) I Analyzed 1 Million Hands ~ Estimating the Number of AA Occurrences ~

Thanks for reading. This is pbird yellow.

This time, I estimated the number of AA appearances from 1 million hands of data. Specifically, I estimate how often AA is likely to appear in an unknown 2,000 hands that you will play in the future. The estimation procedure is as follows.

① Aggregate the hands and create a histogram
② Test whether the aggregated data is normally distributed
③ Compute the mean and standard deviation
④ Estimate the number of AA occurrences at the 95% confidence level

For the material above and the technical terms that appear later, the following book is very easy to understand, so I will link it here:
・ "Complete Self-Study: Introduction to Statistics" (Kindle Edition), Hiroyuki Kojima https://amzn.to/3mSPpqf

Below is the resulting histogram. In the histograms in this article, the vertical axis shows the accumulated count and the horizontal axis shows the number of times AA appeared per 2,000 hands.

[Figure: histogram of AA appearances per 2,000 hands (スクリーンショット 2020-09-26 16.22.47.png)]

■ What is a histogram? A histogram is simply a graph of aggregated data. Taking the figure above as an example:
・ AA appeared 8 times per 2,000 hands (horizontal axis) in 72 of the 2,000-hand blocks (vertical axis).
・ AA appeared 2 times per 2,000 hands (horizontal axis) in 1 block (vertical axis).
And so on.
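The tallying behind a histogram can be sketched in a few lines. The counts below are hypothetical, just to illustrate how the horizontal axis (AA count per block) maps to the vertical axis (number of blocks):

```python
# Reading the histogram as a tally: Counter maps "AA count per 2,000 hands"
# (horizontal axis) to how many 2,000-hand blocks had that count (vertical axis).
from collections import Counter

counts = [8, 9, 8, 10, 2, 8, 9]  # hypothetical per-block AA counts
tally = Counter(counts)
print(tally[8])  # → 3 blocks saw AA exactly 8 times
print(tally[2])  # → 1 block saw AA exactly 2 times
```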

■ What is the SW test? There are various tests for normality, but this time we use the SW (Shapiro-Wilk) test. The SW test checks whether the population behind the aggregated data can be regarded as normally distributed. If normality holds, various statistical laws can be applied, and those laws let us estimate the number of AA occurrences.

The SW test judges normality using the p-value. The p-value is the probability of obtaining data distributed like the aggregated data when sampling at random from the population, under the assumption that the population is normally distributed. By convention, if that probability is below 5%, it is judged too low, and we conclude that the population is not normally distributed. In the figure above, the p-value is 0.08 > 0.05 (= 5%), so the data can (marginally) be said to be consistent with normality.
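In scipy the SW test really is a one-liner. A minimal sketch, using illustrative data (not the article's actual counts):

```python
# Shapiro-Wilk normality test with scipy; data here is illustrative only.
from scipy import stats

data = [8, 9, 7, 10, 9, 8, 11, 6, 9, 10, 8, 7]  # hypothetical AA counts per block
stat, p = stats.shapiro(data)  # returns the test statistic and the p-value
if p >= 0.05:
    print(f"p={p:.2f}: normality cannot be rejected")
else:
    print(f"p={p:.2f}: normality rejected at the 5% level")
```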

■ Mean and standard deviation In the figure above:
・ average → mean (μ)
・ deviation → standard deviation (σ)

■ Estimation at the 95% confidence level Let μ be the mean of the aggregated data and σ its standard deviation. Then, with 95% probability, the number of AA occurrences x falls within μ - 1.96σ ≤ x ≤ μ + 1.96σ.

So: **"If you play 2,000 hands, AA will appear between 3.37 and 15.29 times with 95% probability."**
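As a quick sanity check, the μ and σ behind that interval can be recovered from the stated bounds (assuming the μ ± 1.96σ rule above):

```python
# Back-calculating mu and sigma from the stated interval [3.37, 15.29],
# which spans mu - 1.96*sigma to mu + 1.96*sigma.
lo, hi = 3.37, 15.29
mu = (lo + hi) / 2               # midpoint -> mean
sigma = (hi - lo) / (2 * 1.96)   # half-width / 1.96 -> standard deviation
print(round(mu, 2), round(sigma, 2))  # → 9.33 3.04
```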

Since the estimate is based on only 1 million hands of data, it contains some error. As the data grows and the sample mean and standard deviation approach the population mean and population standard deviation, the estimate becomes more accurate.

By the way, the result for KK is shown below. Since p-value = 0.03 < 0.05, the normality assumption is rejected: the population cannot be said to be normal. In that case the aggregated data cannot be treated as normal either, so a 95% interval cannot be calculated.

[Figure: histogram of KK appearances per 2,000 hands (スクリーンショット 2020-09-26 17.05.46.png)]

However, if the number of hands increases, the p-value should rise and the data should come to look normal.

For QQ, normality holds: **"If you play 2,000 hands, QQ will appear between 3.40 and 14.54 times with 95% probability."**

[Figure: histogram of QQ appearances per 2,000 hands (スクリーンショット 2020-09-26 17.05.26.png)]

By the way, some of you may be asking: why isn't KK normal? Its histogram looks no different from AA's or QQ's!

···I agree with you!!!

The problem is that the p-value sits right around the boundary. It can be worked around by changing the block size from 2,000 hands to 1,000 hands. However, that creates a new problem: since the horizontal axis can only take values of 0 or more, the normal curve gets cut off at zero and a proper analysis becomes impossible...
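Rough arithmetic shows why halving the block size pushes the curve against zero: the expected count halves, while the standard deviation only shrinks by √2 (assuming Poisson-like counts, and taking μ ≈ 9.33, σ ≈ 3.04 from the AA figure):

```python
# Scaling the AA statistics from 2,000-hand blocks down to 1,000-hand blocks.
# mu and sigma per 2,000 hands are assumed from the AA histogram in the article.
mu_2000, sig_2000 = 9.33, 3.04
mu_1000 = mu_2000 / 2              # expected AA count per 1,000 hands
sig_1000 = sig_2000 / 2 ** 0.5     # sigma shrinks by sqrt(2) for count data
lower = mu_1000 - 1.96 * sig_1000  # lower end of the 95% interval
print(round(lower, 2))  # → 0.45, right up against the x >= 0 boundary
```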

[Figure: スクリーンショット 2020-09-27 0.23.56.png]

[Figure: スクリーンショット 2020-09-27 6.57.38.png]

In short, 1 million hands is just too few lol

Still, Python can do these complicated calculations in an instant, which is really convenient. I will post the source code below, so please make use of it!!

To be honest, I don't fully understand the internals of the SW test. In Python it can be run in a single line, so even though I know what numbers come out, the calculation process itself is hard to follow. If you know of any books that explain it with worked examples, I would be grateful if you could let me know!

The source code follows. I am a complete beginner at programming, so if you have suggestions for writing better code, please let me know!!

pokermain.py


from holdcards import Holdcards
from plotgraph import Plotgraph
import os
import glob
import re

path = 'Write the path here'  # directory containing the hand-history .txt files
hand = "AA"   # hand to look up
count = 2000  # block size (hands per block)

# Natural sort: order the files by the number embedded in each filename
num = lambda val: int(re.sub(r"\D", "", val))
filelist = sorted(glob.glob(os.path.join(path, "*.txt"), recursive=True), key=num)

totcards = []
graphdata = []
countdata = []
for item in filelist:
    print(item)
    with open(item) as f:
        data = f.readlines()
        card = Holdcards()
        totcards += card.find_holdcards(data)

# Split the full list of hole cards into blocks of `count` hands,
# discarding the final partial block
i = 0
while len(totcards[count*i:count*(i+1)]) == count:
    graphdata.append(totcards[count*i:count*(i+1)])
    i += 1

# Count how many times the target hand appears in each block
for item in graphdata:
    countdata.append(item.count(hand))

graph = Plotgraph()
graph.writehist(countdata, hand, count, len(graphdata)*count)  # SW test + histogram

holdcards.py



class Holdcards:
    def __init__(self):
        # Map ranks to numeric strength (and back) for ordering the two hole cards
        self.trump = {"A": "14", "K": "13", "Q": "12", "J": "11", "T": "10", "9": "9",
                      "8": "8", "7": "7", "6": "6", "5": "5", "4": "4", "3": "3", "2": "2"}
        self.r_trump = {"14": "A", "13": "K", "12": "Q", "11": "J", "10": "T", "9": "9",
                        "8": "8", "7": "7", "6": "6", "5": "5", "4": "4", "3": "3", "2": "2"}
        self.hands = 0
        self.tothands = 0
        self.handlist = []

    def find_holdcards(self, data):
        # Extract hole cards from "Dealt to ..." lines and normalize them
        # to notation such as "AKs", "AKo", or "AA"
        holdcards = []
        for item in data:
            if 'Dealt to' in item:
                item = item[-7:-2]  # e.g. "Ah Kd"
                if item[1] == item[4]:  # same suit -> suited
                    if int(self.trump.get(item[0])) > int(self.trump.get(item[3])):
                        item = item[0] + item[3] + 's'
                    else:
                        item = item[3] + item[0] + 's'
                else:
                    if int(self.trump.get(item[0])) > int(self.trump.get(item[3])):
                        item = item[0] + item[3] + 'o'
                    elif item[0] == item[3]:  # pair
                        item = item[0] + item[3]
                    else:
                        item = item[3] + item[0] + 'o'
                holdcards.append(item)
        return holdcards

plotgraph.py


import numpy as np
import pandas as pd
import scipy.stats as st
import math
import matplotlib.pyplot as plt

class Plotgraph:
    def __init__(self):
        pass

    # Draw a histogram of countdata and run the SW (Shapiro-Wilk) test
    def writehist(self, countdata, hand, count, tothands):
        df = pd.DataFrame({'p1': countdata})
        target = 'p1'  # column to plot

        # (1) Statistics
        mu = round(df[target].mean(), 2)        # mean
        sig = round(df[target].std(ddof=0), 2)  # population standard deviation (ddof=0)
        print(f'■ mean: {df[target].mean():.2f}, standard deviation: {df[target].std(ddof=0):.2f}')
        ci1, ci2 = (None, None)

        # Drawing parameters
        x_min = round(mu - 3*sig)  # lower plot limit
        x_max = round(mu + 3*sig)  # upper plot limit
        j = 10  # y-axis (frequency) step size
        k = 1   # class (bin) width
        bins = int((x_max - x_min)/k)  # number of bins, e.g. (100-40)/5 -> 12
        d = 0.001

        # Drawing starts here
        plt.figure(dpi=96)
        plt.xlim(x_min, x_max)
        hist_data = plt.hist(df[target], bins=bins, color='tab:cyan',
                             range=(x_min, x_max), rwidth=0.9)
        n = len(hist_data[0])  # number of bins (used as the sample size below)
        plt.title("hand = " + hand + " , totalhands = " + str(tothands))

        # (2) Histogram x-axis ticks
        plt.gca().set_xticks(np.arange(x_min, x_max - k + d, k))

        # Normality test (significance level 5%); note that the test is
        # applied to the bin frequencies
        _, p = st.shapiro(hist_data[0])
        print(hist_data[0])
        print(st.shapiro(hist_data[0]))
        if p >= 0.05:
            print(f'  - p={p:.2f} (p>=0.05): the population can be regarded as normal')
            U2 = df[target].var(ddof=1)  # unbiased variance (population variance estimate)
            print(U2)
            DF = n - 1                   # degrees of freedom
            SE = math.sqrt(U2/n)         # standard error
            print(SE)
            ci1, ci2 = st.t.interval(0.95, df=DF, loc=mu, scale=SE)  # 95% CI for the mean
        else:
            print(f'  * p={p:.2f} (p<0.05): the population cannot be said to be normal')

        # (3) Fitted normal curve
        sig = df[target].std(ddof=1)  # unbiased standard deviation (ddof=1)
        nx = np.linspace(x_min, x_max + d, 150)  # 150 points
        ny = st.norm.pdf(nx, mu, sig) * k * len(df[target])
        plt.plot(nx, ny, color='tab:blue', linewidth=1.5, linestyle='--')

        # (4) x-axis scale and label
        plt.xlabel('total"' + str(hand) + '"/' + str(count) + 'hands', fontsize=12)
        plt.gca().set_xticks(np.arange(x_min, x_max + d, k))

        # (5) y-axis scale and label
        y_max = max(hist_data[0].max(), st.norm.pdf(mu, mu, sig) * k * len(df[target]))
        y_max = int(((y_max//j) + 1)*j)  # smallest multiple of j above the maximum frequency
        plt.ylim(0, y_max)
        plt.gca().set_yticks(range(0, y_max + 1, j))
        plt.ylabel('Accumulation', fontsize=12)

        # (6) Annotate mean, standard deviation, and p-value
        tx = 0.03  # text x position
        ty = 0.91  # text y position
        tt = 0.08  # line spacing
        tp = dict(horizontalalignment='left', verticalalignment='bottom',
                  transform=plt.gca().transAxes, fontsize=11)
        plt.text(tx, ty, f'average {mu:.2f}', **tp)
        plt.text(tx, ty - tt, f'deviation {sig:.2f}', **tp)
        plt.text(tx, ty - tt - tt, f'P-value {p:.2f}', **tp)
        plt.vlines(mu, 0, y_max, color='black', linewidth=1)

        plt.show()
