I want to be able to analyze data with Python (Part 1)

By the way, I'm new to Python. I heard that Python has a clean syntax, a wealth of libraries, and a lot of power, so I decided to pick up a little of it over the year-end holidays.

Subject

Studying anything goes better with a concrete subject and concrete problems to work on. So I decided to base this series on a statistics book that I started reading recently.

"Statistics Is the Strongest Discipline [Practice]: Thoughts and Methods for Data Analysis" by Hiromu Nishiuchi

** Chapter 1 Statistics Practice Begins with a Basic Review ** 05 Why can the average value capture the truth?

This chapter has several examples that use "a coin that lands tails with probability 2/3 and heads with probability 1/3". So I decided to use Python to actually toss that coin many times (as a simulation, of course) and see whether the results match the examples. (For details, please buy the book or borrow it from a library. The relevant part is pages 59 to 64.)

Python version

The version of Python used is 2.7. When I attended something like a Python seminar last summer, I installed [Spyder](https://ja.wikipedia.org/wiki/Spyder_(%E3%82%BD%E3%83%95%E3%83%88%E3%82%A6%E3%82%A7%E3%82%A2)) as I was told to, so I used that.

Probability of throwing a coin twice

First, I used Python to try out the example that gives the probability of every combination when the coin is tossed twice.

Possible combinations are:

- tails / tails (0 heads)
- tails / heads (1 head)
- heads / tails (1 head)
- heads / heads (2 heads)
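
As a small aside, the four combinations can also be listed mechanically. This is only a sketch using itertools.product from the standard library; the name `outcomes` is my own and does not appear in the code below. It uses the same encoding as the rest of the post (0 = tails, 1 = heads).

```python
from itertools import product

# 0 = tails, 1 = heads, the same encoding the simulation code uses.
outcomes = list(product([0, 1], repeat=2))

for first, second in outcomes:
    # Adding the two faces gives the number of heads in that combination.
    print("first={} second={} heads={}".format(first, second, first + second))
```

The order of `outcomes` matches the list above: tails/tails, tails/heads, heads/tails, heads/heads.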

Python code

from random import randint
from decimal import Decimal
from prettytable import PrettyTable
import numpy as np

def tossBiasedCoin():
    """ Returns 0 or 1 with 0 having 2/3 chance """
    return randint(0,2) % 2

# Make a 2x2 array
counts = [[0 for j in range(2)] for i in range(2)]

# Toss a coin many times to get counts
sampleCount = 500000
for num in range(sampleCount):
    first = tossBiasedCoin()
    second = tossBiasedCoin()
    counts[first][second] += 1

# Convert all counts to percentages
TWOPLACES = Decimal(10) ** -2
for i in range(2):
    for j in range(2):
        value = counts[i][j]        
        counts[i][j] = (100 * Decimal(counts[i][j])/Decimal(sampleCount)).quantize(TWOPLACES)
        print("Converted the value {} to percentage {}".format(value, counts[i][j]))

# Make summaries of number of heads.
keys = np.arange(3)
values = [counts[0][0], 
          counts[0][1]+counts[1][0],
          counts[1][1]]

# Add row descriptions
counts[0].insert(0, '1st tail')
counts[1].insert(0, '1st head')

# Create table with column descriptions, add rows, then show it.
table = PrettyTable(["", "2nd tail", "2nd head"])
table.padding_width = 1
table.add_row(counts[0])
table.add_row(counts[1])
print(table)  # the parenthesized form runs on both Python 2.7 and Python 3

# Draw a bar chart
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
rects = plt.bar(keys,
                 values, 
                 0.5,
                 alpha=0.4,
                 align="center", 
                 color='b')

plt.xlabel('Number of heads')
plt.ylabel('Probability (%)')
plt.title('Probabilities of heads with a biased coin')
plt.xticks(keys, np.arange(3))

plt.tight_layout()
plt.show()

Execution result

First, a table almost identical to Chart 1-17 on page 59 is displayed, showing the four combinations and their probabilities. (The table is drawn with PrettyTable; see below.)

(image: table of the four combinations and their probabilities)

The "heads / heads" combination is the rarest because the coin only has a 1/3 chance of coming up heads in the first place.
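
To see what the simulation should converge to, the exact values can be worked out with fractions. This is a sketch of my own, not something from the book: it assumes the two tosses are independent, so each combination's probability is the product of the two single-toss probabilities. The names `p_face` and `exact` are made up for this example.

```python
from fractions import Fraction

# Single-toss probabilities: 0 = tails (2/3), 1 = heads (1/3).
p_face = {0: Fraction(2, 3), 1: Fraction(1, 3)}

# Assuming independent tosses, multiply the two single-toss probabilities.
exact = {}
for first in (0, 1):
    for second in (0, 1):
        exact[(first, second)] = p_face[first] * p_face[second]

for combo in sorted(exact):
    percent = float(exact[combo] * 100)
    print("{} -> {} = {:.2f}%".format(combo, exact[combo], percent))
```

Heads/heads comes out as 1/3 x 1/3 = 1/9, about 11.11%, while tails/tails is 4/9, about 44.44%.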

Next, a bar graph that is almost the same as Chart 1-18 on p60 is displayed.

(image: bar chart of the probability of 0, 1, and 2 heads)

This chart groups the four patterns into three by how many heads appear. Two patterns, (heads / tails) and (tails / heads), both have exactly one head, so they are combined into one bar.
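
The same grouping can be written as a formula: the number of orderings with k heads, times the probability of any one such ordering. The sketch below is my own check using the standard binomial formula; the helper `ways` and the other names are hypothetical and not part of the post's code.

```python
from fractions import Fraction
from math import factorial

p_heads = Fraction(1, 3)
p_tails = 1 - p_heads
tosses = 2

def ways(n, k):
    """Number of ways to choose which k of the n tosses are heads."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# P(k heads) = ways(n, k) * p_heads^k * p_tails^(n - k)
by_heads = {}
for k in range(tosses + 1):
    by_heads[k] = ways(tosses, k) * p_heads ** k * p_tails ** (tosses - k)

for k in sorted(by_heads):
    print("{} heads: {} = {:.2f}%".format(k, by_heads[k], float(by_heads[k] * 100)))
```

For one head there are two orderings, each with probability 2/9, which is why that bar ends up as tall as the zero-heads bar (4/9 each).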

Code content

Library import

Import the functions and other names you want to use into the execution environment. Many useful libraries ship with the Python Standard Library, and a huge number of third-party libraries are available from the repository called the Python Package Index.

from random import randint
from decimal import Decimal
from prettytable import PrettyTable
import numpy as np

- Use randint from the random module to generate random numbers.
- Use Decimal from the decimal module to round values to two decimal places.
- Use PrettyTable from the prettytable module to create a table.
- I import the numpy module because its arange function is convenient, but I don't make any heavy use of it.

Function definition in Python

def tossBiasedCoin():
    """ Returns 0 or 1 with 0 having 2/3 chance """
    return randint(0,2) % 2

It is hardly big enough to deserve a function, but as practice in defining functions I wrote one that returns the side of the coin (1 for heads, 0 for tails). It generates 0, 1, or 2 as a random number, returns 0 if the value is even, and returns 1 otherwise. Since two of the three values are even, the probability of getting 0 is 2/3.
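
The mapping from the three random values to the two faces can be checked by hand. The sketch below does that, and also shows one alternative way to write a biased coin by comparing random.random() with a threshold. The function `toss_biased_coin_alt` and its `p_heads` parameter are hypothetical names of my own, not part of the original code.

```python
import random
from collections import Counter

# randint(0, 2) can return 0, 1, or 2 (both ends are included).
# Taking each value modulo 2 shows which face it maps to.
faces = [n % 2 for n in (0, 1, 2)]
print(faces)           # two of the three values map to 0 (tails)
print(Counter(faces))

def toss_biased_coin_alt(p_heads=1.0 / 3):
    """Hypothetical alternative: return 1 (heads) with probability p_heads."""
    return 1 if random.random() < p_heads else 0
```

The alternative makes the bias an explicit parameter, which may be handy if you want to try coins other than the 1/3 one.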

Prepare 2x2 variables

Use a 2x2 sequence to record how often each combination occurs. In this case it is a list.

# Make a 2x2 array
counts = [[0 for j in range(2)] for i in range(2)]

Initialize each element to 0. The for statement loops over the list that the built-in range function (.jp/2/library/functions.html#range) creates on the fly. To make it 2x2, create a list whose elements are themselves lists.
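
One thing worth knowing when building a list of lists: the shorter-looking `[[0] * 2] * 2` is not equivalent to the nested comprehension. A minimal sketch of the difference (the variable names `good` and `shared` are mine):

```python
# Nested comprehension: each row is a separate list object.
good = [[0 for j in range(2)] for i in range(2)]
good[0][0] += 1
print(good)    # only the first row changes

# Multiplying the outer list repeats a reference to the SAME inner list.
shared = [[0] * 2] * 2
shared[0][0] += 1
print(shared)  # both rows change, because they are one object
```

With the second form every toss would be counted in both rows, so the comprehension used above is the safer choice.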

Throw a coin and record the result

I toss the coin 500,000 times here, though that many is probably not necessary (laughs).

# Toss a coin many times to get counts
sampleCount = 500000
for num in range(sampleCount):
    first = tossBiasedCoin()
    second = tossBiasedCoin()
    counts[first][second] += 1

Each result is 0 or 1, so it can be used directly as an index into the 2x2 structure. The cell selected by the two indexes is incremented by one.
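
On whether 500,000 tosses are really needed: one rough way to judge is the standard error of an estimated proportion, sqrt(p(1 - p) / n). This is my own back-of-the-envelope sketch and not from the book; `standard_error_percent` is a hypothetical helper, and 4/9 is the theoretical tails/tails probability for this coin.

```python
from math import sqrt

def standard_error_percent(p, n):
    """Standard error of an estimated proportion, in percentage points."""
    return 100 * sqrt(p * (1 - p) / n)

p_tails_tails = 4.0 / 9  # theoretical probability of tails/tails

for n in (1000, 10000, 500000):
    error = standard_error_percent(p_tails_tails, n)
    print("n={:>6}: about +/- {:.3f} percentage points".format(n, error))
```

At 500,000 tosses the typical error is around 0.07 percentage points, so the second decimal place of the percentages can still wobble a little between runs.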

By the way, Python does not seem to have the ++ operator that is so familiar from C.
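
A short sketch of what this means in practice (variable names are mine): `+=` is the way to increment, and writing `++count` is accepted by Python but does nothing, because it is read as two unary plus signs.

```python
count = 0
count += 1          # the Python way to add one

# ++count is parsed as +(+count): two unary plus signs, so no increment happens.
same = ++count

print(count)
print(same)
```

The postfix form `count++` is simply a syntax error in Python.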

Convert to percentage

Divide the frequency by the total number of throws to get the percentage.

# Convert all counts to percentages
TWOPLACES = Decimal(10) ** -2
for i in range(2):
    for j in range(2):
        value = counts[i][j]
        counts[i][j] = (100 * Decimal(counts[i][j])/Decimal(sampleCount)).quantize(TWOPLACES)
        print("Converted the value {} to percentage {}".format(value, counts[i][j]))

Since every cell of the 2x2 structure has to be visited, the loop uses two indexes, i and j, and accesses each cell as counts[i][j]. The value is replaced in place, and the values before and after conversion are printed for debugging.

Passing 0.01 to Decimal.quantize rounds the value to two decimal places.
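
Here is a minimal sketch of quantize on its own. The count 222345 is a made-up number for illustration, not a result from my run.

```python
from decimal import Decimal

TWOPLACES = Decimal(10) ** -2            # same as Decimal('0.01')

# Hypothetical count out of 500000 tosses, converted to a percentage.
raw = 100 * Decimal(222345) / Decimal(500000)
rounded = raw.quantize(TWOPLACES)
print("{} -> {}".format(raw, rounded))

# With the default context, exact ties are rounded to the even digit.
tie = Decimal('0.125').quantize(TWOPLACES)
print(tie)
```

The exponent of the value passed to quantize decides how many digits are kept, which is why 0.01 gives two decimal places.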

Prepare data for bar chart

There are three bars in the bar chart: 0 heads, 1 head, and 2 heads.

# Make summaries of number of heads.
keys = np.arange(3)
values = [counts[0][0],
          counts[0][1]+counts[1][0],
          counts[1][1]]

Only the "1 head" value is the sum of two cells, because it does not matter whether the head came from the first toss or the second.

Prepare data for table

Use list.insert to add a label at the left end (the front) of each row's list: "1st toss is tails" and "1st toss is heads".

# Add row descriptions
counts[0].insert(0, '1st tail')
counts[1].insert(0, '1st head')
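
A small sketch of what insert(0, ...) does to a row, with made-up percentages. Note that the indexes shift by one afterwards, which is why the bar chart values are taken from counts before the labels are added. The names `row` and `labeled` are mine.

```python
row = [44.44, 22.22]           # hypothetical percentages for one row
row.insert(0, '1st tail')      # modifies the list in place
print(row)                     # the label is now at index 0

# An alternative that leaves the original list untouched: build a new list.
percentages = [22.22, 11.11]   # hypothetical values
labeled = ['1st head'] + percentages
print(labeled)
```

The second form may be easier to reason about if the numbers are needed again later.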

Make a table

Use PrettyTable, a third-party library.

# Create table with column descriptions, add rows, then show it.
table = PrettyTable(["", "2nd tail", "2nd head"])
table.padding_width = 1
table.add_row(counts[0])
table.add_row(counts[1])
print(table)  # the parenthesized form runs on both Python 2.7 and Python 3
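
If PrettyTable is not installed, a rough plain-text table can be made with str.format alone. This is only a sketch of an alternative, not what I actually used; the numbers are the theoretical percentages rather than simulation output, and names like `line_format` are my own.

```python
header = ["", "2nd tail", "2nd head"]
rows = [
    ["1st tail", "44.44", "22.22"],   # theoretical values, not measured ones
    ["1st head", "22.22", "11.11"],
]

# Left-align the row label, right-align the two number columns.
line_format = "{:<10}{:>10}{:>10}"

lines = [line_format.format(*header)]
for row in rows:
    lines.append(line_format.format(*row))

print("\n".join(lines))
```

It has no borders, but the columns line up because every field has a fixed width.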

Make a bar graph

Use matplotlib, a third-party library. matplotlib seems rich enough to fill a book by itself (see the Gallery). The bar chart is drawn with pyplot, a module inside matplotlib.

# Draw a bar chart
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
rects = plt.bar(keys,
                 values,
                 0.5,
                 alpha=0.4,
                 align="center",
                 color='b')

plt.xlabel('Number of heads')
plt.ylabel('Probability (%)')
plt.title('Probabilities of heads with a biased coin')
plt.xticks(keys, np.arange(3))

plt.tight_layout()
plt.show()

(Part 2)
