Try to calculate a statistical problem in Python

An article on statistics was posted on President Online.

Is there a "correlation" between breakfast and work hours and business performance? http://president.jp/articles/-/12416

In the above article, mathematical formulas are certainly not mentioned, so it is easy to understand and the explanation is detailed, so it is perfect for getting started with statistics. However, it is assumed that it will be calculated manually in Excel, which is a bit of a hassle.

So I would like to calculate these problems with the Python I have been using so far.

Problem and its solution

The content of the problem is to investigate whether each employee has a correlation with the probability of having eaten breakfast (= breakfast rate), the time of arrival at work, and the business performance as three variables. Examining the correlation between variables in this way can be said to be the basis of various statistics.

Let's call each variable X Y Z so that it can be handled by a calculator. First, I prepared this as CSV file data.

Calculate basic statistics

First, find the statistics that appear on Page 2. Read the above data to find basic statistics such as mean and standard deviation. This is easy with pandas and can be found in a matter of seconds.

data = pd.read_csv("data.csv", names=['X', 'Y', 'Z'])
data.describe()
# =>
#                 X          Y           Z
# count    7.000000   7.000000    7.000000
# mean    42.571429  -8.571429   98.714286
# std     42.968427  14.920424    8.440266
# min      0.000000 -40.000000   88.000000
# 25%      5.000000 -10.000000   92.000000
# 50%     33.000000  -5.000000  100.000000
# 75%     77.500000   0.000000  104.500000
# max    100.000000   5.000000  110.000000

Draw a scatterplot matrix

In the original article, I drew a scatter plot to examine the correlation. Let's do this in Python as well. To find out the correlation of each variable collectively, it is quick to draw a scatter plot matrix.

from pandas.tools.plotting import scatter_matrix
plt.figure()
scatter_matrix(data)
plt.savefig("image.png ")

1.png

Find the correlation coefficient

The correlation coefficient can be obtained by dividing the covariance by the standard deviation of two variables, but using pandas, it can be easily obtained with a single function.

data.corr()
#=>
#           X         Y         Z
# X  1.000000  0.300076  0.550160
# Y  0.300076  1.000000 -0.545455
# Z  0.550160 -0.545455  1.000000

I was able to find the correlation matrix in page 5 in one shot. As a general guideline, it is said that there is a strong correlation when it is 0.7 or more, so it can be said that it is a delicate correlation as described in the original article.

Do regression analysis

Finally, find the regression equation that appears at the end of 4th page. This is one of the statistical functions of SciPy [scipy.stats.linregress](http://docs.scipy.org/doc/scipy-0.14.0/reference/ It can be obtained by simple regression analysis using generated / scipy.stats.linregress.html).

#Retrieve value
x = data.ix[:,0].values
y = data.ix[:,1].values
z = data.ix[:,2].values

#Regression equation for X and Z
slope, intercept, r_value, p_value, std_err = sp.stats.linregress(x, z)
print(slope, intercept, r_value)
#=> 0.108067677706 94.113690292 0.550160142939

#Regression equation for Y and Z
slope, intercept, r_value, p_value, std_err = sp.stats.linregress(y, z)
print(slope, intercept, r_value)
#=> -0.308556149733 96.0695187166 -0.545455364632

Note that slope is the slope, intercept is the intercept, and r_value is the correlation coefficient. With the slope as a and the intercept as b, the linear equation y = ax + b is obtained.

For example, a linear regression equation for X and Z regresses to the equation y = 0.11x + 94.11 (up to two decimal places).

Summary

Using Python made statistical analysis even easier than in Excel. Examining the correlation between two variables is one of the basics of statistics, so it is often applied to real problems, and once you get used to it, you will be able to perform these analyzes in a very short time.

Recommended Posts

Try to calculate a statistical problem in Python
Try to calculate Trace in Python
Try to calculate RPN in Python (for beginners)
Try to make a Python module in C language
ABC166 in Python A ~ C problem
Just try to receive a webhook in ngrok and python
Try logging in to qiita with Python
Try sending a SYN packet in Python
Try drawing a simple animation in Python
How to get a stacktrace in python
Try a functional programming pipe in Python
Try to get a list of breaking news threads in Python.
First steps to try Google CloudVision in Python
Try to implement Oni Maitsuji Miserable in python
3.14 π day, so try to output in Python
How to clear tuples in a list (Python)
Try auto to automatically price Enums in Python 3.6
To execute a Python enumerate function in JavaScript
How to embed a variable in a python string
Try to solve the Python class inheritance problem
I want to create a window in Python
How to create a JSON file in Python
Try to draw a life curve with python
Try to make a "cryptanalysis" cipher with Python
A clever way to time processing in Python
Steps to develop a web application in Python
Try gRPC in Python
To add a module to python put in Julialang
How to notify a Discord channel in Python
Try to make a dihedral group with Python
[Python] How to draw a histogram in Matplotlib
Try 9 slices in Python
Try to solve a set problem of high school math with Python
[Python] [Word] [python-docx] Try to create a template of a word sentence in Python using python-docx
A solution to the problem that the Python version in Conda cannot be changed
[Cloudian # 10] Try to generate a signed URL for object publishing in Python (boto3)
Parse a JSON string written to a file in Python
How to convert / restore a string with [] in python
I want to embed a variable in a Python string
[Python] Try to read the cool answer to the FizzBuzz problem
Try to make a command standby tool with python
Try to improve your own intro quiz in Python
(Python) Try to develop a web application using Django
Try to solve the internship assignment problem with Python
I want to write in Python! (2) Let's write a test
[Python] How to expand variables in a character string
Create a plugin to run Python Doctest in Vim (2)
Try searching for a million character profile in Python
Try embedding Python in a C ++ program with pybind11
I tried to implement a pseudo pachislot in Python
Create a plugin to run Python Doctest in Vim (1)
A memorandum to run a python script in a bat file
I want to randomly sample a file in Python
I want to work with a robot in python.
Things to note when initializing a list in Python
Introduction to Linear Algebra in Python: A = LU Decomposition
[Python] Created a method to convert radix in 1 second
How to execute a command using subprocess in Python
Publish / upload a library created in Python to PyPI
Take a screenshot in Python
Calculate mW <-> dBm in Python