Excel with Python

When doing something on a PC, using Excel is inevitable. Imadoki, I'm sure there are people who use Lotus 1-2-3 saying "I hate Microsoft". Some measurement data etc. are saved on the Excel sheet. When it becomes, it is better to be able to read the data from the Excel file.

Is it Ikan in CSV?

If you can afford to open the file in Excel and save it in CSV or tab-delimited text format, of course, that's fine. But sometimes it happens. The results of the 30-day experiment are 30 files. In that case, repeat the operation of opening each file one by one and saving it in CSV again, or write Excel VBA that will do the operation automatically, hmm, it's good, Excel is Excel ..

I can't say that, so look for a package that can read and write Excel files provided by conda. At times like this, there is no choice but to google. There seems to be a package called xlrd.

% anaconda search -t conda xlrd
Using Anaconda API: https://api.anaconda.org/
Run 'anaconda show <USER/PACKAGE>' to get more details:
Packages:
     Name                      |  Version | Package Types   | Platforms
     ------------------------- |   ------ | --------------- | ---------------
     RMG/xlrd                  |    0.9.4 | conda           | linux-64, win-32, win-64, linux-32, osx-64
     RahulJain/xlrd            |    1.0.0 | conda           | win-64
     aetrial/xlrd              |          | conda           | linux-64, osx-64
     anaconda/xlrd             |    1.0.0 | conda           | linux-64, win-32, win-64, linux-32, osx-64
     auto/xlrd                 |    0.8.0 | conda           | linux-64
                                          : http://www.python-excel.org/
     jetztcast/xlrd            |    0.9.3 | conda           | linux-64, osx-64
                                          : Library for developers to extract data from Microsoft Excel (tm) spreadsheet files
Found 6 packages
% anaconda show anaconda/xlrd
Using Anaconda API: https://api.anaconda.org/
Name:    xlrd
Summary:
Access:  public
Package Types:  conda
Versions:
   + 0.9.2
   + 0.9.4
   + 0.9.3
   + 0.9.0
   + 0.9.1
   + 1.0.0

To install this package with conda run:
     conda install --channel https://conda.anaconda.org/anaconda xlrd

Read data from Excel sheet

When I was assigned to work alone and lived in Kyoto, it was often said that "Kyoto would be hot", so I downloaded the daily maximum temperature data for Tokyo and Kyoto in July 2016 from the Meteorological Agency for verification. I tried to summarize it in Excel.

20160831001.jpg

It's hotter in Kyoto just by looking at it. You don't even have to parse it in Python. But let's take a second look and calculate the average temperature in both cities.

import xlrd
import os.path
import numpy as np
xlfile = "test.xlsx"
if os.path.exists(xlfile):
    xls = xlrd.open_workbook(xlfile)
    sheet1 = xls.sheet_by_index(0)
    nrows = sheet1.nrows-1
    ncols = sheet1.ncols 
    data = np.zeros(ncols*nrows).reshape((nrows, ncols))
    for r in range(1, nrows+1):
        for c in range(0, ncols):
            data[r-1,c] = sheet1.cell(r,c).value
    tokyo = data[:,1].mean()
    kyoto = data[:,2].mean()
    msg = "Tokyo(mean): %.2f\nKyoto(mean): %.2f" % (tokyo, kyoto)
    print(msg)

Create an array with 0 contents with np.zeros () and transform it into a two-dimensional array with reshape (). Since the first row of Excel is the header, subtract 1 from the value of sheet1.nrows. It's a numpy array, so if you want to find the mean, just do mean (). Looking at the results, it's still hotter in Kyoto.

Test

By the way, is it okay to decide that Kyoto is really hot just by comparing the average values? After all, it is better to test whether there is a statistically significant difference here.

You can easily test the difference between the mean values using scipy's stats.

import xlrd
import os.path
import numpy as np
from scipy import stats
xlfile = "test.xlsx"
if os.path.exists(xlfile):
    xls = xlrd.open_workbook(xlfile)
    sheet1 = xls.sheet_by_index(0)
    nrows = sheet1.nrows-1
    ncols = sheet1.ncols
    data = np.zeros(ncols*nrows).reshape((nrows, ncols))
    for r in range(1, nrows):
        for c in range(0, ncols):
            data[r-1,c] = sheet1.cell(r,c).value
    tokyo = data[:,1]
    kyoto = data[:,2]
    t,p = stats.ttest_ind(tokyo, kyoto, equal_var=False))
    msg = "p-value: %.5f" % p
    print(msg)

In addition, in the temperature of Tokyo and Kyoto, when testing the hypothesis that "Tokyo is lower", it is a one-sided test, but the option to specify that it is a one-sided test is in the stats.ttest_ind () method. It doesn't seem to exist. So, in this case, we will substitute it by showing that the t-value is negative and the p-value is 0.025 or less.

So, when I actually tried it, it became as follows.

t: -4.09874
p-value: 0.00014

Graph drawing

I read the data from the Excel sheet, so let's make a graph. In R, ggplot2 is used, but in Python in Anaconda environment, matplotlib can be used.

import xlrd
import os.path
import pandas as pd
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
%matplotlib inline 

xlfile = "test.xlsx"
if os.path.exists(xlfile):
    xls = xlrd.open_workbook(xlfile)
    sheet1 = xls.sheet_by_index(0)
    nrows = sheet1.nrows - 1
    ncols = sheet1.ncols
    data = np.zeros(ncols*nrows).reshape((nrows, ncols))
    date = []
    for r in range(1, nrows+1):
        for c in range(0, ncols):
            if c==0:
                d =  xlrd.xldate.xldate_as_datetime(sheet1.cell(r,c).value, xls.datemode)
                date.append(d)
            else:
                data[r-1,c] = sheet1.cell(r,c).value

    tokyo = data[:,1]
    kyoto = data[:,2]
    
    plt.plot(date, tokyo, label="Tokyo")
    plt.plot(date, kyoto, label="Kyoto")
    plt.legend()
    plt.show()

When drawing a line graph, I want to use the date read from Excel for the X axis. When the date 2016/07/01 is displayed on the Excel file, the data is "41090". Looking at this, I don't think it's 7/1, so I'll convert it. That is the following line.

d =  xlrd.xldate.xldate_as_datetime(sheet1.cell(r,c).value, xls.datemode)

With this, it is converted to datetime type, and "41090" becomes "2016-07-01 00:00:00". So, date.append (d) one by one in the list initialized by date = [].

By the way, when printing () the variable d, if you use d.strftime (ā€œ% m /% dā€), you can print 7/1 only for the month / day. I can do it, but when I append the converted string to the list date, an error occurs at plt.plot (), so what should I do?

That's why the graph I could draw looks like this. After all the X axis is hard to read. I want to do something about it. Looking at this graph, we can see that Kyoto is hotter without testing (that is, data visualization is important).

20160831003.png

This time code and data

Recommended Posts

Excel with Python
Handle Excel with python
Operate Excel with Python (1)
Operate Excel with Python (2)
Operate Excel with Python openpyxl
Let's run Excel with Python
solver> Link> Solve Excel Solver with python
Statistics with python
Python with Go
Twilio with Python
Integrate with Python
Create an Excel file with Python3
Play with 2016-Python
AES256 with python
Tested with Python
Let's play with Excel with Python [Beginner]
python starts with ()
with syntax (Python)
Handle Excel CSV files with Python
Bingo with python
Zundokokiyoshi with python
Microcomputer with Python
Excel aggregation with Python pandas Part 1
Cast with python
Draw Nozomi Sasaki in Excel with python
Convert Excel data to JSON with python
Excel aggregation with Python pandas Part 2 Variadic
[Easy Python] Reading Excel files with openpyxl
Create Excel file with Python + similarity matrix
Excel table creation with Python [Progress management table]
Zip, unzip with python
Django 1.11 started with Python3.6
Primality test with Python
Python with eclipse + PyDev.
Scraping with Python (preparation)
Try scraping with Python.
Learning Python with ChemTHEATER 03
Sequential search with Python
"Object-oriented" learning with python
Run Python with VBA
Handling yaml with python
Serial communication with python
Learning Python with ChemTHEATER 05-1
Learn Python with ChemTHEATER
Run prepDE.py with python3
1.1 Getting Started with Python
Collecting tweets with Python
Binarization with OpenCV / Python
3. 3. AI programming with Python
Kernel Method with Python
Non-blocking with Python + uWSGI
Scraping with Python + PhantomJS
Posting tweets with python
Drive WebDriver with python
Use mecab with Python3
[Python] Redirect with CGIHTTPServer
Voice analysis with python
Think yaml with python
Operate Kinesis with Python
Getting Started with Python
Use DynamoDB with Python