[PYTHON] Save TOPIX time series in pickle, csv, Excel format

What to do

Since I did the previous post to display the TOPIX time series, Save it in various formats for easy processing later.

Code to parse Yahoo Finance html

Same as before. However, I put in a function call called savedataAs somehow (filename, data).

#!/usr/bin/env python
#-*- coding: utf-8 -*-
import pylab
import urllib2
import lxml
import lxml.html
import re

dateFr = {"year": 2000, "month":1, "day":1}
dateTo = {"year": 2013, "month":11, "day": 1}

data = []
for page in range(1, 30):
    print page
    url = "http://info.finance.yahoo.co.jp/history/?code=998405.T&sy=%d&sm=%d&sd=%d&ey=%d&em=%d&ed=%d&tm=d&p=%d"
    url = url % (dateFr["year"], dateFr["month"], dateFr["day"], dateTo["year"], dateFr["month"], dateFr["day"], page)

    html = urllib2.urlopen(url).read()
    root = lxml.html.fromstring(html)
    table = root.xpath(' //*[contains(concat(" ",normalize-space(@class)," "), " boardFin ")]')[0]

    for tr in table.xpath("descendant::tr"):
        tmp = [td.text for td in tr.xpath("descendant::td")]
        if len(tmp) != 5:
            continue
        begin = float(tmp[1].replace(",", ""))
        high  = float(tmp[2].replace(",", ""))
        low   = float(tmp[3].replace(",", ""))
        end   = float(tmp[4].replace(",", ""))
        data.append([low, high, low, high])
savedataAsPickle('topix.pickle', data)
savedataAsCSV('topix.csv', data)
savedataAsExcel('topix.xlsx', data)

Code to save

pickle Format in the standard python library

def savedata(filename, data):
    pickle.dump(data, open(filename, "w"))

csv Abbreviation for Comma Separated Values. Text format.

def savedata(filename, data):
    import csv
    csvobj = csv.writer(open(filename, "w"))
    csvobj.writerows(data)

Excel format

Save in a format that can be read by Excel

def savedata(filename, data):
    import openpyxl.workbook
    import openpyxl.cell
 
    wb = openpyxl.workbook.Workbook()
    ws = wb.worksheets[0]
    ws.title = "TOPIX"
    for idat, dat in enumerate(data):
        col = openpyxl.cell.get_column_letter(idat + 1) # from 1, 2, ... to A, B, ...
        for irow, d in enumerate(dat):
            ws.cell('%s%s'%(col, irow + 1)).value = d
    wb.save(filename)

Recommended Posts

Save TOPIX time series in pickle, csv, Excel format
Display TOPIX time series
Save Time type in SQLAlchemy
[Django] Import and export DB tables in Excel or CSV format
Plot CSV of time series data with unixtime value in Python (matplotlib)
Get date and time in specified format
Export Amazon RDS tables in CSV format
Convenient time series aggregation with TimeGrouper in pandas
Get time series data from k-db.com in Python
Convert UTF-8 CSV files to read in Excel
How to read time series data in PyTorch
Save pandas data in Excel format to data assets with Cloud Pak for Data (Watson Studio)