O'Reilly's book "Data Visualization Beginning with Python and JavaScript" lays out a handy set of techniques for reading and writing CSV and JSON files in Python, so I am making a note of them here. When I ran the code in Jupyter Notebook, a few parts raised errors, so some of the code below differs from the book.
Book official website: https://www.oreilly.co.jp/books/9784873118086/#toc
Environment: Windows 10 Pro 64-bit, Python 3.6.1, Anaconda 4.4.0, Jupyter Notebook 1.0.0
The following data is used for input and output:
nobel_winners = [
{ 'category': 'Physics',
'name': 'Albert Einstein',
'nationality': 'Swiss',
'sex': 'male',
'year': 1921},
{ 'category': 'Physics',
'name': 'Paul Dirac',
'nationality': 'British',
'sex': 'male',
'year': 1933},
{ 'category': 'Chemistry',
'name': 'Marie Curie',
'nationality': 'Polish',
'sex': 'female',
'year': 1911},
]
Writing the dictionaries to a CSV file in plain Python looks like this. nobel_winners[0].keys() takes the first element of the list and gets the keys of that dictionary, which become the column names. The keys are then sorted with sorted(cols) and the result is assigned back to cols. The book uses cols.sort() here, but that raised an error, so this version differs.
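# column names come from the keys of the first record
cols = nobel_winners[0].keys()
cols = sorted(cols)

with open('data/nobel_winners.csv', 'w') as f:
    # header row
    f.write(','.join(cols) + '\n')
    for o in nobel_winners:
        # convert every value to str so that join() accepts it
        row = [str(o[col]) for col in cols]
        f.write(','.join(row) + '\n')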
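The cols.sort() error is most likely because in Python 3 dict.keys() returns a dict_keys view object rather than a list, and views have no sort() method. A quick check on a single made-up record in the same shape as nobel_winners:

```python
# A single record shaped like the entries of nobel_winners (made up for illustration)
record = {'year': 1921, 'category': 'Physics', 'name': 'Albert Einstein'}

keys = record.keys()
print(type(keys).__name__)     # dict_keys: a view, not a list
print(hasattr(keys, 'sort'))   # False, so keys.sort() raises AttributeError

cols = sorted(keys)            # sorted() accepts any iterable and returns a new list
print(cols)                    # ['category', 'name', 'year']
```

Converting with list(keys) and then calling .sort() would also work; sorted() simply does both steps at once.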
To read back the exported nobel_winners.csv, run the following.
with open('data/nobel_winners.csv', 'r') as f:
    for line in f:
        # each line already ends with '\n', so stop print() from adding another
        print(line, end='')
Out
category,name,nationality,sex,year
Physics,Albert Einstein,Swiss,male,1921
Physics,Paul Dirac,British,male,1933
Chemistry,Marie Curie,Polish,female,1911
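Lines read from a text file keep their trailing newline, and print() appends another one by default, so printing them unchanged would leave a blank line between rows. The sketch below uses io.StringIO as an in-memory stand-in for the CSV file (an assumption so that it runs without data/nobel_winners.csv) and passes end='' to print():

```python
import io

# In-memory stand-in for data/nobel_winners.csv
f = io.StringIO('category,name\nPhysics,Paul Dirac\n')

lines = list(f)              # iterating a file object yields one line at a time
print(repr(lines[0]))        # 'category,name\n' (the newline is kept)

for line in lines:
    print(line, end='')      # end='' stops print() from adding a second newline
```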
Next, read and write with Python's csv module. When I ran this in Jupyter Notebook without newline='' in open(), an extra blank line appeared after every row written by writer.writerow, so I added it. The csv module documentation also recommends opening files with newline=''.
import csv

with open('data/nobel_winners.csv', 'w', newline='') as f:
    fieldnames = nobel_winners[0].keys()
    fieldnames = sorted(fieldnames)
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for w in nobel_winners:
        writer.writerow(w)
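The extra blank lines come from how line endings combine. With the default 'excel' dialect, csv writers end every row with '\r\n'. In a file opened in text mode without newline='', Windows translates the '\n' once more, giving '\r\r\n', which appears as an empty line. Writing to an io.StringIO buffer (used here as an assumed stand-in for the file; it performs no newline translation on write) shows the raw terminator:

```python
import csv
import io

rows = [{'category': 'Physics', 'name': 'Paul Dirac', 'year': 1933}]

buf = io.StringIO()                                  # in-memory stand-in for the CSV file
writer = csv.DictWriter(buf, fieldnames=sorted(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)                               # writes every dict in one call

print(repr(buf.getvalue()))
# 'category,name,year\r\nPhysics,Paul Dirac,1933\r\n'
```

writerows() is an alternative to looping over writerow(); the output is the same.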
Reading the file with the csv module looks like this:
import csv

with open('data/nobel_winners.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Out
['category', 'name', 'nationality', 'sex', 'year']
['Physics', 'Albert Einstein', 'Swiss', 'male', '1921']
['Physics', 'Paul Dirac', 'British', 'male', '1933']
['Chemistry', 'Marie Curie', 'Polish', 'female', '1911']
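One practical reason to prefer the csv module over the hand-written ','.join() version is quoting: a field that contains a comma is quoted on write and read back intact, while a naive split(',') breaks it apart. The record below is made up for illustration, and io.StringIO stands in for a file:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['Chemistry', 'Curie, Marie', 1911])   # made-up value containing a comma

text = buf.getvalue()
print(repr(text))                  # 'Chemistry,"Curie, Marie",1911\r\n' (field is quoted)

row = next(csv.reader(io.StringIO(text)))
print(row)                         # ['Chemistry', 'Curie, Marie', '1911']

naive = text.strip().split(',')
print(naive)                       # ['Chemistry', '"Curie', ' Marie"', '1911']
```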
You can also read each CSV row as a Python dictionary with csv.DictReader. Here is the result of doing that:
import csv

with open('data/nobel_winners.csv') as f:
    reader = csv.DictReader(f)
    nobel_winners = list(reader)

for w in nobel_winners:
    w['year'] = int(w['year'])

nobel_winners
Out
[OrderedDict([('category', 'Physics'),
('name', 'Albert Einstein'),
('nationality', 'Swiss'),
('sex', 'male'),
('year', 1921)]),
OrderedDict([('category', 'Physics'),
('name', 'Paul Dirac'),
('nationality', 'British'),
('sex', 'male'),
('year', 1933)]),
OrderedDict([('category', 'Chemistry'),
('name', 'Marie Curie'),
('nationality', 'Polish'),
('sex', 'female'),
('year', 1911)])]
The csv reader does not guess data types when reading a CSV file. Every value is read as a string, so year has to be cast to int.
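When several columns need converting, a mapping from column name to type keeps the casts in one place. This is a sketch, not the book's code: CONVERTERS and convert_row are hypothetical names, and io.StringIO stands in for the CSV file:

```python
import csv
import io

# Hypothetical mapping: column name -> function applied to the string value
CONVERTERS = {'year': int}

def convert_row(row, converters=CONVERTERS):
    """Return a plain dict with the listed columns converted; others stay strings."""
    return {key: converters.get(key, str)(value) for key, value in row.items()}

data = io.StringIO('category,name,year\r\nPhysics,Paul Dirac,1933\r\n')
winners = [convert_row(row) for row in csv.DictReader(data)]
print(winners)   # [{'category': 'Physics', 'name': 'Paul Dirac', 'year': 1933}]
```

Building a plain dict also avoids the OrderedDict display shown above; DictReader itself returns plain dicts from Python 3.8 onward.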
A list of Python dictionaries can be saved to a JSON file with the json module's dump function.
import json

with open('data/nobel_winners.json', 'w') as f:
    json.dump(nobel_winners, f)
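By default json.dump writes everything on a single line. Its indent parameter produces a more readable file. The sketch below uses json.dumps, the string-returning counterpart, on a one-record list so the result can be shown inline:

```python
import json

winners = [{'name': 'Paul Dirac', 'year': 1933}]

compact = json.dumps(winners)
pretty = json.dumps(winners, indent=2)   # same data, one key per line

print(compact)   # [{"name": "Paul Dirac", "year": 1933}]
print(pretty)
```

The same indent=2 argument can be passed to json.dump when writing to a file.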
To read a JSON file, use the json module's load function.
import json

with open('data/nobel_winners.json') as f:
    nobel_winners = json.load(f)

nobel_winners
Out
[{'category': 'Physics',
'name': 'Albert Einstein',
'nationality': 'Swiss',
'sex': 'male',
'year': 1921},
{'category': 'Physics',
'name': 'Paul Dirac',
'nationality': 'British',
'sex': 'male',
'year': 1933},
{'category': 'Chemistry',
'name': 'Marie Curie',
'nationality': 'Polish',
'sex': 'female',
'year': 1911}]
Unlike with CSV, loading with the json module preserves year as an integer, so no cast is needed.
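JSON round-trips Python ints, floats, strings, booleans, None, lists and dicts, but not every type: a tuple comes back as a list, and a datetime cannot be serialized by the default encoder at all. That last point is what the custom encoder that follows addresses. A quick check:

```python
import datetime
import json

restored = json.loads(json.dumps({'year': 1921, 'pair': (1, 2)}))
print(restored)                          # {'year': 1921, 'pair': [1, 2]}
print(type(restored['year']).__name__)   # int

datetime_ok = True
try:
    json.dumps({'time': datetime.datetime(2017, 9, 3)})
except TypeError as err:
    # the default encoder has no rule for datetime objects
    datetime_ok = False
    print('TypeError:', err)
```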
To encode Python data that contains datetime values, create a custom encoder like the one below.
import datetime
import json

class JSONDateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        else:
            return json.JSONEncoder.default(self, obj)

def dumps(obj):
    return json.dumps(obj, cls=JSONDateTimeEncoder)

now_str = dumps({'time': datetime.datetime.now()})
now_str
Out
'{"time": "2017-09-03T01:03:32.634095"}'
First, subclass JSONEncoder to create an encoder that handles dates. Its default method returns the ISO 8601 string from isoformat() when the argument obj is a date or datetime object, and otherwise defers to the base class, which raises TypeError. The custom encoder is then passed to json.dumps through the cls argument.
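Decoding is the reverse problem: json.loads returns the timestamp as a plain string. One way to turn it back, sketched under the assumption that you know which keys hold naive datetimes, is an object_hook. DATETIME_KEYS, parse_iso_datetime and loads are hypothetical names. strptime is used because datetime.fromisoformat only exists from Python 3.7, and the environment here is 3.6:

```python
import datetime
import json

# Hypothetical: keys assumed to hold naive datetime strings from isoformat()
DATETIME_KEYS = {'time'}

def parse_iso_datetime(s):
    """Parse a naive isoformat() string; return s unchanged if it does not match."""
    # isoformat() drops the fractional part when microsecond == 0, so try both layouts
    for fmt in ('%Y-%m-%dT%H:%M:%S.%f', '%Y-%m-%dT%H:%M:%S'):
        try:
            return datetime.datetime.strptime(s, fmt)
        except ValueError:
            continue
    return s

def datetime_hook(d):
    # object_hook is called with every decoded JSON object as a dict
    for key in d.keys() & DATETIME_KEYS:
        if isinstance(d[key], str):
            d[key] = parse_iso_datetime(d[key])
    return d

def loads(s):
    return json.loads(s, object_hook=datetime_hook)

restored = loads('{"time": "2017-09-03T01:03:32.634095"}')
print(repr(restored['time']))   # datetime.datetime(2017, 9, 3, 1, 3, 32, 634095)
```

Timezone-aware values (isoformat() output with an offset such as +09:00) and plain dates are not matched by these formats and are left as strings.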
- Data Visualization Beginning with Python and JavaScript: https://www.oreilly.co.jp/books/9784873118086/#toc
- csv: Reading and writing CSV files (Python documentation): https://docs.python.jp/3/library/csv.html