[Introduction to Python3, Day 17] Chapter 8 Data Destinations (8.1-8.2.5)

8.1 File input/output

--Open a file with fileobj = open(filename, mode).
--fileobj is the file object returned by open().
--filename is the name of the file.
--mode is a string that says what you want to do with the file. Its first character is r for read, w for write (an existing file is overwritten, a missing one is created), x for write only when the file does not already exist, or a for append (write at the end of an existing file).
--The second character of mode indicates the file type: t means text and b means binary.
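A minimal sketch (not from the book) that exercises the w, a, x and r modes on one file. It works inside a throwaway directory from `tempfile.TemporaryDirectory()` so that no real file is touched; the file name modes.txt is just an example.

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes.txt")

    # "wt": create the file (or truncate it if it exists) and write text.
    with open(path, "wt") as f:
        f.write("first line\n")

    # "at": append to the end of the existing file.
    with open(path, "at") as f:
        f.write("second line\n")

    # "xt": write only if the file does not exist yet, so this fails.
    try:
        with open(path, "xt") as f:
            f.write("never written\n")
        x_mode_failed = False
    except FileExistsError:
        x_mode_failed = True

    # "rt": read the text back.
    with open(path, "rt") as f:
        contents = f.read()

print(contents, end="")
print("x mode refused to overwrite:", x_mode_failed)  # True
```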

8.1.1 Writing to a text file with write()


>>> poem = """There was a young lady named Bright,
... Whose speed was far faster than light,
... She started one day,
... In a relative way,
... And returned on the previous night."""
>>> len(poem)
151

#write() returns the number of characters written (in binary mode, the number of bytes).
>>> f=open("relativity","wt")
>>> f.write(poem)
151
>>> f.close()

#You can also write to a text file with print().
#print() puts a space between its arguments and a newline at the end.
>>> f=open("relativity","wt")
>>> print(poem,file=f)
>>> f.close()

#To make print() behave like write(), use the sep and end arguments.
#sep: the separator between arguments. A space (" ") by default.
#end: the string written at the very end. A newline ("\n") by default.
>>> f=open("relativity","wt")
>>> print(poem,file=f,sep="",end="")
>>> f.close()
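To see the difference between print() and write() without creating files, the sketch below swaps in `io.StringIO`, an in-memory text file (an assumption of this example, not part of the book's session). It checks that print() adds a trailing newline by default and that `sep=""`, `end=""` makes it write exactly what write() would.

```python
import io

poem = "There was a young lady named Bright,\nWhose speed was far faster than light,"

# print() with its defaults appends a newline (end="\n").
buf_print = io.StringIO()
print(poem, file=buf_print)

# print() with sep="" and end="" writes exactly what write() would.
buf_print_raw = io.StringIO()
print(poem, file=buf_print_raw, sep="", end="")

buf_write = io.StringIO()
buf_write.write(poem)

print(buf_print.getvalue() == poem + "\n")               # True
print(buf_print_raw.getvalue() == buf_write.getvalue())  # True

# sep only matters when print() receives several arguments.
buf_multi = io.StringIO()
print("a", "b", "c", file=buf_multi)
print(repr(buf_multi.getvalue()))  # 'a b c\n'
```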

#If the source string is very large, you can also write it to the file in chunks.
>>> f=open("relativity","wt")
>>> size=len(poem)
>>> offset=0
>>> chunk=100
>>> while True:
...     if offset>=size:
...         break
...     f.write(poem[offset:offset+chunk])
...     offset+=chunk
... 
100
51
>>> f.close()
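The chunk loop can be packaged as a small helper. `write_in_chunks` below is a hypothetical name, not a standard function; it is a sketch that uses `range()` with a step, so the loop stops before an empty trailing slice is written, and it works for both text and binary file objects (shown here with in-memory `io.StringIO` and `io.BytesIO`).

```python
import io

def write_in_chunks(fileobj, data, chunk=100):
    """Hypothetical helper: write data to fileobj in slices of at most `chunk` items.

    Returns the list of values returned by each write() call.
    """
    written = []
    # range() stops before len(data), so no empty trailing write happens.
    for offset in range(0, len(data), chunk):
        written.append(fileobj.write(data[offset:offset + chunk]))
    return written

text = "x" * 151          # same length as the poem above
out = io.StringIO()
print(write_in_chunks(out, text))   # [100, 51]
print(out.getvalue() == text)       # True

print(write_in_chunks(io.BytesIO(), bytes(range(256))))  # [100, 100, 56]
```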

#Mode x refuses to overwrite an existing file, which protects it from accidental corruption.
>>> f=open("relativity","xt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'relativity'

#You can also handle this case with an exception handler.
>>> try:
...     f=open("relativity","xt")
...     f.write("stomp stomp stomp")
... except FileExistsError:
...     print("relativity already exists! That was a close one.")
... 
relativity already exists! That was a close one.


8.1.2 Reading text files with read(), readline(), and readlines()


#Read the entire file at once.
>>> fin=open("relativity","rt")
>>> poem=fin.read()
>>> fin.close()
>>> len(poem)
151

#Pass a number of characters to read() to limit how much data it returns at a time.
#Once the whole file has been read, further read() calls return an empty string (""), which is falsy, so "if not fragment:" is true and the loop ends.
>>> poem=""
>>> fin=open("relativity","rt")
>>> chunk=100
>>> while True:
...     fragment=fin.read(chunk)
...     if not fragment:
...         break
...     poem+=fragment
... 
>>> fin.close()
>>> len(poem)
151

#Use readline() to read the file one line at a time.
#As with read(), once the end of the file is reached readline() returns an empty string, which is falsy.
>>> poem=""
>>> fin=open("relativity","rt")
>>> while True:
...     line=fin.readline()
...     if not line:
...         break
...     poem+=line
... 
>>> fin.close()
>>> len(poem)
151

#The simplest way is to iterate over the file object, which yields one line at a time.
>>> poem=""
>>> fin=open("relativity","rt")
>>> for line in fin:
...     poem+=line
... 
>>> fin.close()
>>> len(poem)
151

#readlines() reads the whole file and returns a list of strings, one per line.
>>> fin=open("relativity","rt")
>>> lines=fin.readlines()
>>> fin.close()
>>> for line in lines:
...      print(line,end="")
... 
There was a young lady named Bright,
Whose speed was far faster than light,
She started one day,
In a relative way,
And returned on the previous night.>>> 
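As a check that the four reading styles above return the same text, here is a self-contained sketch. It writes the poem into a temporary directory (an assumption so the example runs anywhere) and uses `with` so every file is closed.

```python
import os
import tempfile

poem = ("There was a young lady named Bright,\n"
        "Whose speed was far faster than light,\n"
        "She started one day,\n"
        "In a relative way,\n"
        "And returned on the previous night.")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "relativity")
    with open(path, "wt") as f:
        f.write(poem)

    # 1. Everything at once.
    with open(path, "rt") as f:
        whole = f.read()

    # 2. Fixed-size pieces; read() returns "" at end of file.
    pieces = []
    with open(path, "rt") as f:
        while True:
            fragment = f.read(100)
            if not fragment:
                break
            pieces.append(fragment)

    # 3. Iterating over the file object yields one line at a time.
    with open(path, "rt") as f:
        by_line = "".join(line for line in f)

    # 4. readlines() returns a list with one string per line.
    with open(path, "rt") as f:
        lines = f.readlines()

print(whole == "".join(pieces) == by_line == "".join(lines) == poem)  # True
print(len(poem), len(lines))  # 151 5
print(repr(lines[0]))         # 'There was a young lady named Bright,\n'
```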

8.1.3 Writing a binary file with write()

--Adding "b" to the mode string opens the file in binary mode. You then read and write bytes instead of strings.


#Generate the 256 byte values 0 through 255.
>>> bdata=bytes(range(0,256))
>>> len(bdata)
256
>>> f=open("bfile","wb")
>>> f.write(bdata)
256
>>> f.close()

#As with text, you can also write binary data in chunks.
>>> f=open("bfile","wb")
>>> size=len(bdata)
>>> offset=0
>>> chunk=100
>>> while True:
...     if offset>=size:
...         break
...     f.write(bdata[offset:offset+chunk])
...     offset+=chunk
... 
100
100
56
>>> f.close()

8.1.4 Reading a binary file with read()


#Just open the file with mode "rb".
>>> f=open("bfile","rb")
>>> bdata=f.read()
>>> len(bdata)
256
>>> f.close()
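A short round-trip sketch under the same assumptions as the book's session (256 bytes written with "wb", read back with "rb"), but inside a temporary directory and using `with`. It shows that read() in binary mode returns a `bytes` object, not a `str`.

```python
import os
import tempfile

bdata = bytes(range(256))   # the 256 byte values 0..255

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "bfile")
    with open(path, "wb") as f:
        n = f.write(bdata)      # returns the number of bytes written
    with open(path, "rb") as f:
        back = f.read()         # a bytes object, not a str

print(n)                    # 256
print(type(back).__name__)  # bytes
print(back == bdata)        # True
print(back[0], back[-1])    # 0 255
```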

8.1.5 Closing files automatically with with

--When the context block that follows with ends, the file is closed automatically, even if the block raised an exception.
--Use it in the form with expression as variable.


>>> with open("relativity","wt") as f:
...     f.write(poem)
... 
151
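The claim that with closes the file even on an exception can be checked through the file object's `closed` attribute. This sketch uses a temporary directory and a deliberately raised `ValueError`; both are only there for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "relativity")

    with open(path, "wt") as f:
        f.write("hello\n")
    print(f.closed)   # True: closed when the block ended

    try:
        with open(path, "rt") as g:
            raise ValueError("something went wrong inside the block")
    except ValueError:
        pass
    print(g.closed)   # True: closed even though the block raised
```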

8.1.6 Repositioning with seek()

--tell() returns the current offset from the beginning of the file, in bytes.

--seek() changes the position of the file object.
--Call it as f.seek(offset, whence). The whence values are also defined as constants in the standard os module.
--The new position is the offset added to a reference point, which you choose with the whence argument: 0 counts from the beginning of the file, 1 from the current position, and 2 from the end.
--whence is optional and defaults to 0, so the beginning of the file is the reference point.
--seek() returns the new absolute position.


>>> f=open("bfile","rb")
>>> f.tell()
0

#Use seek() to move to the last byte of the file. seek() returns the new offset.
>>> f.seek(255)
255
#read() reads everything from position 255 to the end, which is just the last byte.
#Picture the position as sitting between bytes 254 and 255, so the read returns byte 255.
>>> b=f.read()
>>> len(b)
1
#The value of that byte is 255.
>>> b[0]
255


#The whence values are also available as constants in the os module.
>>> import os
>>> os.SEEK_SET
0
>>> os.SEEK_CUR
1
>>> os.SEEK_END
2


>>> f=open("bfile","rb")
#Move to one byte before the end (offset -1 relative to the end, whence=2).
>>> f.seek(-1,2)
255
#tell() returns the current offset in bytes from the beginning of the file.
>>> f.tell()
255
>>> b=f.read()
>>> len(b)
1
>>> b[0]
255

#Move to position 254, two bytes before the end, counting from the beginning of the file.
#Picture the position as sitting between bytes 253 and 254.
>>> f=open("bfile","rb")
>>> f.seek(254,0)
254
>>> f.tell()
254
#Move one byte forward from the current position, to one byte before the end.
#Picture the position as sitting between bytes 254 and 255.
>>> f.seek(1,1)
255
>>> f.tell()
255
>>> b=f.read()
>>> len(b)
1
>>> b[0]
255
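The same seek()/tell() sequence can be replayed on an in-memory binary file. The sketch swaps in `io.BytesIO` for "bfile" (an assumption so it runs without a file on disk) and uses the named `os.SEEK_*` constants instead of 0, 1 and 2.

```python
import io
import os

# An in-memory binary file holding the bytes 0..255 (stands in for "bfile").
f = io.BytesIO(bytes(range(256)))

start = f.tell()                     # 0: a new file object starts at offset 0
pos_a = f.seek(255)                  # whence defaults to os.SEEK_SET (0)
last = f.read()                      # everything from 255 to the end: one byte
pos_b = f.seek(-1, os.SEEK_END)      # one byte before the end
pos_c = f.seek(254, os.SEEK_SET)     # absolute position 254
pos_d = f.seek(1, os.SEEK_CUR)       # one byte forward from 254
after = f.read(1)
end = f.tell()                       # read() moved the position past the last byte

print(start, pos_a, pos_b, pos_c, pos_d, end)  # 0 255 255 254 255 256
print(last, after)                             # b'\xff' b'\xff'
```

Relative seeks like these are meant for binary mode: a file opened in text mode refuses nonzero offsets with whence 1 or 2.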

8.2 Structured text files

--Common ways to give text data a structure:
--A separator (delimiter) such as a tab ("\t"), comma (","), or vertical bar ("|"). CSV is an example.
--Tags enclosed in "<" and ">". XML and HTML are examples.
--Heavy use of punctuation symbols. JSON is an example.
--Indentation. YAML is an example.

8.2.1 CSV

--Files split into fields by a delimiter are used as a data exchange format with spreadsheets and databases.
--Some files use escape sequences. If the delimiter character might appear inside a field, the whole field is enclosed in quotes, or the delimiter is preceded by an escape character.
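The csv module handles the quoting described above for you. In this sketch (in-memory `io.StringIO` instead of a file, and a made-up row), a field containing commas is written wrapped in double quotes and read back as a single field.

```python
import csv
import io

rows = [
    ["name", "quote"],
    ["Goldfinger", "No, Mr. Bond, I expect you to die"],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)   # default quoting quotes only fields that need it
text = buf.getvalue()
print(text.splitlines()[1])       # Goldfinger,"No, Mr. Bond, I expect you to die"

# Reading it back splits on commas only outside the quotes.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)             # True
```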


>>> import csv
>>> villains=[
...     ["Doctor","No"],
...     ["R","K"],
...     ["Mister","Big"],
...     ["Auric","Goldfinger"],
...     ["E","B"],
...     ]

>>> with open("villains","wt",newline="") as fout:
...     #Write with writer()
...     csvout=csv.writer(fout)
...     #This creates a CSV file called villains.
...     csvout.writerows(villains)
... 
Resulting contents of villains:

Doctor,No
R,K
Mister,Big
Auric,Goldfinger
E,B


>>> import csv
>>> with open("villains","rt",newline="") as fin:
...     #Read with reader()
...     cin=csv.reader(fin)
...     villains=[row for row in cin]
... 
>>> print(villains)
[['Doctor', 'No'], ['R', 'K'], ['Mister', 'Big'], ['Auric', 'Goldfinger'], ['E', 'B']]



#Use DictReader() to specify the column names.
>>> import csv
>>> with open("villains","rt",newline="") as fin:
...     cin=csv.DictReader(fin,fieldnames=["first","last"])
...     villains=[row for row in cin]
... 
>>> print(villains)
[{'first': 'Doctor', 'last': 'No'}, {'first': 'R', 'last': 'K'}, {'first': 'Mister', 'last': 'Big'}, {'first': 'Auric', 'last': 'Goldfinger'}, {'first': 'E', 'last': 'B'}]


>>> import csv
>>> villains= [
...     {"first":"Doctor","last":"No"},
...     {"first":"R","last":"K"},
...     {"first":"Mister","last":"Big"},
...     {"first":"Auric","last":"Goldfinger"},
...     {"first":"E","last":"B"},
...     ]
>>> with open("villains","wt",newline="") as fout:
...     cout=csv.DictWriter(fout,["first","last"])
...     #writeheader() writes the column names as the first line of the CSV file.
...     cout.writeheader()
...     cout.writerows(villains)
... 

Resulting contents of villains:

first,last
Doctor,No
R,K
Mister,Big
Auric,Goldfinger
E,B


#Reread the data from the file.
#If you omit the fieldnames argument, DictReader() uses the values on the first line of the file (first,last) as the column labels, that is, as the dictionary keys.
#In Python 3.6 and 3.7, DictReader() yields OrderedDict rows as shown here; from Python 3.8 on, it yields plain dicts.
>>> import csv
>>> with open("villains","rt",newline="") as fin:
...     cin=csv.DictReader(fin)
...     villains=[row for row in cin]
... 
>>> print(villains)
[OrderedDict([('first', 'Doctor'), ('last', 'No')]), OrderedDict([('first', 'R'), ('last', 'K')]), OrderedDict([('first', 'Mister'), ('last', 'Big')]), OrderedDict([('first', 'Auric'), ('last', 'Goldfinger')]), OrderedDict([('first', 'E'), ('last', 'B')])]
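A compact round trip of DictWriter and DictReader, kept in memory with `io.StringIO` (an assumption for the example; with a real file you would open it with newline=""). Wrapping each row in `dict()` gives the same result whether your Python version yields OrderedDict or plain dict rows.

```python
import csv
import io

villains = [
    {"first": "Doctor", "last": "No"},
    {"first": "Auric", "last": "Goldfinger"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["first", "last"])
writer.writeheader()             # writes the "first,last" header row
writer.writerows(villains)

buf.seek(0)                      # rewind before reading
reader = csv.DictReader(buf)     # the header row supplies the keys
restored = [dict(row) for row in reader]

print(buf.getvalue().splitlines()[0])  # first,last
print(restored == villains)            # True
```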


8.2.2 XML

--The ElementTree module makes it easy to parse XML.

menu.xml



<?xml version="1.0"?>
<menu>
    <!-- Optional attributes can be embedded in the start tag. -->
    <breakfast hours="7-11">
        <item price="$6.00">breakfast burritos</item>
        <item price="$4.00">pancakes</item>
    </breakfast>
    <lunch hours="11-3">
        <item price="$5.00">hamburger</item>
    </lunch>
    <dinner hours="3-10">
        <item price="$8.00">spaghetti</item>
    </dinner>
</menu>


>>> import xml.etree.ElementTree as et
>>> tree=et.ElementTree(file="menu.xml")
>>> root=tree.getroot()
>>> root.tag
'menu'
#tag is the tag name as a string, and attrib is a dictionary of the element's attributes.
>>> for child in root:
...     print("tag:",child.tag,"attributes:",child.attrib)
...     for grandchild in child:
...         print("\ttag:",grandchild.tag,"attributes:",grandchild.attrib)
... 
tag: breakfast attributes: {'hours': '7-11'}
	tag: item attributes: {'price': '$6.00'}
	tag: item attributes: {'price': '$4.00'}
tag: lunch attributes: {'hours': '11-3'}
	tag: item attributes: {'price': '$5.00'}
tag: dinner attributes: {'hours': '3-10'}
	tag: item attributes: {'price': '$8.00'}
#Number of menu sections
>>> len(root)
3
#Number of breakfast items
>>> len(root[0])
2
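Beyond looping over children, ElementTree can search the tree. This sketch parses the same menu from a string with `et.fromstring()` (instead of menu.xml, so it runs standalone) and uses `iter()`, `findall()`, `find()` and `get()`; the price total is just an illustrative computation.

```python
import xml.etree.ElementTree as et

menu_xml = """<?xml version="1.0"?>
<menu>
    <breakfast hours="7-11">
        <item price="$6.00">breakfast burritos</item>
        <item price="$4.00">pancakes</item>
    </breakfast>
    <lunch hours="11-3">
        <item price="$5.00">hamburger</item>
    </lunch>
    <dinner hours="3-10">
        <item price="$8.00">spaghetti</item>
    </dinner>
</menu>"""

root = et.fromstring(menu_xml)

# iter("item") walks every <item> element at any depth.
for item in root.iter("item"):
    print(item.text, item.get("price"))

# findall() with a path searches relative to an element.
lunch_items = [item.text for item in root.findall("./lunch/item")]
print(lunch_items)                        # ['hamburger']

# get() reads an attribute, returning None if it is missing.
print(root.find("dinner").get("hours"))   # 3-10

total = sum(float(item.get("price").lstrip("$")) for item in root.iter("item"))
print(total)                              # 23.0
```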

8.2.3 JSON

--JSON is a data exchange format that has become very popular well beyond JavaScript.
--JSON syntax is a subset of JavaScript, and JSON is widely used from Python as well.
--The json module encodes (dumps) Python data into JSON strings and decodes (loads) JSON strings back into Python data.


#Create a data structure
>>> menu=\
... {
... "breakfast":{
...     "hours":"7-11",
...     "items":{
...             "breakfast burritos":"$6.00",
...             "pancakes":"$4.00"
...             }
...         },
... "lunch":{
...         "hours":"11-3",
...         "items":{
...             "hamburger":"$5.00"
...                 }
...         },
... "dinner":{
...     "hours":"3-10",
...     "items":{
...             "spaghetti":"$8.00"
...             }
...     }
... }

#Use dumps() to encode this data structure (menu) into a JSON string (menu_json).
>>> import json
>>> menu_json=json.dumps(menu)
>>> menu_json
'{"breakfast": {"hours": "7-11", "items": {"breakfast burritos": "$6.00", "pancakes": "$4.00"}}, "lunch": {"hours": "11-3", "items": {"hamburger": "$5.00"}}, "dinner": {"hours": "3-10", "items": {"spaghetti": "$8.00"}}}'

#Use loads() to decode the JSON string menu_json back into a Python data structure, menu2.
>>> menu2=json.loads(menu_json)
>>> menu2
{'breakfast': {'hours': '7-11', 'items': {'breakfast burritos': '$6.00', 'pancakes': '$4.00'}}, 'lunch': {'hours': '11-3', 'items': {'hamburger': '$5.00'}}, 'dinner': {'hours': '3-10', 'items': {'spaghetti': '$8.00'}}}
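dumps() and loads() work on strings; their siblings `json.dump()` and `json.load()` write to and read from file objects directly. A minimal sketch with a shortened menu, a temporary directory, and `indent=2` for readable output (all choices of this example):

```python
import json
import os
import tempfile

menu = {
    "breakfast": {"hours": "7-11",
                  "items": {"breakfast burritos": "$6.00", "pancakes": "$4.00"}},
    "lunch": {"hours": "11-3", "items": {"hamburger": "$5.00"}},
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "menu.json")
    with open(path, "wt") as fout:
        json.dump(menu, fout, indent=2)   # dump() writes to a file object
    with open(path, "rt") as fin:
        menu2 = json.load(fin)            # load() reads from a file object

print(menu2 == menu)   # True
print(json.dumps(menu["lunch"], indent=2))
```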

#Trying to encode some objects, such as datetime, raises the exception below.
#This is because the JSON standard does not define date or time types.
>>> import datetime
>>> now=datetime.datetime.utcnow()
>>> now
datetime.datetime(2020, 1, 23, 1, 59, 51, 106364)
>>> json.dumps(now)
#...(traceback omitted)
TypeError: Object of type datetime is not JSON serializable

#You can convert the datetime to a string or to Unix time (seconds since the epoch).
>>> now_str=str(now)
>>> json.dumps(now_str)
'"2020-01-23 01:59:51.106364"'
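>>> from time import mktime
#Note: mktime() interprets the time tuple as local time, so for a UTC value like now the result is off by the local UTC offset. calendar.timegm() is the UTC counterpart.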
>>> now_epoch=int(mktime(now.timetuple()))
>>> json.dumps(now_epoch)
'1579712391'

#If data you convert routinely contains datetime values, converting each one by hand is tedious.
#Instead, create a class that inherits from json.JSONEncoder
#and override its default() method, which json calls for objects it cannot encode itself.
#isinstance() checks whether obj is an instance of the datetime.datetime class.
>>> class DTEncoder(json.JSONEncoder):
...     def default(self,obj):
...         #isinstance() checks the type of obj.
...         if isinstance(obj,datetime.datetime):
...             return int(mktime(obj.timetuple()))
...         return json.JSONEncoder.default(self,obj)
... 
#now was created with datetime.datetime.utcnow(), so isinstance() returns True and the encoder converts it.
>>> json.dumps(now,cls=DTEncoder)
'1579712391'

>>> type(now)
<class 'datetime.datetime'>
>>> isinstance(now,datetime.datetime)
True
>>> type(234)
<class 'int'>
>>> type("hey")
<class 'str'>
>>> isinstance("hey",str)
True
>>> isinstance(234,int)
True
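Instead of subclassing JSONEncoder, json.dumps() also accepts a plain function through its `default=` parameter. The sketch below is an alternative, not the book's approach: `encode_datetime` is a hypothetical name, and it uses `calendar.timegm()` on `utctimetuple()` so a UTC datetime maps to the true epoch value, avoiding the local-time interpretation of `time.mktime()`.

```python
import calendar
import datetime
import json

def encode_datetime(obj):
    """Hypothetical fallback for json.dumps(); called only for objects json can't encode."""
    if isinstance(obj, datetime.datetime):
        # timegm() treats the tuple as UTC, unlike time.mktime() (local time).
        return calendar.timegm(obj.utctimetuple())
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

stamp = datetime.datetime(2020, 1, 23, 1, 59, 51, tzinfo=datetime.timezone.utc)
encoded = json.dumps({"created": stamp}, default=encode_datetime)
print(encoded)   # {"created": 1579744791}
```

Compared with the book's 1579712391, the difference is 32400 seconds (nine hours), which is what you would expect if mktime() applied a UTC+9 local offset to the utcnow() value.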

8.2.4 YAML

--Like JSON, YAML has keys and values, but it can handle more data types than JSON, including dates and times.

--To process YAML, you need to install a third-party library called PyYAML (for example, with pip install pyyaml).

mcintyre.yaml



name:
  first: James
  last: McIntyre
dates:
  birth: 1828-05-25
  death: 1906-03-31
details:
  bearded: true
  themes: [cheese, Canada]
books:
  url: http://www.gutenberg.org/files/36068/36068-h/36068-h.htm
poems:
  - title: "Motto" #A space is required after the colon; without it, an error occurred.
    text: |
        Politeness, perseverance and pluck,
        To their possessor will bring good luck.
  - title: "Canadian Charms" #Same here: a space is required after the colon.
    text: |
        Here industry is not in vain,
        For we have bounteous crops of grain,
        And you behold on every field
        Of grass and roots abundant yield,
        But after all the greatest charm
        Is the snug home upon the farm,
        And stone walls now keep cattle warm.


#yaml.load() without a Loader argument is deprecated and is an error in PyYAML 6, so use safe_load().
>>> import yaml
>>> with open("mcintyre.yaml","rt") as fin:
...     text=fin.read()
... 
>>> data=yaml.safe_load(text)
>>> data["details"]
{'bearded': True, 'themes': ['cheese', 'Canada']}
>>> len(data["poems"])
2


8.2.5 Serialization with pickle

--Converting a Python data hierarchy into a string representation is called **serialization**, and reconstructing the data from that representation is called **deserialization**.
--Between serialization and deserialization, the string representation of an object can be saved to a file or database, or sent to a remote machine over the network.

--Python provides the pickle module, which can save and restore almost any object in a special binary format.


>>> import pickle
>>> import datetime
>>> now1=datetime.datetime.utcnow()
>>> pickled=pickle.dumps(now1)
>>> now2=pickle.loads(pickled)
>>> now1
datetime.datetime(2020, 1, 23, 5, 30, 56, 648873)
>>> now2
datetime.datetime(2020, 1, 23, 5, 30, 56, 648873)

#pickle can also handle your own classes and objects defined in the program.
>>> import pickle
>>> class Tiny():
...     def __str__(self):
...         return "tiny"
... 
>>> obj1=Tiny()
>>> obj1
<__main__.Tiny object at 0x10af86910>
>>> str(obj1)
'tiny'
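#pickled is the bytes object produced by serializing obj1 with pickle.
#dumps() serializes to bytes; dump() would write to a file instead.
#The exact bytes depend on the pickle protocol, whose default differs between Python versions.
>>> pickled=pickle.dumps(obj1)
>>> pickled
b'\x80\x03c__main__\nTiny\nq\x00)\x81q\x01.'
#Converting it back gives obj2, a copy of obj1 (a new object at a different address).
#loads() deserializes from bytes; load() would read from a file instead.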
>>> obj2=pickle.loads(pickled)
>>> obj2
<__main__.Tiny object at 0x10b21cdd0>
>>> str(obj2)
'tiny'
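A sketch of pickle.dump() and pickle.load() with a real (temporary) file, using a dict of types that JSON cannot represent directly, a datetime and a set. Pickle data is binary, so the file is opened with "wb" and "rb". Two cautions from the pickle documentation: classes are stored by reference (module and name), so a class such as Tiny must be importable when the data is loaded, and you should never unpickle data from an untrusted source.

```python
import datetime
import os
import pickle
import tempfile

data = {
    "when": datetime.datetime(2020, 1, 23, 5, 30, 56),
    "tags": {"tiny", "demo"},   # sets are not valid JSON, but pickle handles them
    "count": 3,
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.pickle")
    with open(path, "wb") as fout:      # pickle data is binary: use "wb"
        pickle.dump(data, fout)
    with open(path, "rb") as fin:       # ... and "rb" to read it back
        restored = pickle.load(fin)

print(restored == data)   # True: equal contents
print(restored is data)   # False: a new object
```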


Impressions

Next time, we finally get to RDBMSs.

References

"Introduction to Python3 by Bill Lubanovic (published by O'Reilly Japan)"

"Python Tutorial 3.8.1 Documentation 7. Input and Output" https://docs.python.org/ja/3/tutorial/inputoutput.html#old-string-formatting
