Chapter 2 starts today. This is a continuation of the previous post: "A Python beginner tries the 100 Language Processing Knocks 07-09" https://qiita.com/earlgrey914/items/a7b6781037bc0844744b
When I mentioned that Chapter 1 "took 7 hours", I was asked "Are you actually doing your job?" Of course I am.
hightemp.txt is a file that stores records of the highest temperatures in Japan in a tab-delimited format with the columns "prefecture", "point", "℃", and "day". Create programs that perform the following processing, using hightemp.txt as the input file. In addition, run the same processing with UNIX commands and check the program's results against them.
- Use this hightemp.txt as the input file
- Write a Python program that performs the processing
- Try the same processing with UNIX commands
That seems to be what Chapter 2 is about.
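The contents of hightemp.txt look like this: tab-delimited data with 24 rows and 4 columns. (In the original file the place names are written in Japanese and contain no spaces.)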
hightemp.txt
Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
Gifu Prefecture	Tajimi	40.9	2007-08-16
Yamagata Prefecture	Yamagata	40.8	1933-07-25
Yamanashi Prefecture	Kofu	40.7	2013-08-10
Wakayama Prefecture	Katsuragi	40.6	1994-08-08
Shizuoka Prefecture	Tenryu	40.6	1994-08-04
Yamanashi Prefecture	Katsunuma	40.5	2013-08-10
Saitama Prefecture	Koshigaya	40.4	2007-08-16
Gunma Prefecture	Tatebayashi	40.3	2007-08-16
Gunma Prefecture	Kamisatomi	40.3	1998-07-04
Aichi Prefecture	Aisai	40.3	1994-08-05
Chiba Prefecture	Ushiku	40.2	2004-07-20
Shizuoka Prefecture	Sakuma	40.2	2001-07-24
Ehime Prefecture	Uwajima	40.2	1927-07-22
Yamagata Prefecture	Sakata	40.1	1978-08-03
Gifu Prefecture	Mino	40	2007-08-16
Gunma Prefecture	Maebashi	40	2001-07-24
Chiba Prefecture	Mobara	39.9	2013-08-11
Saitama Prefecture	Hatoyama	39.9	1997-07-05
Osaka Prefecture	Toyonaka	39.9	1994-08-08
Yamanashi Prefecture	Otsuki	39.9	1990-07-19
Yamagata Prefecture	Tsuruoka	39.9	1978-08-03
Aichi Prefecture	Nagoya	39.9	1942-08-02
I'm using AWS Cloud9 as my Python execution environment, so I'll start by uploading this txt file there.
As an aside, Cloud9 is really useful. I'd be happy if I could develop native GUI apps on Cloud9 too (I know that's a strange thing to say).
Now then. First, how to read a txt file in Python. I know this one. I put the .txt in the same place as the .py, so this should work.
yomikoku.py
with open('hightemp.txt') as f:
    s = f.read()
print(s)
Traceback (most recent call last):
  File "/home/ec2-user/knock/02/enshu11.py", line 6, in <module>
    with open('hightemp.txt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'hightemp.txt'
Oh. It's no good.
~ 3 minutes of googling ~
Reference URL
https://qiita.com/nagamee/items/b7d1b02074293fdfdfff
korede.py
import os.path
# Use the directory containing this .py file as the working directory
os.chdir(os.path.dirname(os.path.abspath(__file__)))
with open('hightemp.txt') as f:
    s = f.read()
print(s)
Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
:
This works. Is `os.chdir(os.path.dirname(os.path.abspath(__file__)))` some kind of magic incantation? Should I just always write it from now on? Depending on the execution environment, it may or may not be needed...
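On the question of whether that line is needed: a relative path such as 'hightemp.txt' is resolved against the current working directory, not against the location of the .py file, so it depends on where the script is launched from. One alternative, shown here only as a minimal sketch, is to build the path from the script's own location instead of changing directory. The helper name `data_path` is made up for illustration and is not part of the original code.

```python
from pathlib import Path


def data_path(script_file, name):
    """Return the path of `name` located in the same directory as `script_file`.

    `data_path` is a hypothetical helper; in a real script you would pass __file__.
    """
    return Path(script_file).resolve().parent / name


# Typical use inside a script (assumes hightemp.txt sits next to the .py file):
#   with open(data_path(__file__, "hightemp.txt")) as f:
#       s = f.read()
example = data_path("/some/dir/enshu10.py", "hightemp.txt")
print(example.name)  # hightemp.txt
```

With this approach the working directory of the process is left untouched, which may matter if other code relies on it.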
So, there seem to be several ways to read the contents after you `open()` the file. This problem says "count the number of lines", so it's better to use `readlines()`, which returns each line as a list element.
enshu10.py
import os.path
# Use the directory containing this .py file as the working directory
os.chdir(os.path.dirname(os.path.abspath(__file__)))
with open('hightemp.txt') as f:
    s = f.readlines()
print(len(s))
24
Easy. The task says to do the same thing with UNIX commands, so let's run one.
[ec2-user@ip-172-31-34-215 02]$ wc -l hightemp.txt
24 hightemp.txt
The file name gets in the way. Let's pipe it through cat.
[ec2-user@ip-172-31-34-215 02]$ cat hightemp.txt | wc -l
24
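One detail worth knowing when comparing the two counts: `wc -l` counts newline characters, so a file whose last line has no trailing newline reports one fewer than `len(f.readlines())`. The rough sketch below shows the difference on a throwaway sample file (not hightemp.txt). The function names `count_lines` and `count_newlines` are assumptions for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating the file, without building a list."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_newlines(path):
    """Count newline characters, which is what `wc -l` reports."""
    with open(path, "rb") as f:
        return f.read().count(b"\n")


# Sample file with three lines, the last one lacking a trailing newline.
sample = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample, "w", encoding="utf-8", newline="") as f:
    f.write("a\tb\nc\td\ne\tf")

print(count_lines(sample))     # 3
print(count_newlines(sample))  # 2
```

For hightemp.txt both approaches give 24, which matches the outputs above.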
Isn't it easier than Chapter 1?
enshu11.py
import os.path
os.chdir(os.path.dirname(os.path.abspath(__file__)))
# Read the whole file and replace every tab with a single space
with open('hightemp.txt', mode="r") as f:
    s = f.read()
tikango = s.replace("\t", " ")
# Overwrite the input file with the replaced text
with open('hightemp.txt', mode="w") as f:
    f.write(tikango)
hightemp.txt
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
Try replacing it with sed in the terminal as well.
[ec2-user@ip-172-31-34-215 02]$ sed -i -e "s/\t/ /g" hightemp.txt
[ec2-user@ip-172-31-34-215 02]$ cat hightemp.txt
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
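Both the Python program and `sed -i` above rewrite hightemp.txt in place, so the tab-delimited original is gone afterwards. If you would rather keep the input intact, one option is to write the result to a separate file. This is only a sketch under that assumption: the function name `tabs_to_spaces` and the sample file names are hypothetical.

```python
import os
import tempfile


def tabs_to_spaces(src, dst):
    """Copy src to dst, replacing every tab with a single space."""
    # newline="" keeps the line endings of the input exactly as they are.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, mode="w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line.replace("\t", " "))


# Throwaway sample data standing in for hightemp.txt.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.txt")
dst = os.path.join(workdir, "sample_spaces.txt")
with open(src, "w", encoding="utf-8", newline="") as f:
    f.write("A\tB\t41\t2013-08-12\n")

tabs_to_spaces(src, dst)
with open(dst, encoding="utf-8") as f:
    result = f.read()
print(repr(result))  # 'A B 41 2013-08-12\n'
```

On the shell side, dropping `-i` from the sed command and redirecting the output to another file should have the same effect of leaving the input untouched.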
It suddenly feels a lot easier.
enshu12.py
import os.path
os.chdir(os.path.dirname(os.path.abspath(__file__)))
with open('hightemp.txt', mode="r") as f:
    linedata = f.readlines()
for l in linedata:
    with open('col1.txt', mode="a") as c1:
        c1.write(l.split(" ")[0] + "\r")
    with open('col2.txt', mode="a") as c2:
        c2.write(l.split(" ")[1] + "\r")
col1.txt
Kochi Prefecture
Saitama Prefecture
Gifu Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Wakayama Prefecture
Shizuoka Prefecture
Yamanashi Prefecture
Saitama Prefecture
Gunma Prefecture
Gunma Prefecture
Aichi Prefecture
Chiba Prefecture
Shizuoka Prefecture
Ehime Prefecture
Yamagata Prefecture
Gifu Prefecture
Gunma Prefecture
Chiba Prefecture
Saitama Prefecture
Osaka Prefecture
Yamanashi Prefecture
Yamagata Prefecture
Aichi Prefecture
col2.txt
Ekawasaki
Kumagaya
Tajimi
Yamagata
Kofu
Katsuragi
Tenryu
Katsunuma
Koshigaya
Tatebayashi
Kamisatomi
Aisai
Ushiku
Sakuma
Uwajima
Sakata
Mino
Maebashi
Mobara
Hatoyama
Toyonaka
Otsuki
Tsuruoka
Nagoya
The cut command looks like this.
[ec2-user@ip-172-31-34-215 02]$ cut -f 1 -d " " hightemp.txt > col1_command.txt
[ec2-user@ip-172-31-34-215 02]$ cut -f 2 -d " " hightemp.txt > col2_command.txt
Compare with diff ...
[ec2-user@ip-172-31-34-215 02]$ diff col1.txt col1_command.txt
1c1,24
Aichi Prefecture
\ No newline at end of file
---
> Kochi Prefecture
> Saitama Prefecture
> Gifu Prefecture
> Yamagata Prefecture
> Yamanashi Prefecture
> Wakayama Prefecture
> Shizuoka Prefecture
> Yamanashi Prefecture
> Saitama Prefecture
> Gunma Prefecture
> Gunma Prefecture
> Aichi Prefecture
> Chiba Prefecture
> Shizuoka Prefecture
> Ehime Prefecture
> Yamagata Prefecture
> Gifu Prefecture
> Gunma Prefecture
> Chiba Prefecture
> Saitama Prefecture
> Osaka Prefecture
> Yamanashi Prefecture
> Yamagata Prefecture
> Aichi Prefecture
Huh!?
Come to think of it, `cat col1.txt` doesn't display the contents properly either
...
**It was the newline code!**
So I changed the newline code from `\r` to `\n` and specified `UTF-8` as the encoding when writing the files.
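The reason the symptom looks so odd is that `\r` only moves the cursor back to the start of the line, so the terminal keeps overwriting the same line and `diff` sees the whole file as a single line. A quick way to confirm this kind of problem is to look at the raw bytes. The sketch below counts each newline style in a byte string. `newline_report` is a hypothetical helper name, and the byte strings are made-up samples rather than the real file contents.

```python
def newline_report(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR sequences in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF only": data.count(b"\n") - crlf,
        "CR only": data.count(b"\r") - crlf,
    }


# What the first version wrote: every line ends with a bare CR.
print(newline_report(b"Kochi\rSaitama\rGifu\r"))
# {'CRLF': 0, 'LF only': 0, 'CR only': 3}

# What the fixed version writes: every line ends with LF.
print(newline_report(b"Kochi\nSaitama\nGifu\n"))
# {'CRLF': 0, 'LF only': 3, 'CR only': 0}
```

For a real file you would pass the result of `open(path, "rb").read()`. On the shell side, `od -c col1.txt` shows the same information character by character.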
enshu12.py
import os.path
os.chdir(os.path.dirname(os.path.abspath(__file__)))
with open('hightemp.txt', mode="r") as f:
    linedata = f.readlines()
for l in linedata:
    # Append the first column to col1.txt and the second column to col2.txt
    with open('col1.txt', mode="a", encoding="utf-8") as c1:
        c1.write(l.split(" ")[0] + "\n")
    with open('col2.txt', mode="a", encoding="utf-8") as c2:
        c2.write(l.split(" ")[1] + "\n")
Execution confirmation
[ec2-user@ip-172-31-34-215 02]$ python3 enshu12.py
[ec2-user@ip-172-31-34-215 02]$
[ec2-user@ip-172-31-34-215 02]$ cut -f 1 -d " " hightemp.txt > col1_command.txt
[ec2-user@ip-172-31-34-215 02]$ cut -f 2 -d " " hightemp.txt > col2_command.txt
[ec2-user@ip-172-31-34-215 02]$ diff col1.txt col1_command.txt
[ec2-user@ip-172-31-34-215 02]$ diff col2.txt col2_command.txt
[ec2-user@ip-172-31-34-215 02]$
It's ok.
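One thing to watch with the version above: because the output files are opened with `mode="a"`, running the script a second time appends another 24 lines to col1.txt and col2.txt, and they are reopened for every input line. A possible variation, sketched here with a hypothetical function name `split_columns` and made-up sample data, opens each output file once in `"w"` mode and splits each line only once.

```python
import os
import tempfile


def split_columns(src, col1_path, col2_path, sep=" "):
    """Write the first two sep-separated fields of each line to two files."""
    with open(src, encoding="utf-8") as fin, \
         open(col1_path, mode="w", encoding="utf-8", newline="\n") as c1, \
         open(col2_path, mode="w", encoding="utf-8", newline="\n") as c2:
        for line in fin:
            fields = line.rstrip("\n").split(sep)
            c1.write(fields[0] + "\n")
            c2.write(fields[1] + "\n")


# Throwaway sample data standing in for the space-separated hightemp.txt.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.txt")
col1 = os.path.join(workdir, "col1.txt")
col2 = os.path.join(workdir, "col2.txt")
with open(src, mode="w", encoding="utf-8", newline="\n") as f:
    f.write("A B 41 2013-08-12\nC D 40.9 2007-08-16\n")

split_columns(src, col1, col2)
split_columns(src, col1, col2)  # a second run overwrites instead of appending

with open(col1, encoding="utf-8") as f:
    col1_text = f.read()
with open(col2, encoding="utf-8") as f:
    col2_text = f.read()
print(repr(col1_text))  # 'A\nC\n'
print(repr(col2_text))  # 'B\nD\n'
```

The sketch assumes every line has at least two fields. A line with fewer would raise an IndexError.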
It's probably something like this, but is there a better way?
tabun.py
with open col1.txt
    put all rows into array 1
with open col2.txt
    put all rows into array 2
for i
    output file = write(array 1[i] + "\t" + array 2[i])
~ 20 minutes later ~
enshu13.py
import os.path
os.chdir(os.path.dirname(os.path.abspath(__file__)))
linedata_col1 = []
linedata_col2 = []
with open('col1.txt', mode="r") as f:
    linedata_col1 = f.read().splitlines()
with open('col2.txt', mode="r") as f:
    linedata_col2 = f.read().splitlines()
# Join the two columns line by line with a tab
with open('merge.txt', mode="a", encoding="utf-8") as f:
    for c1, c2 in zip(linedata_col1, linedata_col2):
        f.write(c1 + "\t" + c2 + "\n")
merge.txt
Kochi Prefecture	Ekawasaki
Saitama Prefecture	Kumagaya
Gifu Prefecture	Tajimi
Yamagata Prefecture	Yamagata
Yamanashi Prefecture	Kofu
Wakayama Prefecture	Katsuragi
Shizuoka Prefecture	Tenryu
Yamanashi Prefecture	Katsunuma
Saitama Prefecture	Koshigaya
Gunma Prefecture	Tatebayashi
Gunma Prefecture	Kamisatomi
Aichi Prefecture	Aisai
Chiba Prefecture	Ushiku
Shizuoka Prefecture	Sakuma
Ehime Prefecture	Uwajima
Yamagata Prefecture	Sakata
Gifu Prefecture	Mino
Gunma Prefecture	Maebashi
Chiba Prefecture	Mobara
Saitama Prefecture	Hatoyama
Osaka Prefecture	Toyonaka
Yamanashi Prefecture	Otsuki
Yamagata Prefecture	Tsuruoka
Aichi Prefecture	Nagoya
The trick here is `linedata_col1 = f.read().splitlines()`.
**Reading line by line with `f.readlines()` would also work, but then you get a list whose elements still include the newline code, like this.**
readlinesdato.py
with open('col1.txt', mode="r") as f:
    linedata_col1 = f.readlines()
print(linedata_col1)
['Kochi Prefecture\n', 'Saitama Prefecture\n', 'Gifu Prefecture\n', 'Yamagata Prefecture\n', 'Yamanashi Prefecture\n', 'Wakayama Prefecture\n', 'Shizuoka Prefecture\n', 'Yamanashi Prefecture\n', 'Saitama Prefecture\n', 'Gunma Prefecture\n', 'Gunma Prefecture\n', 'Aichi Prefecture\n', 'Chiba Prefecture\n', 'Shizuoka Prefecture\n', 'Ehime Prefecture\n', 'Yamagata Prefecture\n', 'Gifu Prefecture\n', 'Gunma Prefecture\n', 'Chiba Prefecture\n', 'Saitama Prefecture\n', 'Osaka Prefecture\n', 'Yamanashi Prefecture\n', 'Yamagata Prefecture\n', 'Aichi Prefecture\n']
**Rather than going to the trouble of stripping these newline codes afterwards, I figured it would be better to read the whole file as a single string (newlines included) with `read()` and then split it on the newline codes with `splitlines()` to get a list.**
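To make the difference concrete, here is a small comparison on an in-memory sample (the three place names are just stand-in data, and `io.StringIO` is used in place of a real file so the snippet runs anywhere). It also shows why `splitlines()` is a safer choice than `split("\n")` when the text ends with a newline.

```python
import io

text = "Kochi\nSaitama\nGifu\n"

# readlines() keeps the trailing newline on every element.
print(io.StringIO(text).readlines())  # ['Kochi\n', 'Saitama\n', 'Gifu\n']

# splitlines() drops the newlines and adds no empty element at the end.
print(text.splitlines())              # ['Kochi', 'Saitama', 'Gifu']

# split("\n") leaves an empty string after the final newline.
print(text.split("\n"))               # ['Kochi', 'Saitama', 'Gifu', '']

# Stripping while iterating is another common way to get the same list.
stripped = [line.rstrip("\n") for line in io.StringIO(text)]
print(stripped)                       # ['Kochi', 'Saitama', 'Gifu']
```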
Then compare with paste.
[ec2-user@ip-172-31-34-215 02]$ python3 enshu13.py
[ec2-user@ip-172-31-34-215 02]$ paste col1.txt col2.txt > merge_command.txt
[ec2-user@ip-172-31-34-215 02]$ diff merge.txt merge_command.txt
[ec2-user@ip-172-31-34-215 02]$
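The diff is empty here because both column files have 24 lines. If the inputs ever had different lengths, `zip()` would silently stop at the shorter one, whereas `paste` keeps going and leaves the missing field empty. As a rough sketch of how to mimic that behaviour, `itertools.zip_longest` can be used. The function name `paste_lines` and the sample lists are assumptions for illustration.

```python
from itertools import zip_longest


def paste_lines(left, right, sep="\t"):
    """Join two lists of lines pairwise, padding the shorter one with ''."""
    return [a + sep + b for a, b in zip_longest(left, right, fillvalue="")]


col1 = ["Kochi", "Saitama", "Gifu"]
col2 = ["Ekawasaki", "Kumagaya"]  # deliberately one line short

# zip() stops at the shorter list, so "Gifu" is dropped.
print([a + "\t" + b for a, b in zip(col1, col2)])
# ['Kochi\tEkawasaki', 'Saitama\tKumagaya']

# zip_longest() keeps every line and pads the missing field.
print(paste_lines(col1, col2))
# ['Kochi\tEkawasaki', 'Saitama\tKumagaya', 'Gifu\t']
```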
It's fairly easy, but verifying the results has become a hassle now that intermediate files are involved. To be continued tomorrow. **This took 2 hours so far!!** I'm taking it easy this time, so I wonder whether this post will be much help to anyone.