A Python beginner tries the 100 Language Processing Knock: 10-13

Chapter 2 starts today. This is a continuation of my previous post: Python inexperienced person tries to knock 100 language processing 07-09 https://qiita.com/earlgrey914/items/a7b6781037bc0844744b

When I wrote in the Chapter 1 post that it took 7 hours, someone asked, "What do you do for work?" I do work, of course.


Preparation

hightemp.txt is a file storing records of the highest temperatures in Japan, in tab-delimited format with the columns "prefecture", "point", "℃", and "day". Create programs that perform the following processing with hightemp.txt as the input file. Furthermore, execute the same processing with UNIX commands and check the programs' results.

So the gist of Chapter 2 seems to be:

- Use hightemp.txt as the input file
- Write a Python program that performs the specified processing
- Try the same processing with UNIX commands and compare

The contents of hightemp.txt look like this: tab-delimited data, 24 rows by 4 columns.

hightemp.txt


Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02

I'm using AWS Cloud9 as my Python execution environment, so I start by uploading this txt file there.

As an aside, Cloud9 is really convenient. I'm just happy to be able to develop with a GUI natively in the browser (I realize that's a strange thing to say).

10. Counting the number of lines

Count the number of lines. Use the wc command for confirmation.

Well then. First up: how to read a txt file in Python. I know this one. The .txt is in the same place as the .py, so this should be fine.

yomikoku.py


with open('hightemp.txt') as f:
    s = f.read()
    print(s)
Traceback (most recent call last):
  File "/home/ec2-user/knock/02/enshu11.py", line 6, in <module>
    with open('hightemp.txt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'hightemp.txt'

Oh. It's no good.

~ 3 minutes of googling ~


Reference URL
https://qiita.com/nagamee/items/b7d1b02074293fdfdfff

korede.py


import os.path

#The origin is the location of this py file
os.chdir((os.path.dirname(os.path.abspath(__file__))))

with open('hightemp.txt') as f:
    s = f.read()
    print(s)
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9    2007-08-16
:

This works. Is this `os.chdir(os.path.dirname(os.path.abspath(__file__)))` incantation something I should just always write from now on? Whether it's needed (or whether you'd write it at all) probably depends on the execution environment...
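As one hedged alternative to that chdir incantation, `pathlib` can resolve the data file's path relative to the script without changing the working directory. A sketch; it assumes the code runs from a .py file, where `__file__` is defined:

```python
from pathlib import Path

# The directory containing this script, regardless of the
# current working directory at launch time.
here = Path(__file__).resolve().parent
target = here / "hightemp.txt"
print(target.name)  # hightemp.txt
```

You would then pass `target` straight to `open()`, leaving the working directory alone for the rest of the process.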

So there seem to be several ways to read a file's contents after `open()`-ing it. Since this problem asks to count the number of lines, `readlines()`, which returns the lines as a list, seems like the best fit.

enshu10.py


import os.path

#The origin is the location of this py file
os.chdir((os.path.dirname(os.path.abspath(__file__))))

with open('hightemp.txt') as f:
    s = f.readlines()
    print(len(s))
24

Easy. The problem says to do the same thing with a UNIX command, so let's run that too.

[ec2-user@ip-172-31-34-215 02]$ wc -l hightemp.txt 
24 hightemp.txt

The file name gets in the way, so let's pipe through cat.

[ec2-user@ip-172-31-34-215 02]$ cat hightemp.txt | wc -l
24
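As an aside, `readlines()` loads the whole file into memory just to count it. Since iterating over a file object yields one line at a time, the count can also be done lazily; a sketch using `io.StringIO` as a stand-in for hightemp.txt:

```python
import io

# io.StringIO stands in for an open hightemp.txt; iterating a file
# object yields one line at a time, so the whole file never has to
# sit in memory at once.
sample = io.StringIO("line 1\nline 2\nline 3\n")
count = sum(1 for _ in sample)
print(count)  # 3
```

For a 24-line file it makes no practical difference, but the same pattern scales to files that don't fit in memory.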

11. Replace tabs with spaces

Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.

Isn't it easier than Chapter 1?

enshu11.py


import os.path

os.chdir((os.path.dirname(os.path.abspath(__file__))))

with open('hightemp.txt', mode="r") as f:
    s = f.read()
    tikango = s.replace("\t", " ") 
    
with open('hightemp.txt', mode="w") as f:
    f.write(tikango)

hightemp.txt


Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
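Incidentally, `replace()` isn't the only option in Python. A small sketch of equivalents on a hypothetical one-row sample (`re.sub` is the closest analogue to sed, `str.translate` to tr):

```python
import re

# A hypothetical one-row sample in the original tab-delimited format.
s = "Kochi Prefecture\tEkawasaki\t41\t2013-08-12"

a = s.replace("\t", " ")                   # str.replace, as in enshu11.py
b = re.sub(r"\t", " ", s)                  # regex substitution, like sed
c = s.translate(str.maketrans("\t", " "))  # char-for-char mapping, like tr

assert a == b == c
print(a)
```

For a fixed single-character substitution like this they all do the same thing; `re.sub` only starts to pay off once the pattern is more than a literal.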

Try replacing it with sed in the terminal as well.

[ec2-user@ip-172-31-34-215 02]$ sed -i -e "s/\t/ /g" hightemp.txt
[ec2-user@ip-172-31-34-215 02]$ cat hightemp.txt 
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02

12. Save the first column in col1.txt and the second column in col2.txt

Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.

I feel like these are suddenly getting easier.

enshu12.py


import os.path

os.chdir((os.path.dirname(os.path.abspath(__file__))))

with open('hightemp.txt', mode="r") as f:
    linedata = f.readlines()
    for l in linedata:
        with open('col1.txt', mode="a") as c1:
            c1.write(l.split(" ")[0] + "\r")
        with open('col2.txt', mode="a") as c2:
            c2.write(l.split(" ")[1] +"\r")

col1.txt


Kochi Prefecture
Saitama Prefecture
Gifu Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Wakayama Prefecture
Shizuoka Prefecture
Yamanashi Prefecture
Saitama Prefecture
Gunma Prefecture
Gunma Prefecture
Aichi Prefecture
Chiba Prefecture
Shizuoka Prefecture
Ehime Prefecture
Yamagata Prefecture
Gifu Prefecture
Gunma Prefecture
Chiba Prefecture
Saitama Prefecture
Osaka Prefecture
Yamanashi Prefecture
Yamagata Prefecture
Aichi Prefecture

col2.txt


Ekawasaki
Kumagaya
Tajimi
Yamagata
Kofu
Katsuragi
Tenryu
Katsunuma
Koshigaya
Tatebayashi
Kamisatomi
Aisai
Ushiku
Sakuma
Uwajima
Sakata
Mino
Maebashi
Mobara
Hatoyama
Toyonaka
Otsuki
Tsuruoka
Nagoya

The cut command looks like this.

[ec2-user@ip-172-31-34-215 02]$ cut -f 1 -d " " hightemp.txt > col1_command.txt 
[ec2-user@ip-172-31-34-215 02]$ cut -f 2 -d " " hightemp.txt > col2_command.txt

Compare with diff ...

[ec2-user@ip-172-31-34-215 02]$ diff col1.txt col1_command.txt 
1c1,24
< Aichi Prefecture
\ No newline at end of file
---
> Kochi Prefecture
> Saitama Prefecture
> Gifu Prefecture
> Yamagata Prefecture
> Yamanashi Prefecture
> Wakayama Prefecture
> Shizuoka Prefecture
> Yamanashi Prefecture
> Saitama Prefecture
> Gunma Prefecture
> Gunma Prefecture
> Aichi Prefecture
> Chiba Prefecture
> Shizuoka Prefecture
> Ehime Prefecture
> Yamagata Prefecture
> Gifu Prefecture
> Gunma Prefecture
> Chiba Prefecture
> Saitama Prefecture
> Osaka Prefecture
> Yamanashi Prefecture
> Yamagata Prefecture
> Aichi Prefecture

Huh!? Come to think of it, nothing showed up with `cat col1.txt` either... **It's the line ending!** So I changed the line ending from `\r` to `\n` and specified `utf-8` as the encoding when writing the files.
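To see why `\r` caused the mess, a small sketch on hypothetical strings:

```python
# "\r" on its own is not a line terminator for POSIX tools like diff
# and cat, so the whole file looked like one long line to them.
# Python's splitlines() *does* split on "\r", which is why the bug
# was easy to miss from the Python side.
s = "Kochi Prefecture\rSaitama Prefecture\r"

print(s.count("\n"))   # 0: no POSIX-style newlines in the file at all
print(s.splitlines())  # ['Kochi Prefecture', 'Saitama Prefecture']
```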

enshu12.py


import os.path

os.chdir((os.path.dirname(os.path.abspath(__file__))))

with open('hightemp.txt', mode="r") as f:
    linedata = f.readlines()
    for l in linedata:
        with open('col1.txt', mode="a", encoding="utf-8") as c1:
            c1.write(l.split(" ")[0] + "\n")
        with open('col2.txt', mode="a", encoding="utf-8") as c2:
            c2.write(l.split(" ")[1] +"\n")

Execution confirmation

[ec2-user@ip-172-31-34-215 02]$ python3 enshu12.py
[ec2-user@ip-172-31-34-215 02]$ 
[ec2-user@ip-172-31-34-215 02]$ cut -f 1 -d " " hightemp.txt > col1_command.txt
[ec2-user@ip-172-31-34-215 02]$ cut -f 2 -d " " hightemp.txt > col2_command.txt
[ec2-user@ip-172-31-34-215 02]$ diff col1.txt col1_command.txt
[ec2-user@ip-172-31-34-215 02]$ diff col2.txt col2_command.txt
[ec2-user@ip-172-31-34-215 02]$ 

It's ok.
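One more thing worth noting: because enshu12.py opens the output files in append mode (`"a"`), running it twice doubles the contents of col1.txt and col2.txt. A minimal sketch that opens each output once in `"w"` mode instead, using a hypothetical two-row sample and a temporary directory so it is self-contained:

```python
import os
import tempfile

# Hypothetical two-row stand-in for the space-delimited hightemp.txt
# produced in problem 11.
sample = "Kochi Ekawasaki 41 2013-08-12\nSaitama Kumagaya 40.9 2007-08-16\n"

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "hightemp.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write(sample)

    # Open each output file exactly once, in "w" mode, so re-running
    # overwrites instead of appending duplicates.
    with open(src, encoding="utf-8") as f, \
         open(os.path.join(d, "col1.txt"), "w", encoding="utf-8") as c1, \
         open(os.path.join(d, "col2.txt"), "w", encoding="utf-8") as c2:
        for line in f:
            cols = line.split(" ")
            c1.write(cols[0] + "\n")
            c2.write(cols[1] + "\n")

    with open(os.path.join(d, "col1.txt"), encoding="utf-8") as f:
        col1_result = f.read()

print(col1_result)
```

Opening the files once outside the loop also avoids re-opening them for every row, which the append-inside-the-loop version does.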

13. Merge col1.txt and col2.txt

Combine the col1.txt and col2.txt created in 12, and create a text file in which the first and second columns of the original file are arranged by tab delimiters. Use the paste command for confirmation.

Maybe it's like this, but is there a better way?

tabun.py



with open col1.txt:
    put all lines into list 1

with open col2.txt:
    put all lines into list 2

for each index i:
    output file: write(list 1[i] + "\t" + list 2[i])

~ 20 minutes later ~

enshu13.py


import os.path

os.chdir((os.path.dirname(os.path.abspath(__file__))))

linedata_col1 = []
linedata_col2 = []

with open('col1.txt', mode="r") as f:
    linedata_col1 = f.read().splitlines()


with open('col2.txt', mode="r") as f:
    linedata_col2 = f.read().splitlines()

with open('merge.txt', mode="a", encoding="utf-8") as f:
    for c1, c2 in zip(linedata_col1, linedata_col2):
        f.write(c1 + "\t" + c2 + "\n")

merge.txt


Kochi Prefecture	Ekawasaki
Saitama Prefecture	Kumagaya
Gifu Prefecture	Tajimi
Yamagata Prefecture	Yamagata
Yamanashi Prefecture	Kofu
Wakayama Prefecture	Katsuragi
Shizuoka Prefecture	Tenryu
Yamanashi Prefecture	Katsunuma
Saitama Prefecture	Koshigaya
Gunma Prefecture	Tatebayashi
Gunma Prefecture	Kamisatomi
Aichi Prefecture	Aisai
Chiba Prefecture	Ushiku
Shizuoka Prefecture	Sakuma
Ehime Prefecture	Uwajima
Yamagata Prefecture	Sakata
Gifu Prefecture	Mino
Gunma Prefecture	Maebashi
Chiba Prefecture	Mobara
Saitama Prefecture	Hatoyama
Osaka Prefecture	Toyonaka
Yamanashi Prefecture	Otsuki
Yamagata Prefecture	Tsuruoka
Aichi Prefecture	Nagoya

The small bit of ingenuity here is `linedata_col1 = f.read().splitlines()`. **Reading line by line with `f.readlines()` also works, but then you get a list that includes the newline characters, like this:**

readlinesdato.py


with open('col1.txt', mode="r") as f:
    linedata_col1 = f.readlines()
    print(linedata_col1)
['Kochi Prefecture\n', 'Saitama Prefecture\n', 'Gifu Prefecture\n', 'Yamagata Prefecture\n', 'Yamanashi Prefecture\n', 'Wakayama Prefecture\n', 'Shizuoka Prefecture\n', 'Yamanashi Prefecture\n', 'Saitama Prefecture\n', 'Gunma Prefecture\n', 'Gunma Prefecture\n', 'Aichi Prefecture\n', 'Chiba Prefecture\n', 'Shizuoka Prefecture\n', 'Ehime Prefecture\n', 'Yamagata Prefecture\n', 'Gifu Prefecture\n', 'Gunma Prefecture\n', 'Chiba Prefecture\n', 'Saitama Prefecture\n', 'Osaka Prefecture\n', 'Yamanashi Prefecture\n', 'Yamagata Prefecture\n', 'Aichi Prefecture\n']

Rather than bothering to strip those newline characters afterwards, I figured it was better to read the whole thing as one string with `read()` and split it into a list at the newlines with `splitlines()`.
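The difference is easy to see on a hypothetical two-line sample:

```python
# A two-line sample with a trailing newline, like col1.txt.
s = "Kochi Prefecture\nSaitama Prefecture\n"

# splitlines() drops the line terminators that readlines() keeps.
print(s.splitlines())  # ['Kochi Prefecture', 'Saitama Prefecture']

# The readlines()-style equivalent would need the newlines stripped:
cleaned = [line.rstrip("\n") for line in s.splitlines(keepends=True)]
print(cleaned)  # same list as above
```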

Then compare with paste.

[ec2-user@ip-172-31-34-215 02]$ python3 enshu13.py
[ec2-user@ip-172-31-34-215 02]$ paste col1.txt col2.txt > merge_command.txt
[ec2-user@ip-172-31-34-215 02]$ diff merge.txt merge_command.txt 
[ec2-user@ip-172-31-34-215 02]$ 
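One caveat about the `zip()` used in enshu13.py: it silently stops at the shorter input, so if col1.txt and col2.txt ever had different line counts, rows would be dropped without any error. A sketch with hypothetical lists showing the difference from `itertools.zip_longest`:

```python
from itertools import zip_longest

# Hypothetical column lists with a deliberate length mismatch.
col1 = ["Kochi Prefecture", "Saitama Prefecture", "Gifu Prefecture"]
col2 = ["Ekawasaki", "Kumagaya"]  # one entry short

pairs = list(zip(col1, col2))              # zip stops at the shorter input
all_pairs = list(zip_longest(col1, col2))  # missing values become None

print(len(pairs))    # 2: the third row was silently dropped
print(all_pairs[2])  # ('Gifu Prefecture', None)
```

Here the two files come from the same source, so the counts match by construction, but `zip_longest` would make a mismatch visible instead of hiding it.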

This chapter feels somewhat easy, but verifying the results has become a chore now that files are involved. I'll continue tomorrow. **Two hours so far!!** I'm doing this pretty lazily, so I wonder how useful this one will turn out to be.
