This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). Click here for a list of past knocks (http://qiita.com/segavvy/items/fb50ba8097d59475f760).
hightemp.txt is a file that stores records of the highest temperatures in Japan in a tab-delimited format with the columns "prefecture", "point", "℃", and "day". Create a program that performs the following processing and run it with hightemp.txt as the input file. Furthermore, perform the same processing with UNIX commands and check the result against the program's output.
Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.
The finished code:
main.py
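# coding: utf-8
fname = 'hightemp.txt'
with open(fname) as data_file:
    # Read one line at a time instead of loading the whole file.
    for line in data_file:
        # Each line keeps its own newline, so suppress the one print adds.
        print(line.replace('\t', ' '), end='')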
Execution result:
Terminal
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
I checked the UNIX commands with shell scripts, starting with sed.
test.sh
#!/bin/sh
#sed s command: s/Search pattern/Replacement string/g (replace all)
sed 's/\t/ /g' hightemp.txt
I also tried the tr and expand commands.
test2.sh
#!/bin/sh
#tr command
tr '\t' ' ' < hightemp.txt
test3.sh
#!/bin/sh
#expand command
expand --tabs=1 hightemp.txt
Execution result:
Terminal
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
The result was the same.
Since the target file is small this time, I could have read it all at once and replaced the tabs in one go. However, sed is listed first among the commands to use for confirmation, so I took the intent of the question to be learning stream processing. That is why the program reads the file a little at a time, one line per iteration.
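To make the two approaches concrete, here is a minimal sketch that puts reading everything at once next to reading little by little. The function names and the two-row sample are made up for illustration (they are not part of main.py), and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import io


def replace_tabs_streaming(src, dst):
    """Copy src to dst one line at a time, replacing each tab with a space."""
    for line in src:
        dst.write(line.replace('\t', ' '))


def replace_tabs_all_at_once(src, dst):
    """Read all of src into memory, replace the tabs, and write once."""
    dst.write(src.read().replace('\t', ' '))


# Two made-up rows in the same tab-delimited layout as hightemp.txt.
sample = 'A\tB\t41\t2013-08-12\nC\tD\t40.9\t2007-08-16\n'

streamed = io.StringIO()
replace_tabs_streaming(io.StringIO(sample), streamed)

whole = io.StringIO()
replace_tabs_all_at_once(io.StringIO(sample), whole)

# Both approaches produce the same text; only the memory use differs.
print(streamed.getvalue() == whole.getvalue())
```

With the streaming version only one line is held in memory at a time, which is roughly how sed works through its input. For a file as small as hightemp.txt the difference does not matter in practice.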
sed command
The sed command is quite complicated, and I couldn't make sense of it from man sed alone, but the Wikipedia article "sed (computer)" gave me an overview.
The sed command used this time specifies its search pattern as a regular expression. The \t that stands for a tab happens to be written the same way in a regular expression, so I could do the replacement without knowing regular expressions. Regular expressions are the subject of Chapter 3 of these knocks, so I'll learn them there.
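The same coincidence can be seen on the Python side. The sketch below is only an illustration and is not part of the solution: it assumes Python's re module as a stand-in for sed's pattern matching, and the sample line is the first row of the output above with its tabs restored.

```python
import re

# First row of hightemp.txt (as shown in the results), with tabs restored.
line = 'Kochi Prefecture\tEkawasaki\t41\t2013-08-12\n'

# r'\t' is a backslash followed by t; the re module reads that as "tab".
by_regex = re.sub(r'\t', ' ', line)

# A plain string replacement, as used in main.py.
by_str = line.replace('\t', ' ')

print(by_regex == by_str)

# Where a regular expression starts to differ: \t+ matches a run of tabs.
collapsed = re.sub(r'\t+', ' ', 'a\t\tb')
print(repr(collapsed))
```

For a single literal tab the two calls behave the same, which is why no knowledge of regular expressions was needed here. The last two lines hint at what a pattern can do beyond a literal match.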
tr command
The tr command replaces one character with another when you give it two characters. It reads its data from standard input, so hightemp.txt is redirected to it with <. See man tr for details.
expand command
The expand command converts tabs to spaces. By default it treats tabs as 8 characters wide when converting them, so the --tabs option is used to set the width to 1 character. See man expand for details.
That's all for the 12th knock. If you find any mistakes, I would appreciate it if you could point them out.