This is a record of my attempt at Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). Click here for a list of past knocks (http://qiita.com/segavvy/items/fb50ba8097d59475f760).
hightemp.txt is a tab-delimited file that records the highest temperatures observed in Japan, with the columns "prefecture", "location", "℃", and "date". Write a program that performs the following processing with hightemp.txt as the input file. In addition, perform the same processing with UNIX commands and use them to check the program's output.
Sort the rows in descending order of the number in the third column (note: rearrange the rows without changing their contents). Use the sort command for confirmation (for this problem, the result does not have to match the output of the command).
main.py
# coding: utf-8
fname = 'hightemp.txt'
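# Read all rows, closing the file as soon as we are done with it
with open(fname) as data_file:
    lines = data_file.readlines()
# Sort by the third column (temperature) as a number, highest first
lines.sort(key=lambda line: float(line.split('\t')[2]), reverse=True)
for line in lines:
    print(line, end='')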
Terminal
Kochi Prefecture Ekawasaki 41 2013-08-12
Saitama Prefecture Kumagaya 40.9 2007-08-16
Gifu Prefecture Tajimi 40.9 2007-08-16
Yamagata Prefecture Yamagata 40.8 1933-07-25
Yamanashi Prefecture Kofu 40.7 2013-08-10
Wakayama Prefecture Katsuragi 40.6 1994-08-08
Shizuoka Prefecture Tenryu 40.6 1994-08-04
Yamanashi Prefecture Katsunuma 40.5 2013-08-10
Saitama Prefecture Koshigaya 40.4 2007-08-16
Gunma Prefecture Tatebayashi 40.3 2007-08-16
Gunma Prefecture Kamisatomi 40.3 1998-07-04
Aichi Prefecture Aisai 40.3 1994-08-05
Chiba Prefecture Ushiku 40.2 2004-07-20
Shizuoka Prefecture Sakuma 40.2 2001-07-24
Ehime Prefecture Uwajima 40.2 1927-07-22
Yamagata Prefecture Sakata 40.1 1978-08-03
Gifu Prefecture Mino 40 2007-08-16
Gunma Prefecture Maebashi 40 2001-07-24
Chiba Prefecture Mobara 39.9 2013-08-11
Saitama Prefecture Hatoyama 39.9 1997-07-05
Osaka Prefecture Toyonaka 39.9 1994-08-08
Yamanashi Prefecture Otsuki 39.9 1990-07-19
Yamagata Prefecture Tsuruoka 39.9 1978-08-03
Aichi Prefecture Nagoya 39.9 1942-08-02
test.sh
#!/bin/sh
#Sort in reverse order with the third column as a number
sort hightemp.txt --key=3,3 --numeric-sort --reverse > result_test.txt
#Run with a Python program
python main.py > result.txt
#Check the result
diff --report-identical-files result.txt result_test.txt
Terminal
segavvy@ubuntu:~/document/100 language processing knock 2015/18$ ./test.sh
10d9
< Gunma Prefecture Tatebayashi 40.3 2007-08-16
11a11
> Gunma Prefecture Tatebayashi 40.3 2007-08-16
17d16
< Gifu Prefecture Mino 40 2007-08-16
19,20c18
< Chiba Prefecture Mobara 39.9 2013-08-11
< Saitama Prefecture Hatoyama 39.9 1997-07-05
---
> Gifu Prefecture Mino 40 2007-08-16
21a20
> Chiba Prefecture Mobara 39.9 2013-08-11
23a23
> Saitama Prefecture Hatoyama 39.9 1997-07-05
diff reported some differences, but they only involve rows whose temperature in the third column is the same, so a difference in the order of those rows can't be helped.
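To see why rows with equal temperatures can come out in a different order, here is a minimal sketch. The four rows are hypothetical stand-ins, not the real hightemp.txt, and the helper name temperature is my own. Python's sort is stable, so rows with equal keys keep their input order even with reverse=True. As far as I understand, GNU sort instead falls back to comparing whole lines when the keys are equal (unless --stable is given), which is roughly what the second sort below imitates.

```python
# Hypothetical sample rows (tab-separated), not the real hightemp.txt
rows = [
    'A\tfirst\t40.3\t2007-08-16\n',
    'B\tsecond\t40.3\t1998-07-04\n',
    'C\tthird\t41\t2013-08-12\n',
    'D\tfourth\t40.3\t1994-08-05\n',
]


def temperature(line):
    """Return the third tab-separated column as a float."""
    return float(line.split('\t')[2])


# Stable sort: rows with the same temperature keep their input order,
# and reverse=True preserves that guarantee.
stable = sorted(rows, key=temperature, reverse=True)
stable_order = [line.split('\t')[0] for line in stable]

# Breaking ties by the whole line (an assumption about what another tool
# might do) gives a different, but equally valid, order for the ties.
tie_broken = sorted(rows, key=lambda line: (temperature(line), line), reverse=True)
tie_broken_order = [line.split('\t')[0] for line in tie_broken]

print(stable_order)      # ['C', 'A', 'B', 'D']
print(tie_broken_order)  # ['C', 'D', 'B', 'A']
```

Both results are sorted correctly by temperature; only the order within the 40.3 group differs.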
This was my first time using a lambda expression. It is the lambda line: float(line.split('\t')[2]) part. It stands in for a function that, given a string line, splits it on \t and returns the third column as a float. It is a little hard to read, though. As with list comprehensions, I suppose it is a matter of getting used to it.
By the way, in the explanation of small functions and lambda expressions in the Python documentation, the author says that he prefers the style without lambda.
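For comparison, here is a small sketch of the same sort key written both ways. The function name third_column_as_float and the sample rows are my own hypothetical choices, not part of the original program.

```python
# Hypothetical sample rows (tab-separated), not the real hightemp.txt
rows = [
    'P\tx\t40.2\t2004-07-20\n',
    'Q\ty\t41\t2013-08-12\n',
    'R\tz\t39.9\t1990-07-19\n',
]


def third_column_as_float(line):
    """Named equivalent of: lambda line: float(line.split('\t')[2])"""
    return float(line.split('\t')[2])


# Style 1: inline lambda as the sort key
with_lambda = sorted(rows, key=lambda line: float(line.split('\t')[2]), reverse=True)

# Style 2: a named function as the sort key (the style without lambda)
with_def = sorted(rows, key=third_column_as_float, reverse=True)

print(with_lambda == with_def)  # True
```

The named version is longer, but the name documents what the key does, which may be why some people prefer it.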
When sorting, you need to pay attention to the type of the target data.
With this particular data, the result happens to be correct even if you forget the float conversion in the lambda expression or forget --numeric-sort on the UNIX sort command. However, if a location with a temperature such as 5 degrees were mixed in, it would come out at the top even though its temperature is the lowest, because the values would be compared as strings.
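The pitfall can be reproduced with a minimal sketch. The three rows below are hypothetical, and the 5-degree row is made up purely for illustration.

```python
# Hypothetical sample rows (tab-separated); the 5-degree row is made up
rows = [
    'X\ta\t41\t2013-08-12\n',
    'Y\tb\t40.9\t2007-08-16\n',
    'Z\tc\t5\t2000-01-01\n',
]

# Forgetting float(): the third column is compared as a string,
# so '5' sorts above '41' because '5' > '4' character by character.
as_text = sorted(rows, key=lambda line: line.split('\t')[2], reverse=True)
text_order = [line.split('\t')[2] for line in as_text]

# With float(): the third column is compared as a number.
as_number = sorted(rows, key=lambda line: float(line.split('\t')[2]), reverse=True)
number_order = [line.split('\t')[2] for line in as_number]

print(text_order)    # ['5', '41', '40.9']
print(number_order)  # ['41', '40.9', '5']
```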
That's all for the 19th knock. If you notice any mistakes, I would appreciate it if you could point them out.