This is a record of my attempt at the Language Processing 100 Knocks 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of past knocks is available here (http://qiita.com/segavvy/items/fb50ba8097d59475f760).
hightemp.txt is a file that stores records of the highest temperatures in Japan in a tab-delimited format with four columns: "prefecture", "point", "℃", and "day". Create a program that performs the following processing and run it with hightemp.txt as the input file. Then perform the same processing with UNIX commands and check the program's output against the result.
Save only the first column of each row as col1.txt and only the second column as col2.txt. Use the cut command for confirmation.
The finished code:
main.py
# coding: utf-8
fname = 'hightemp.txt'
with open(fname) as data_file, \
        open('col1.txt', mode='w') as col1_file, \
        open('col2.txt', mode='w') as col2_file:
    for line in data_file:
        cols = line.split('\t')
        col1_file.write(cols[0] + '\n')
        col2_file.write(cols[1] + '\n')
Execution result:
col1.txt
Kochi Prefecture
Saitama
Gifu Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Wakayama Prefecture
Shizuoka Prefecture
Yamanashi Prefecture
Saitama
Gunma Prefecture
Gunma Prefecture
Aichi prefecture
Chiba
Shizuoka Prefecture
Ehime Prefecture
Yamagata Prefecture
Gifu Prefecture
Gunma Prefecture
Chiba
Saitama
Osaka
Yamanashi Prefecture
Yamagata Prefecture
Aichi prefecture
col2.txt
Ekawasaki
Kumagaya
Tajimi
Yamagata
Kofu
Katsuragi
Tenryu
Katsunuma
Koshigaya
Tatebayashi
Kamisatomi
Aisai
Ushiku
Sakuma
Uwajima
Sakata
Mino
Maebashi
Mobara
Hatoyama
Toyonaka
Otsuki
Tsuruoka
Nagoya
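As a variation on main.py, the same extraction could be wrapped in a small function so that any column can be written out. This is only a sketch: the function name `extract_column` and its parameters are my own and not part of the knock, and it assumes every row has at least `index + 1` tab-separated fields.

```python
def extract_column(src_path, dst_path, index, sep='\t'):
    """Write field `index` (0-based) of every line in src_path to dst_path.

    Hypothetical helper for illustration; assumes each line has enough fields.
    """
    with open(src_path) as src, open(dst_path, mode='w') as dst:
        for line in src:
            # Strip the trailing newline first so the last field stays clean.
            cols = line.rstrip('\n').split(sep)
            dst.write(cols[index] + '\n')
```

Calling `extract_column('hightemp.txt', 'col1.txt', 0)` and `extract_column('hightemp.txt', 'col2.txt', 1)` should produce the same two files as main.py.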
I checked the result against the UNIX commands with a shell script.
test.sh
#!/bin/sh
#Extraction and comparison of col1
cut --fields=1 hightemp.txt > col1_test.txt
diff --report-identical-files col1.txt col1_test.txt
#Extraction and comparison of col2
cut --fields=2 hightemp.txt > col2_test.txt
diff --report-identical-files col2.txt col2_test.txt
Execution result:
Terminal
Files col1.txt and col1_test.txt are identical
Files col2.txt and col2_test.txt are identical
The result was the same.
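The same comparison can also be sketched in Python with the standard library's `filecmp` module, which may be handy when `diff` is not available. The helper name `same_contents` is hypothetical; the file names in the usage note are the ones produced by main.py and test.sh.

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True if the two files have identical contents (hypothetical helper)."""
    # shallow=False forces a comparison of the file contents instead of
    # trusting the os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, `same_contents('col1.txt', 'col1_test.txt')` should return `True` when the program's output matches the output of cut.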
UNIX command options often come in both a short form and a long form. The long form is easier to understand, so this knock uses it. However, the short forms are quicker to type, so it is worth remembering the ones you use often.
The short options for the commands used this time are listed below. Please check man for details.
Command | Option used this time | Short option | Meaning |
---|---|---|---|
cut | --fields | -f | Field number to cut out |
diff | --report-identical-files | -s | Report when the comparison result is the same |
That's all for the 13th knock. If you find any mistakes, I would appreciate it if you could point them out.