[PYTHON] 100 amateur language processing knocks: 12

This is a record of my attempt at Language Processing 100 Knock 2015. The environment is Ubuntu 16.04 LTS + Python 3.5.2 :: Anaconda 4.1.1 (64-bit). A list of past knocks is available here (http://qiita.com/segavvy/items/fb50ba8097d59475f760).

Chapter 2: UNIX Command Basics

hightemp.txt is a tab-delimited file that stores records of the highest temperatures observed in Japan, with the columns "prefecture", "location", "℃", and "date". Write a program that performs the following processing with hightemp.txt as its input file. In addition, run the same processing with UNIX commands and use the result to verify the program's output.

12. Save the first column to col1.txt and the second column to col2.txt

Save only the first column of each line to col1.txt and only the second column to col2.txt. Use the cut command to confirm the results.

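Before writing any code, the tab-delimited format of hightemp.txt can be checked directly on the command line. The snippet below is just one quick way to do that, assuming the file is in the current directory; GNU cat's -A option makes each tab visible as "^I" and marks line ends with "$".


head -n 3 hightemp.txt | cat -A
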
The finished code:

main.py


# coding: utf-8

fname = 'hightemp.txt'
with open(fname) as data_file, \
		open('col1.txt', mode='w') as col1_file, \
		open('col2.txt', mode='w') as col2_file:
	for line in data_file:
		cols = line.split('\t')
		col1_file.write(cols[0] + '\n')
		col2_file.write(cols[1] + '\n')

Execution result:

col1.txt


Kochi Prefecture
Saitama
Gifu Prefecture
Yamagata Prefecture
Yamanashi Prefecture
Wakayama Prefecture
Shizuoka Prefecture
Yamanashi Prefecture
Saitama
Gunma Prefecture
Gunma Prefecture
Aichi Prefecture
Chiba
Shizuoka Prefecture
Ehime Prefecture
Yamagata Prefecture
Gifu Prefecture
Gunma Prefecture
Chiba
Saitama
Osaka
Yamanashi Prefecture
Yamagata Prefecture
Aichi Prefecture

col2.txt


Ekawasaki
Kumagaya
Tajimi
Yamagata
Kofu
Katsuragi
Tenryu
Katsunuma
Koshigaya
Tatebayashi
Kamisatomi
Aisai
Ushiku
Sakuma
Uwajima
Sakata
Mino
Maebashi
Mobara
Hatoyama
Toyonaka
Otsuki
Tsuruoka
Nagoya

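As a side note, the same extraction can also be written with Python's standard csv module, letting csv.reader handle the tab-delimited splitting. This is only an alternative sketch, not the code verified below; the file names and column indices follow the task.


# coding: utf-8
import csv

# Let csv.reader split each line on tabs instead of calling str.split()
with open('hightemp.txt', newline='') as data_file, \
		open('col1.txt', mode='w') as col1_file, \
		open('col2.txt', mode='w') as col2_file:
	for cols in csv.reader(data_file, delimiter='\t'):
		col1_file.write(cols[0] + '\n')
		col2_file.write(cols[1] + '\n')
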
I verified the results against the UNIX commands with a shell script.

test.sh


#!/bin/sh

# Extraction and comparison of col1
cut --fields=1 hightemp.txt > col1_test.txt
diff --report-identical-files col1.txt col1_test.txt

# Extraction and comparison of col2
cut --fields=2 hightemp.txt > col2_test.txt
diff --report-identical-files col2.txt col2_test.txt

Execution result:

Terminal


Files col1.txt and col1_test.txt are identical
Files col2.txt and col2_test.txt are identical

The results matched.

UNIX command short and long options

Many UNIX command options come in a short form and a long form. The long forms are easier to understand, so this knock uses them. For options you use frequently, though, the short forms are quicker to type and worth remembering.

The short options corresponding to the long options used this time are listed below. Check man for details.

Command   Long option used this time   Short option   Meaning
cut       --fields                     -f             Field number(s) to cut out
diff      --report-identical-files     -s             Report when the compared files are identical
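For reference, the same verification can be written with the short options; the lines below are just the abbreviated form of the commands in test.sh above.


cut -f 1 hightemp.txt > col1_test.txt
diff -s col1.txt col1_test.txt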

That's all for the 13th knock. If you find any mistakes, I would appreciate it if you could point them out.
