100 Language Processing Knocks (http://www.cl.ecei.tohoku.ac.jp/nlp100/), Chapter 2: problems 10 to 19
Count the number of lines. Use the wc command for confirmation.
Bash
$ wc -l hightemp.txt
Python
print(len(open('hightemp.txt').readlines()))
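For larger inputs, the count can also be taken without holding every line in memory. The following is a minimal sketch, assuming a helper named count_lines (a hypothetical name, not part of the original answer) that accepts any iterable of lines, such as an open file.

```python
def count_lines(lines):
    """Count the items in an iterable of lines without building a list."""
    return sum(1 for _ in lines)


# Usage with the file from this chapter (assumes hightemp.txt is in the
# current directory):
#     with open('hightemp.txt') as f:
#         print(count_lines(f))
```

Because a file object yields one line at a time, only the current line is kept in memory while counting.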
Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.
Bash
$ sed 's/\t/ /g' hightemp.txt
Python
r = open('hightemp.txt').readlines()
print(''.join([l.replace('\t', ' ') for l in r]))
Version suggested in the comments:
print(open('hightemp.txt').read().replace('\t', ' '))
Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.
Bash
$ cut -f 1 hightemp.txt > col1.txt
$ cut -f 2 hightemp.txt > col2.txt
Python
r = open('hightemp.txt').readlines()
with open('col1.txt', 'w') as c1, open('col2.txt', 'w') as c2:
    for l in r:
        s = l.split('\t')
        c1.write(s[0]+'\n')
        c2.write(s[1]+'\n')
Combine the col1.txt and col2.txt created in problem 12 into a text file in which the first and second columns of the original file are placed side by side, separated by a tab. Use the paste command for confirmation.
Bash
$ paste col1.txt col2.txt
Python
c1 = open('col1.txt').readlines()
c2 = open('col2.txt').readlines()
for s1, s2 in zip(c1, c2):
    print(s1.rstrip() + '\t' + s2.rstrip())
Receive the natural number N by means such as command line arguments, and display only the first N lines of the input. Use the head command for confirmation.
Bash
$ head -n 5 hightemp.txt
Python
import sys
n = int(sys.argv[1])
r = open('hightemp.txt').readlines()
print(''.join(r[:n]))
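Reading the whole file is not required to show only its first N lines. The following is a sketch, assuming a hypothetical helper named head that takes any iterable of lines; itertools.islice stops reading after N items.

```python
from itertools import islice


def head(lines, n):
    """Return the first n items of an iterable of lines as a list."""
    return list(islice(lines, n))


# Usage (assumes hightemp.txt is in the current directory):
#     with open('hightemp.txt') as f:
#         print(''.join(head(f, 5)), end='')
```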
Receive the natural number N by means such as a command line argument, and display only the last N lines of the input. Use the tail command for confirmation.
Bash
$ tail -n 5 hightemp.txt
Python
import sys
n = int(sys.argv[1])
r = open('hightemp.txt').readlines()
print(''.join(r[-n:]))
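One edge case of the slice above is that r[-n:] returns every line when n is 0, because -0 equals 0. The following is a sketch of an alternative, assuming a hypothetical helper named tail; a collections.deque with maxlen keeps only the most recent N items while the input is read.

```python
from collections import deque


def tail(lines, n):
    """Return the last n items of an iterable of lines as a list."""
    # A bounded deque discards the oldest item once it holds n items.
    return list(deque(lines, maxlen=n))


# Usage (assumes hightemp.txt is in the current directory):
#     with open('hightemp.txt') as f:
#         print(''.join(tail(f, 5)), end='')
```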
Receive the natural number N by means such as command line arguments, and split the input file into N parts at line boundaries. Achieve the same processing with the split command.
Bash
$ split -l 5 hightemp.txt
Python
import sys
import math
n = int(sys.argv[1])
r = open('hightemp.txt').readlines()
for i in range(n):
    l = math.ceil(len(r) / n)  # number of lines per output file
    with open('split0' + str(i) + '.txt', 'w') as f:
        f.write(''.join(r[l*i:l*(i+1)]))  # consecutive slices, no line skipped
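A chunk length based on math.ceil can leave the last files empty; for example, 24 lines split 7 ways gives a length of 4, so six files hold all 24 lines and the seventh gets nothing. The following is a sketch of one alternative, assuming a hypothetical helper named split_evenly that spreads the remainder over the first chunks so that sizes differ by at most one.

```python
def split_evenly(lines, n):
    """Split a list into n consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(lines), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one additional line.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


# Usage (assumes hightemp.txt is in the current directory):
#     r = open('hightemp.txt').readlines()
#     for i, chunk in enumerate(split_evenly(r, 5)):
#         with open('split0' + str(i) + '.txt', 'w') as f:
#             f.write(''.join(chunk))
```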
Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands for confirmation.
Bash
$ cut -f 1 hightemp.txt | sort | uniq
Python
r = open('hightemp.txt').readlines()
print('\n'.join(sorted(set(x.split('\t')[0] for x in r))))
Sort the rows in descending order of the number in the third column (note: reorder the rows without changing the contents of each row). Use the sort command for confirmation (for this problem, the result does not have to match the output of the command).
Bash
$ sort -r -n -k 3,3 hightemp.txt
Python
r = open('hightemp.txt').readlines()
r.sort(key=lambda x: float(x.split('\t')[2]), reverse=True)  # compare as numbers, not strings
print(''.join(r))
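Whether the third column is compared as text or as a number changes the order once the values have different numbers of digits. The following is a small illustration; the rows are made-up sample data in the same tab-separated layout, not taken from hightemp.txt.

```python
# Made-up sample rows: name, place, value, date (tab-separated).
rows = [
    'a\tx\t9.5\td1\n',
    'b\ty\t10\td2\n',
    'c\tz\t41.1\td3\n',
]

# String keys compare character by character, so '9.5' sorts above '41.1'.
by_text = sorted(rows, key=lambda x: x.split('\t')[2], reverse=True)

# Converting to float compares the actual numeric values.
by_number = sorted(rows, key=lambda x: float(x.split('\t')[2]), reverse=True)

print([r[0] for r in by_text])
print([r[0] for r in by_number])
```

This mirrors the role of the -n flag in the sort command, which also switches from text to numeric comparison.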
Find how often each string occurs in the first column, and display the strings in descending order of frequency. Use the cut, uniq, and sort commands for confirmation.
Bash
$ cut -f 1 hightemp.txt | sort | uniq -c | sort -r
Python
r = open('hightemp.txt').readlines()
r = list(map(lambda s: s.split()[0], r))
c = {s: r.count(s) for s in r}
c = sorted(c.items(), key=lambda x: x[1], reverse=True)
print('\n'.join(map(lambda s: str(s[1]) + ' ' + s[0], c)))
Version suggested in the comments:
r = [s.split('\t')[0] for s in open('hightemp.txt')]
c = {k:r.count(k) for k in r}
s = sorted(c, key=lambda k:c[k], reverse=True)
print('\n'.join(str(c[k])+' '+k for k in s))
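Calling r.count for every element rescans the list each time. The following is a sketch of an alternative using collections.Counter from the standard library, which counts in a single pass; the helper name column_frequencies is hypothetical.

```python
from collections import Counter


def column_frequencies(lines):
    """Count first-column values; return (value, count) pairs, most frequent first."""
    counts = Counter(line.split('\t')[0] for line in lines)
    return counts.most_common()


# Usage (assumes hightemp.txt is in the current directory):
#     with open('hightemp.txt') as f:
#         for value, count in column_frequencies(f):
#             print(str(count) + ' ' + value)
```

The order among values with the same count may differ from the output of the shell pipeline.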