100 Language Processing Knock (http://www.cl.ecei.tohoku.ac.jp/nlp100/), Chapter 2: problems 10 to 19

10. Counting the number of lines

Count the number of lines. Use the wc command for confirmation.

`Bash`


$ wc -l hightemp.txt

`Python`


print(len(open('hightemp.txt').readlines()))
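A slightly more careful sketch, assuming the input may be large: iterating over the file object counts lines without loading everything through `readlines()`. Note that `wc -l` counts newline characters, so when the last line has no trailing newline the two counts differ by one. `count_newlines` mimics that behaviour. Both function names are hypothetical.

```python
from typing import Iterable


def count_lines(lines: Iterable[str]) -> int:
    """Count lines lazily; works on a file object without reading it all into memory."""
    return sum(1 for _ in lines)


def count_newlines(text: str) -> int:
    """Mimic `wc -l`, which counts newline characters rather than lines."""
    return text.count('\n')
```

Usage: `with open('hightemp.txt') as f: print(count_lines(f))`.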

11. Replace tabs with spaces

Replace each tab character with one space character. Use the sed, tr, or expand command for confirmation.

`Bash`


$ sed 's/\t/ /g' hightemp.txt

`Python`


with open('hightemp.txt') as f:
    # end='' avoids an extra blank line after the file's own final newline
    print(''.join(l.replace('\t', ' ') for l in f), end='')

`Suggested in the comments`


print(open('hightemp.txt').read().replace('\t', ' '), end='')
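The `tr` variant maps one character to another, and the Python analogue is `str.translate` with a table built by `str.maketrans`. A minimal sketch (the helper name is hypothetical). Note that `expand` is not an exact equivalent: it pads each tab out to the next tab stop (every 8 columns by default), which is what `str.expandtabs()` does, so its output differs from a one-space replacement.

```python
# One-to-one character mapping, like `tr '\t' ' '`
TAB_TO_SPACE = str.maketrans('\t', ' ')


def tabs_to_spaces(text: str) -> str:
    """Replace every tab with exactly one space."""
    return text.translate(TAB_TO_SPACE)
```

When printing a whole file this way, `print(tabs_to_spaces(text), end='')` keeps the output identical to `sed`.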

12. Save the first column in col1.txt and the second column in col2.txt

Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.

`Bash`


$ cut -f 1 hightemp.txt > col1.txt
$ cut -f 2 hightemp.txt > col2.txt

`Python`


r = open('hightemp.txt').readlines()
with open('col1.txt', 'w') as c1, open('col2.txt', 'w') as c2:
    for l in r:
        s = l.split('\t')
        c1.write(s[0]+'\n')
        c2.write(s[1]+'\n')
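The same split can be wrapped in a small `cut -f`-like helper, sketched below with hypothetical names. Stripping the newline before splitting keeps it out of the last field. Python indices are 0-based, while `cut -f` fields are 1-based.

```python
from typing import Iterable, List


def cut_field(lines: Iterable[str], index: int) -> List[str]:
    """Return the index-th (0-based) tab-separated field of each line."""
    # rstrip('\n') so the last column does not carry the line break
    return [line.rstrip('\n').split('\t')[index] for line in lines]


def write_column(path: str, values: Iterable[str]) -> None:
    """Write one value per line."""
    with open(path, 'w') as f:
        f.writelines(value + '\n' for value in values)
```

Usage: read the lines once with `readlines()`, then call `write_column('col1.txt', cut_field(lines, 0))` and `write_column('col2.txt', cut_field(lines, 1))`.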

13. Merge col1.txt and col2.txt

Combine col1.txt and col2.txt created in problem 12 into a text file in which the first and second columns of the original file are separated by a tab. Use the paste command for confirmation.

`Bash`


$ paste col1.txt col2.txt

`Python`


c1 = open('col1.txt').readlines()
c2 = open('col2.txt').readlines()
for s1, s2 in zip(c1, c2):
    print(s1.rstrip() + '\t' + s2.rstrip())
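The problem asks for a text file, while the code above prints to the terminal. The sketch below yields merged lines that can be written straight to a file (the function name and an output name such as `col1_col2.txt` are hypothetical). It uses `itertools.zip_longest` because `zip` silently stops at the shorter file, whereas `paste` keeps going and leaves the missing field empty.

```python
from itertools import zip_longest
from typing import Iterable, Iterator


def paste(left: Iterable[str], right: Iterable[str]) -> Iterator[str]:
    """Join corresponding lines with a tab, padding the shorter input with ''."""
    for a, b in zip_longest(left, right, fillvalue=''):
        yield a.rstrip('\n') + '\t' + b.rstrip('\n') + '\n'
```

Usage: open both column files plus the output file, then call `out.writelines(paste(c1, c2))`.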

14. Output N lines from the beginning

Receive a natural number N, for example as a command-line argument, and display only the first N lines of the input. Use the head command for confirmation.

`Bash`


$ head -n 5 hightemp.txt

`Python`


import sys

n = int(sys.argv[1])

r = open('hightemp.txt').readlines()
print(''.join(r[:n]), end='')
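A sketch using `argparse` for the command-line argument and `itertools.islice`, which stops reading after N lines instead of loading the whole file. The function names and the default file name are assumptions for illustration.

```python
import argparse
from itertools import islice
from typing import Iterable, List, Optional, Sequence


def head(lines: Iterable[str], n: int) -> List[str]:
    """Return the first n lines; the rest of the input is never read."""
    return list(islice(lines, n))


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse `N [path]`; path defaults to hightemp.txt."""
    parser = argparse.ArgumentParser(description='Print the first N lines of a file.')
    parser.add_argument('n', type=int, help='number of lines to print')
    parser.add_argument('path', nargs='?', default='hightemp.txt')
    return parser.parse_args(argv)
```

Usage: `args = parse_args()`, then inside `with open(args.path) as f:` call `print(''.join(head(f, args.n)), end='')`. Passing `end=''` avoids an extra blank line, since the last line already ends with a newline.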

15. Output the last N lines

Receive a natural number N, for example as a command-line argument, and display only the last N lines of the input. Use the tail command for confirmation.

`Bash`


$ tail -n 5 hightemp.txt

`Python`


import sys

n = int(sys.argv[1])

r = open('hightemp.txt').readlines()
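# r[-0:] is the whole list, so compute the start index explicitly to handle n = 0
print(''.join(r[max(len(r) - n, 0):]), end='')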
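A slice like `r[-n:]` has a pitfall: for N = 0 it becomes `r[0:]` and returns the whole file. A sketch with `collections.deque(maxlen=n)`, which sidesteps that and keeps only the last N lines in memory while streaming through the file (the function name is hypothetical).

```python
from collections import deque
from typing import Iterable, List


def tail(lines: Iterable[str], n: int) -> List[str]:
    """Return the last n lines; older lines are dropped as new ones arrive."""
    if n < 0:
        raise ValueError('n must be a natural number')
    # A deque with maxlen=n discards items from the left once it holds n items
    return list(deque(lines, maxlen=n))
```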

16. Divide the file into N

Receive a natural number N, for example as a command-line argument, and split the input file into N parts at line boundaries. Achieve the same processing with the split command.

`Bash`


$ split -l 5 hightemp.txt

`Python`


import sys
import math

n = int(sys.argv[1])

r = open('hightemp.txt').readlines()
l = math.ceil(len(r) / n)  # lines per part
for i in range(n):
    # zero-padded names keep the files in order: split00.txt, split01.txt, ...
    with open('split{:02d}.txt'.format(i), 'w') as f:
        f.write(''.join(r[l*i:l*(i+1)]))
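Computing lines-per-part as ceil(len / N) can leave trailing parts empty: 24 lines with N = 7 gives parts of 4, 4, 4, 4, 4, 4 and 0. A sketch that spreads the remainder with `divmod` instead, so part sizes differ by at most one line. The function names and the `split_` file prefix are hypothetical. On the shell side, `split -l 5` cuts 5-line pieces rather than 5 parts; GNU split also offers `split -n l/N`, which makes N files without breaking lines (balancing roughly by bytes, not by line count).

```python
from typing import List, Sequence


def split_into(lines: Sequence[str], n: int) -> List[List[str]]:
    """Split lines into n consecutive parts whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError('n must be a positive integer')
    size, extra = divmod(len(lines), n)
    parts, start = [], 0
    for i in range(n):
        # The first `extra` parts take one additional line each
        end = start + size + (1 if i < extra else 0)
        parts.append(list(lines[start:end]))
        start = end
    return parts


def write_parts(parts: List[List[str]], prefix: str = 'split_') -> None:
    """Write each part to prefix00.txt, prefix01.txt, ... (zero-padded so they sort)."""
    for i, part in enumerate(parts):
        with open(f'{prefix}{i:02d}.txt', 'w') as f:
            f.writelines(part)
```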

17. Distinct strings in the first column

Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands for confirmation.

`Bash`


$ cut -f 1 hightemp.txt | sort | uniq

`Python`


r = open('hightemp.txt').readlines()
print('\n'.join(sorted(set(x.split('\t')[0] for x in r))))
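Iterating over a bare `set` gives no defined order, so the output can come out in a different order from `sort | uniq`. Sorting makes the results comparable; Python compares strings by code point, which corresponds to `sort` in the C locale (`LC_ALL=C`), and other locales may order lines differently. A minimal sketch (function name hypothetical):

```python
from typing import Iterable, List


def distinct_first_column(lines: Iterable[str]) -> List[str]:
    """Distinct first-column values in sorted order, like `cut -f 1 | sort | uniq`."""
    # maxsplit=1: only the first field is needed
    return sorted({line.split('\t', 1)[0] for line in lines})
```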

18. Sort the rows in descending order of the numbers in the third column

Sort the rows in descending order of the numbers in the third column (note: keep the contents of each row unchanged). Use the sort command for confirmation (the result does not have to match the command's output exactly).

`Bash`


$ sort -r -n -k 3,3 hightemp.txt

`Python`


r = open('hightemp.txt').readlines()
r.sort(key=lambda x: float(x.split('\t')[2]), reverse=True)
print(''.join(r))
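The third column holds numbers, so converting it with `float` matters: compared as strings, `'9.5'` is greater than `'10.0'`, which would put 9.5 above 10.0 in a descending sort. A sketch (function name hypothetical). Python's sort stays stable even with `reverse=True`, so rows with equal values keep their file order, while GNU `sort` breaks ties by comparing whole lines unless `-s` is given. That is one reason the two results may differ.

```python
from typing import List, Sequence


def sort_by_third_column(lines: Sequence[str]) -> List[str]:
    """Sort rows in descending numeric order of the third tab-separated column."""
    return sorted(lines, key=lambda line: float(line.split('\t')[2]), reverse=True)
```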

19. Find the frequency of each string in the first column and sort the strings in descending order of frequency

Find how often each string in the first column occurs, and display the strings in descending order of frequency. Use the cut, uniq, and sort commands for confirmation.

`Bash`


$ cut -f 1 hightemp.txt | sort | uniq -c | sort -rn

`Python`


r = open('hightemp.txt').readlines()
r = list(map(lambda s: s.split('\t')[0], r))
c = {s: r.count(s) for s in r}
c = sorted(c.items(), key=lambda x: x[1], reverse=True)
print('\n'.join(map(lambda s: str(s[1]) + ' ' + s[0], c)))

`Suggested in the comments`


r = [s.split('\t')[0] for s in open('hightemp.txt')]
c = {k:r.count(k) for k in r}
s = sorted(c, key=lambda k:c[k], reverse=True)
print('\n'.join(str(c[k])+' '+k for k in s))
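Both versions call `r.count(k)` once per line, and each call rescans the whole list, so the work grows quadratically with the number of lines. `collections.Counter` counts in a single pass, and `most_common()` returns (string, count) pairs in descending order of count, with ties kept in first-seen order. A sketch (function name hypothetical):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def first_column_frequencies(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """(string, count) pairs for the first column, most frequent first."""
    counts = Counter(line.split('\t', 1)[0] for line in lines)
    return counts.most_common()
```

Printing each pair as `count name` reproduces the `uniq -c` layout.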
