100 Language Processing Knock Chapter 2 (Python)

This post covers problems 10 to 19 in Chapter 2 of the 100 Language Processing Knocks (http://www.cl.ecei.tohoku.ac.jp/nlp100/).

10. Counting the number of lines

Count the number of lines. Use the wc command for confirmation.

Bash


$ wc -l hightemp.txt

Python


print(len(open('hightemp.txt').readlines()))
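
One subtlety when comparing with wc: `wc -l` counts newline characters, so a final line with no trailing newline is not counted, whereas `len(readlines())` counts it. The sketch below is a minimal illustration on hypothetical in-memory data (not the author's code). It shows both counts and iterates line by line instead of building a list of every line.

```python
import io
from typing import Iterable


def count_lines(lines: Iterable[str]) -> int:
    """Count lines one at a time, without building a list of all lines."""
    return sum(1 for _ in lines)


def count_newlines(lines: Iterable[str]) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return sum(line.count('\n') for line in lines)


# Hypothetical sample whose last line has no trailing newline
sample = "a\tb\nc\td\ne\tf"
print(count_lines(io.StringIO(sample)))     # 3
print(count_newlines(io.StringIO(sample)))  # 2, like wc -l
```

With the real file, pass the open file object instead, e.g. `with open('hightemp.txt') as f: print(count_newlines(f))`.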

11. Replace tabs with spaces

Replace each tab character with one space character. Use the sed command, tr command, or expand command for confirmation.

Bash


$ sed 's/\t/ /g' hightemp.txt

Python


r = open('hightemp.txt').readlines()
# Each line already ends with '\n', so suppress print's own newline
print(''.join([l.replace('\t', ' ') for l in r]), end='')

A simpler version suggested in the comments:


print(open('hightemp.txt').read().replace('\t', ' '), end='')
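
For very large files, the same replacement can be done line by line. The sketch below (not from the original post) uses `str.translate` with a table built by `str.maketrans`. For contrast, it also shows `str.expandtabs`, which pads to tab stops the way the expand command does instead of substituting exactly one space. The sample strings are hypothetical.

```python
import io
from typing import Iterable, TextIO

# Translation table mapping each tab character to a single space
TAB_TO_SPACE = str.maketrans('\t', ' ')


def tabs_to_spaces(lines: Iterable[str], out: TextIO) -> None:
    """Write each line with every tab replaced by one space."""
    for line in lines:
        out.write(line.translate(TAB_TO_SPACE))


buf = io.StringIO()
tabs_to_spaces(io.StringIO("a\tbb\tc\n"), buf)
print(buf.getvalue(), end='')     # a bb c
# expandtabs pads to the next tab stop instead (here every 4 columns)
print("a\tbb\tc".expandtabs(4))   # a   bb  c
```

To process hightemp.txt, pass the open file and `sys.stdout` as the two arguments.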

12. Save the first column in col1.txt and the second column in col2.txt

Save only the first column of each row as col1.txt and the second column as col2.txt. Use the cut command for confirmation.

Bash


$ cut -f 1 hightemp.txt > col1.txt
$ cut -f 2 hightemp.txt > col2.txt

Python


r = open('hightemp.txt').readlines()
with open('col1.txt', 'w') as c1, open('col2.txt', 'w') as c2:
    for l in r:
        s = l.split('\t')
        c1.write(s[0]+'\n')
        c2.write(s[1]+'\n')
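
The standard csv module can also do the column split. This is a swapped-in technique, not the original author's code: `csv.reader` with `delimiter='\t'` and `quoting=csv.QUOTE_NONE` treats double quotes as ordinary characters, which is closer to how cut behaves. The sketch assumes every row has at least two columns and uses hypothetical in-memory data.

```python
import csv
import io
from typing import Iterable, List, Tuple


def split_columns(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return the first and second tab-separated columns of each row."""
    col1: List[str] = []
    col2: List[str] = []
    # QUOTE_NONE: do not treat '"' specially, like cut -f
    for row in csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE):
        col1.append(row[0])
        col2.append(row[1])
    return col1, col2


c1, c2 = split_columns(io.StringIO('a\t"1"\tx\nb\t2\ty\n'))
print(c1)  # ['a', 'b']
print(c2)  # ['"1"', '2']
```

To write the files, open hightemp.txt with `newline=''` (as the csv documentation recommends) and write `'\n'.join(c1) + '\n'` to col1.txt, and likewise for col2.txt.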

13. Merge col1.txt and col2.txt

Combine col1.txt and col2.txt, created in problem 12, into a text file in which the first and second columns of the original file are separated by tabs. Use the paste command for confirmation.

Bash


$ paste col1.txt col2.txt

Python


c1 = open('col1.txt').readlines()
c2 = open('col2.txt').readlines()
for s1, s2 in zip(c1, c2):
    print(s1.rstrip() + '\t' + s2.rstrip())
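
`zip` stops at the shorter file, while paste keeps going and leaves the missing field empty. Below is a minimal sketch that mimics that behavior with `itertools.zip_longest`, using hypothetical inputs. It also uses `rstrip('\n')`, which removes only the newline, whereas the plain `rstrip()` above also strips trailing spaces and tabs.

```python
import io
from itertools import zip_longest
from typing import Iterable, List


def paste(a: Iterable[str], b: Iterable[str]) -> List[str]:
    """Join two inputs line by line with a tab, padding the shorter one with ''."""
    return [x.rstrip('\n') + '\t' + y.rstrip('\n')
            for x, y in zip_longest(a, b, fillvalue='')]


for line in paste(io.StringIO("a\nb\n"), io.StringIO("1\n")):
    print(repr(line))  # 'a\t1' then 'b\t'
```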

14. Output N lines from the beginning

Receive a natural number N via a command line argument or similar, and display only the first N lines of the input. Use the head command for confirmation.

Bash


$ head -n 5 hightemp.txt

Python


import sys

n = int(sys.argv[1])

r = open('hightemp.txt').readlines()
# The lines keep their '\n', so don't let print add another
print(''.join(r[:n]), end='')
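
`readlines()` reads the whole file even when only a few lines are needed, and `int(sys.argv[1])` accepts 0 or negative numbers. The sketch below addresses both, assuming argparse for argument handling and `itertools.islice` to stop reading after N lines. `positive_int` and `build_parser` are hypothetical helper names.

```python
import argparse
import io
from itertools import islice
from typing import Iterable, List


def positive_int(text: str) -> int:
    """argparse type that accepts only natural numbers (N >= 1)."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError(f'N must be >= 1, got {value}')
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='Print the first N lines.')
    parser.add_argument('n', type=positive_int)
    parser.add_argument('path', nargs='?', default='hightemp.txt')
    return parser


def head(lines: Iterable[str], n: int) -> List[str]:
    """Return the first n lines; islice stops consuming input after n."""
    return list(islice(lines, n))


args = build_parser().parse_args(['2'])  # stands in for `python head.py 2`
print(''.join(head(io.StringIO("l1\nl2\nl3\n"), args.n)), end='')
```

In a real script, call `build_parser().parse_args()` with no arguments so it reads `sys.argv`, then pass the file opened from `args.path` to `head`.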

15. Output the last N lines

Receive a natural number N via a command line argument or similar, and display only the last N lines of the input. Use the tail command for confirmation.

Bash


$ tail -n 5 hightemp.txt

Python


import sys

n = int(sys.argv[1])

r = open('hightemp.txt').readlines()
print(''.join(r[-n:]), end='')
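
Two points are worth knowing about `r[-n:]`: it loads the whole file, and for `n = 0` it returns every line because `-0` is just `0`. Below is a sketch with `collections.deque(maxlen=n)`, which keeps only the last `n` lines while streaming. The sample data is hypothetical.

```python
import io
from collections import deque
from typing import Iterable, List


def tail(lines: Iterable[str], n: int) -> List[str]:
    """Return the last n lines, holding at most n lines in memory."""
    # A deque with maxlen drops the oldest item when a new one is appended
    return list(deque(lines, maxlen=n))


print(''.join(tail(io.StringIO("l1\nl2\nl3\n"), 2)), end='')  # prints l2 and l3
print(['l1', 'l2'][-0:])  # ['l1', 'l2']: the slicing pitfall for n = 0
```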

16. Divide the file into N parts

Receive a natural number N via a command line argument or similar, and split the input file into N parts at line boundaries. Achieve the same processing with the split command.

Bash


$ split -l 5 hightemp.txt

Python


import sys
import math

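n = int(sys.argv[1])

r = open('hightemp.txt').readlines()
# Lines per part, rounded up so that n parts hold every line
l = math.ceil(len(r) / n)
for i in range(n):
    with open('split0' + str(i) + '.txt', 'w') as f:
        # r[l*i:l*(i+1)] is the i-th block; the end index is exclusive
        f.write(''.join(r[l*i:l*(i+1)]))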

17. Distinct strings in the first column

Find the distinct strings in the first column (the set of unique values). Use the sort and uniq commands for confirmation.

Bash


$ cut -f 1 hightemp.txt | sort | uniq

Python


r = open('hightemp.txt').readlines()
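# sorted() gives a deterministic order to compare with `sort | uniq`; a bare set is unordered
print('\n'.join(sorted(set(x.split('\t')[0] for x in r))))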
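
If the distinct values should appear in the order they first occur in the file rather than sorted, `dict.fromkeys` is a common idiom, since dicts preserve insertion order in Python 3.7 and later. The sketch uses hypothetical data. Note that `sorted()` compares code points, while sort follows the locale's collation, so running `LC_ALL=C sort` may be needed for the two to agree on non-ASCII text.

```python
import io
from typing import Iterable, List


def distinct_first_column(lines: Iterable[str], keep_order: bool = False) -> List[str]:
    """Return the unique first-column values, sorted or in first-seen order."""
    keys = (line.rstrip('\n').split('\t', 1)[0] for line in lines)
    if keep_order:
        return list(dict.fromkeys(keys))  # dict keys keep insertion order
    return sorted(set(keys))


sample = "b\t1\na\t2\nb\t3\n"
print(distinct_first_column(io.StringIO(sample)))                   # ['a', 'b']
print(distinct_first_column(io.StringIO(sample), keep_order=True))  # ['b', 'a']
```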

18. Sort the rows in descending order of the numbers in the third column

Sort the rows in descending order of the numbers in the third column (note: keep the contents of each row unchanged and only reorder the rows). Use the sort command for confirmation (for this problem, the result does not have to match the command's output exactly).

Bash


$ sort -r -n -k 3,3 hightemp.txt

Python


r = open('hightemp.txt').readlines()
# Convert to float so the comparison is numeric, not lexicographic
r.sort(key=lambda x: float(x.split('\t')[2]), reverse=True)
print(''.join(r), end='')
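
The key needs a numeric conversion because, when compared as strings, `'9.8'` sorts after `'10.1'` since `'9' > '1'`. The sketch below uses hypothetical rows to contrast the two keys. `sorted()` is stable even with `reverse=True`, so rows with equal values keep their original relative order.

```python
from typing import List

rows: List[str] = [
    "a\tx\t9.8\n",   # hypothetical rows: name, place, value
    "b\ty\t10.1\n",
    "c\tz\t9.8\n",
]

by_string = sorted(rows, key=lambda r: r.split('\t')[2], reverse=True)
by_number = sorted(rows, key=lambda r: float(r.split('\t')[2]), reverse=True)

print([r[0] for r in by_string])  # ['a', 'c', 'b']: '9.8' > '10.1' as text
print([r[0] for r in by_number])  # ['b', 'a', 'c']: numeric, ties keep input order
```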

19. Find the frequency of each string in the first column and sort the strings in descending order of frequency

Find how often each string appears in the first column, and display the strings in descending order of frequency. Use the cut, uniq, and sort commands for confirmation.

Bash


$ cut -f 1 hightemp.txt | sort | uniq -c | sort -rn

Python


r = open('hightemp.txt').readlines()
r = list(map(lambda s: s.split('\t')[0], r))
c = {s: r.count(s) for s in r}
c = sorted(c.items(), key=lambda x: x[1], reverse=True)
print('\n'.join(map(lambda s: str(s[1]) + ' ' + s[0], c)))

A simpler version suggested in the comments:


r = [s.split('\t')[0] for s in open('hightemp.txt')]
c = {k:r.count(k) for k in r}
s = sorted(c, key=lambda k:c[k], reverse=True)
print('\n'.join(str(c[k])+' '+k for k in s))
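
Both versions call `list.count` for every line, and each call rescans the whole list, so the work grows quadratically with the file size. `collections.Counter` counts in a single pass, and `most_common()` returns (value, count) pairs from most to least frequent, with ties kept in first-seen order. This is a sketch on hypothetical data, not the original post's code:

```python
import io
from collections import Counter
from typing import Iterable, List, Tuple


def first_column_frequencies(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count each first-column string and sort by count, highest first."""
    counts = Counter(line.rstrip('\n').split('\t', 1)[0] for line in lines)
    return counts.most_common()


sample = "b\t1\na\t2\nb\t3\nc\t4\n"
for key, n in first_column_frequencies(io.StringIO(sample)):
    print(n, key)  # same "count value" layout as the code above
```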
