Python inexperienced person tries to knock 100 language processing 14-16

If I work seriously, I skip posting; if I post seriously, I skip work.

This is a difficult problem... (seriously) (I don't do this on holidays)

This is a continuation of the following post. Python inexperienced person tries to knock 100 language processing 10 ~ 13 https://qiita.com/earlgrey914/items/afdb6458a2c9f1c00c2e


14. Output N lines from the beginning

Receive the natural number N by means such as command line arguments, and display only the first N lines of the input. Use the head command for confirmation.

It seems that you can write the following to get command line arguments.

arg.py


import sys
args = sys.argv


Reference URL
https://qiita.com/orange_u/items/3f0fb6044fd5ee2c3a37

Why we write args[1], with the index starting from 1, becomes clear from the execution result. As the result shows, the values are returned as a list.

Hmmmm. **Command line arguments are returned as a list: index 0 holds the script name, so the arguments themselves start at index 1.**
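As a quick illustration of how that list is laid out, here is a small sketch. The list `argv` is written by hand to mimic what `sys.argv` would hold for `python3 enshu14.py 2`; it is an illustrative stand-in, not captured output.

```python
# Hand-written stand-in for sys.argv when running: python3 enshu14.py 2
argv = ["enshu14.py", "2"]

script_name = argv[0]  # index 0 holds the script name
first_arg = argv[1]    # the first real argument sits at index 1

print(script_name)  # enshu14.py
print(first_arg)    # 2
```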

Well, the problem statement is a little cryptic, but if "2" is passed as a command line argument, I suppose I should output the first two lines of merge.txt.

How about this?

otto.py


import os.path
import sys

os.chdir((os.path.dirname(os.path.abspath(__file__))))

args = sys.argv
linedata = []

with open('merge.txt', mode="r") as f:
    linedata = f.read().splitlines()
    print(linedata[args[1]])
[ec2-user@ip-172-31-34-215 02]$ python3 enshu14.py 2
Traceback (most recent call last):
  File "enshu14.py", line 11, in <module>
    print(linedata[args[1]])
TypeError: list indices must be integers or slices, not str

Oops, **the elements of the command line argument list `args` seem to be strings.**

Input values are treated as the string type

So, string → integer conversion. That would be `int(args[1])`, right?
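Since the task asks for a natural number, the conversion could also check the value. The following is only a sketch; the helper name `parse_natural` is made up for illustration and is not part of the original scripts.

```python
def parse_natural(text):
    """Convert a command line string such as '2' into a natural number (1 or greater)."""
    n = int(text)  # raises ValueError for input such as "abc"
    if n < 1:
        raise ValueError("N must be 1 or greater, got " + text)
    return n

print(parse_natural("2") + 3)  # 5
```

In a script this would be called as `parse_natural(sys.argv[1])`.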

enshu14.py


import os.path
import sys

os.chdir((os.path.dirname(os.path.abspath(__file__))))

args = sys.argv
linedata = []

with open('merge.txt', mode="r") as f:
    linedata = f.read().splitlines()

for i in range(int(args[1])):
    print(linedata[i])

Verify by setting the argument to 2 or 5.

[ec2-user@ip-172-31-34-215 02]$ python3 enshu14.py 2
Kochi Prefecture Ekawasaki
Kumagaya, Saitama Prefecture
[ec2-user@ip-172-31-34-215 02]$ python3 enshu14.py 5
Kochi Prefecture Ekawasaki
Kumagaya, Saitama Prefecture
Gifu Prefecture Tajimi
Yamagata Prefecture Yamagata
Yamanashi Prefecture Kofu
[ec2-user@ip-172-31-34-215 02]$ 

Well easy. Compare with head.

[ec2-user@ip-172-31-34-215 02]$ head -n 2 merge.txt
Kochi Prefecture Ekawasaki
Kumagaya, Saitama Prefecture
[ec2-user@ip-172-31-34-215 02]$ head -n 5 merge.txt
Kochi Prefecture Ekawasaki
Kumagaya, Saitama Prefecture
Gifu Prefecture Tajimi
Yamagata Prefecture Yamagata
Yamanashi Prefecture Kofu
[ec2-user@ip-172-31-34-215 02]$ 
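The loop over `range(int(args[1]))` raises an IndexError when N is larger than the number of lines, whereas head simply prints everything. One possible alternative, sketched here with `itertools.islice`, stops early and tolerates a large N. The function name `head` and the sample text are assumptions for illustration.

```python
import io
from itertools import islice

def head(lines, n):
    """Return the first n lines of an iterable of lines, without trailing newlines."""
    return [line.rstrip("\n") for line in islice(lines, n)]

# io.StringIO stands in for an opened file (hypothetical sample data)
sample = io.StringIO("a\nb\nc\n")
print(head(sample, 2))  # ['a', 'b']
```

With a real file this would be `with open('merge.txt') as f: head(f, n)`, which also avoids reading the whole file into memory.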

~ skipped for 5 days ~

15. Output the last N lines

Receive the natural number N by means such as a command line argument, and display only the last N lines of the input. Use the tail command for confirmation.

To refer to the list from the end, it seems that you should refer to [-1], [-2], [-3], ....


Reference URL
https://qiita.com/komeiy/items/971ead35d33c25923222

In what order should the result be output? If there is a list ["a", "b", "c", "d"] and the argument is 2, should the output be c, d? Or d, c? The task says to check with tail, so let's first see how tail outputs it.

[ec2-user@ip-172-31-34-215 02]$ tail -n 2 merge.txt
Yamagata Prefecture Tsuruoka
Aichi Prefecture Nagoya
[ec2-user@ip-172-31-34-215 02]$ tail -n 5 merge.txt
Hatoyama, Saitama Prefecture
Toyonaka, Osaka
Yamanashi Prefecture Otsuki
Yamagata Prefecture Tsuruoka
Aichi Prefecture Nagoya

I see. The output should be in the order c, d.

enshu15.py


import os.path
import sys

os.chdir((os.path.dirname(os.path.abspath(__file__))))

args = sys.argv
linedata = []

with open('merge.txt', mode="r") as f:
    linedata = f.read().splitlines()

linedata_reverse = []
for i in range(int(args[1])):
    linedata_reverse.append(linedata[-i-1])

for i in reversed(linedata_reverse):
    print(i)

Yes, it's done. A list can be reversed with reversed().

[ec2-user@ip-172-31-34-215 02]$ python3 enshu15.py 2
Yamagata Prefecture Tsuruoka
Aichi Prefecture Nagoya
[ec2-user@ip-172-31-34-215 02]$ python3 enshu15.py 5
Hatoyama, Saitama Prefecture
Toyonaka, Osaka
Yamanashi Prefecture Otsuki
Yamagata Prefecture Tsuruoka
Aichi Prefecture Nagoya

The result is also OK.

**I feel that Python has many useful standard functions, such as reversing lists and slicing.**
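Speaking of slicing, the last N items can also be taken in one step with a negative slice, which keeps the original order, so no reversing is needed. A minimal sketch; the function name `tail` and the sample list are illustrative.

```python
def tail(lines, n):
    """Return the last n items of a list, in their original order."""
    if n <= 0:
        return []  # lines[-0:] would return the whole list
    return lines[-n:]

letters = ["a", "b", "c", "d"]
print(tail(letters, 2))  # ['c', 'd']
```

If N is larger than the list, the slice just returns the whole list instead of raising an error.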

16. Divide the file into N

Receive the natural number N by means such as command line arguments, and divide the input file into N line by line. Achieve the same processing with the split command.

First, let's see what the split command does. **As you may have noticed, I'm new to Python and new to UNIX commands as well.**

[ec2-user@ip-172-31-34-215 02]$ split -l 2 -d merge.txt test
[ec2-user@ip-172-31-34-215 02]$ ll
total 112
-rw-rw-r-- 1 ec2-user ec2-user 435 Mar 19 17:12 merge.txt
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 25 15:48 test00
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 25 15:48 test01
-rw-rw-r-- 1 ec2-user ec2-user  43 Mar 25 15:48 test02
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 25 15:48 test03
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 25 15:48 test04
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 25 15:48 test05
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 25 15:48 test06
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 25 15:48 test07
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 25 15:48 test08
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 25 15:48 test09
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 25 15:48 test10
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 25 15:48 test11
[ec2-user@ip-172-31-34-215 02]$ cat test00 
Kochi Prefecture Ekawasaki
Kumagaya, Saitama Prefecture

I see. First, let's implement the "split every N lines" part.

enshu16.py


import os.path
import sys

os.chdir((os.path.dirname(os.path.abspath(__file__))))

args = sys.argv
a = int(args[1])
linedata = []

with open('merge.txt', mode="r") as f:
    linedata = f.read().splitlines()

for i in range(0, len(linedata), a):
    print(linedata[i:i+a:])

For the time being, I was able to display the list as if it were split.

[ec2-user@ip-172-31-34-215 02]$ python3 enshu16.py 1
['Kochi Prefecture\t Ekawasaki']
['Saitama\t Kumagaya']
['Gifu Prefecture\t Tajimi']
['Yamagata Prefecture\t Yamagata']
['Yamanashi Prefecture\t Kofu']
['Wakayama Prefecture\t wig']
['Shizuoka Prefecture\t Tenryu']
['Yamanashi Prefecture\t Katsunuma']
['Saitama\t Koshigaya']
['Gunma Prefecture\t Tatebayashi']
['Gunma Prefecture\t Kamisatomi']
['Aichi prefecture\t Aisai']
['Chiba\t Ushiku']
['Shizuoka Prefecture\t Sakuma']
['Ehime Prefecture\t Uwajima']
['Yamagata Prefecture\t sakata']
['Gifu Prefecture\t Mino']
['Gunma Prefecture\t Maebashi']
['Chiba\t Mobara']
['Saitama\t Hatoyama']
['Osaka\t Toyonaka']
['Yamanashi Prefecture\t otsuki']
['Yamagata Prefecture\t Tsuruoka']
['Aichi prefecture\t Nagoya']
[ec2-user@ip-172-31-34-215 02]$ python3 enshu16.py 2
['Kochi Prefecture\t Ekawasaki', 'Saitama\t Kumagaya']
['Gifu Prefecture\t Tajimi', 'Yamagata Prefecture\t Yamagata']
['Yamanashi Prefecture\t Kofu', 'Wakayama Prefecture\t wig']
['Shizuoka Prefecture\t Tenryu', 'Yamanashi Prefecture\t Katsunuma']
['Saitama\t Koshigaya', 'Gunma Prefecture\t Tatebayashi']
['Gunma Prefecture\t Kamisatomi', 'Aichi prefecture\t Aisai']
['Chiba\t Ushiku', 'Shizuoka Prefecture\t Sakuma']
['Ehime Prefecture\t Uwajima', 'Yamagata Prefecture\t sakata']
['Gifu Prefecture\t Mino', 'Gunma Prefecture\t Maebashi']
['Chiba\t Mobara', 'Saitama\t Hatoyama']
['Osaka\t Toyonaka', 'Yamanashi Prefecture\t otsuki']
['Yamagata Prefecture\t Tsuruoka', 'Aichi prefecture\t Nagoya']
[ec2-user@ip-172-31-34-215 02]$ python3 enshu16.py 5
['Kochi Prefecture\t Ekawasaki', 'Saitama\t Kumagaya', 'Gifu Prefecture\t Tajimi', 'Yamagata Prefecture\t Yamagata', 'Yamanashi Prefecture\t Kofu']
['Wakayama Prefecture\t wig', 'Shizuoka Prefecture\t Tenryu', 'Yamanashi Prefecture\t Katsunuma', 'Saitama\t Koshigaya', 'Gunma Prefecture\t Tatebayashi']
['Gunma Prefecture\t Kamisatomi', 'Aichi prefecture\t Aisai', 'Chiba\t Ushiku', 'Shizuoka Prefecture\t Sakuma', 'Ehime Prefecture\t Uwajima']
['Yamagata Prefecture\t sakata', 'Gifu Prefecture\t Mino', 'Gunma Prefecture\t Maebashi', 'Chiba\t Mobara', 'Saitama\t Hatoyama']
['Osaka\t Toyonaka', 'Yamanashi Prefecture\t otsuki', 'Yamagata Prefecture\t Tsuruoka', 'Aichi prefecture\t Nagoya']
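The same slicing loop can be wrapped in a small generator so that the chunking is reusable apart from the printing. This is a sketch; the name `chunks` and the sample list are assumptions for illustration.

```python
def chunks(items, size):
    """Yield successive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

print(list(chunks(["a", "b", "c", "d", "e"], 2)))
# [['a', 'b'], ['c', 'd'], ['e']]
```

As in the output above, the last chunk is shorter when the number of lines is not a multiple of the chunk size.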

Next is saving to files. In this problem, the original file needs to be **split into files named with the second argument plus a serial number**. What should I do?

~ 10 minutes of googling ~


Reference URL
https://news.mynavi.jp/article/zeropython-40/

If you write something like this, you can generate serially numbered file names.

kou.py


for i in range(5):
    print("test-{0:03d}.jpg".format(i + 1))
test-001.jpg
test-002.jpg
test-003.jpg
test-004.jpg
test-005.jpg

It seems to use something called format(). **I personally find it extremely hard to grasp.** Why can I attach .format() directly to text enclosed in double quotation marks and pass a value to it...? I suppose it has to do with the {} notation, but what is this notation?

~ 5 minutes of googling ~

Hmm, it doesn't click. It seems this can be written another way without format(), so let's do that instead. **If you use something you don't understand "because it works", you will only understand it less later.** A rule of thumb.


Reference URL
https://gammasoft.jp/blog/python-string-format/
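For what it's worth, here is one way to look at the question above. A quoted string literal is already a `str` object, and `format()` is one of its methods, so it can be called on the literal just as on a variable. In `{0:03d}`, the `0` selects the first argument and `03d` asks for a decimal integer padded with zeros to width 3. A small sketch with illustrative variable names:

```python
template = "test-{0:03d}.jpg"                 # an ordinary str object
name_a = template.format(7)                   # method called via a variable
name_b = "test-{0:03d}.jpg".format(7)         # same method, called on the literal
name_c = "test-" + str(7).zfill(3) + ".jpg"   # concatenation with zero padding

print(name_a)  # test-007.jpg
print(name_b)  # test-007.jpg
print(name_c)  # test-007.jpg
```

The third form stays with the concatenation operator; `str.zfill()` handles the zero padding there.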

Rewrite ↓

tes.py


for i in range(3):
    with open("test"+ str(i+1) +".txt", 'a') as f:
        print("Tesuto", file=f )
[ec2-user@ip-172-31-34-215 02]$ ll
total 124
-rw-rw-r-- 1 ec2-user ec2-user   3 Mar 27 17:15 test1.txt
-rw-rw-r-- 1 ec2-user ec2-user   3 Mar 27 17:15 test2.txt
-rw-rw-r-- 1 ec2-user ec2-user   3 Mar 27 17:15 test3.txt

Alright, this is fine. All I had to do was use the concatenation operator. It may be awkward, but it's the easiest for me to understand.

Now create the answer to the task.

enshu16.py


import os.path
import sys

os.chdir((os.path.dirname(os.path.abspath(__file__))))

args = int(sys.argv[1])
linedata = []

with open('merge.txt', mode="r") as f:
    linedata = f.read().splitlines()

for i in range(0, len(linedata), args):
    with open("test"+ str(i+1) +".txt", 'a') as f:
        output = linedata[i:i+args:]
        for j in output:
            print(j, file =f)
[ec2-user@ip-172-31-34-215 02]$ python3 enshu16.py 2

[ec2-user@ip-172-31-34-215 02]$ ll
total 160
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 27 17:30 test11.txt
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 27 17:30 test13.txt
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 27 17:30 test15.txt
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 27 17:30 test17.txt
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 27 17:30 test19.txt
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 27 17:30 test1.txt
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 27 17:30 test21.txt
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 27 17:30 test23.txt
-rw-rw-r-- 1 ec2-user ec2-user  37 Mar 27 17:30 test3.txt
-rw-rw-r-- 1 ec2-user ec2-user  43 Mar 27 17:30 test5.txt
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 27 17:30 test7.txt
-rw-rw-r-- 1 ec2-user ec2-user  34 Mar 27 17:30 test9.txt
-rw-rw-r-- 1 ec2-user ec2-user 530 Mar 27 17:29 enshu16.py
-rw-rw-r-- 1 ec2-user ec2-user 435 Mar 19 17:12 merge.txt

[ec2-user@ip-172-31-34-215 02]$ cat test1.txt
Kochi Prefecture Ekawasaki
Kumagaya, Saitama Prefecture

It's done! Since the serial numbers are not zero-padded, the files are listed out of order. **I realized that format() would probably be useful for padding serial numbers with zeros.** I won't use it this time, but I will soon.
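For reference, a zero-padded variant might look like the sketch below. It is not the script used above: the function name `split_lines`, the two-digit padding, and the sample data are assumptions. It numbers the files with `enumerate()`, so the serial numbers run 00, 01, 02, ... instead of following the line offset (1, 3, 5, ...), and it opens files with `"w"` so that re-running does not append to earlier output.

```python
import os
import tempfile

def split_lines(lines, size, out_dir, prefix="test"):
    """Write `lines` into files of at most `size` lines each; return the file names."""
    names = []
    for number, start in enumerate(range(0, len(lines), size)):
        # number counts the chunks (0, 1, 2, ...); start is the line offset
        name = "{0}{1:02d}.txt".format(prefix, number)
        with open(os.path.join(out_dir, name), "w") as f:
            for line in lines[start:start + size]:
                print(line, file=f)
        names.append(name)
    return names

# Hypothetical sample data, written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    created = split_lines(["a", "b", "c", "d", "e"], 2, tmp)
    print(created)  # ['test00.txt', 'test01.txt', 'test02.txt']
```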

It probably took about 3 hours so far!!! Chapter 2 is relatively easy, isn't it? Also, I'm getting used to posting on Qiita and, if anything, I'm doing it precisely because I'm not motivated.
