[Python] Challenge 100 knocks! (015 ~ 019)

History so far

Please refer to the first post.

Knock status

Added on 9/24.

Chapter 2: UNIX Command Basics

hightemp.txt is a tab-delimited file that records the highest temperatures observed in Japan, with the columns "prefecture", "point", "temperature (℃)", and "date". Write programs that perform the following processing, using hightemp.txt as the input file. In addition, run the same processing with UNIX commands and check the results against the program's output.

015. Output the last N lines

Receive the natural number N by means such as command line arguments, and display only the last N lines of the input. Use the tail command for confirmation.

tail_015.py


# -*- coding: utf-8 -*-

import codecs
import subprocess

def tail(data, N):
    # Index of the first line to print; clamped at 0 so that N > len(data) prints everything
    start = max(len(data) - N, 0)
    print(''.join(data[start:]))

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    N = 3
    # The with block closes the file once the lines have been read
    with codecs.open(filename, 'r', 'utf-8') as f:
        tail(f.readlines(), N)

    # Confirm with the tail command
    output = subprocess.check_output(["tail", "-n", str(N), basepath + filename])
    print(output.decode('utf-8'))

result


Yamanashi Prefecture	Otsuki	39.9	1990-07-19
Yamagata Prefecture	Tsuruoka	39.9	1978-08-03
Aichi Prefecture	Nagoya	39.9	1942-08-02

Yamanashi Prefecture	Otsuki	39.9	1990-07-19
Yamagata Prefecture	Tsuruoka	39.9	1978-08-03
Aichi Prefecture	Nagoya	39.9	1942-08-02

Impressions: The tricky part was working out which line the join should start from.
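The problem asks for N to come from the command line, while the listing above hard-codes N=3. As a minimal sketch of one alternative, collections.deque with maxlen keeps only the last N items, so no start index has to be computed. The names tail_lines and parse_n are hypothetical helpers introduced here for illustration, not part of the original program.

```python
import sys
from collections import deque

def tail_lines(lines, n):
    """Return the last n items of an iterable as a list (hypothetical helper)."""
    # A deque with maxlen=n discards older items as new ones arrive
    return list(deque(lines, maxlen=n))

def parse_n(argv, default=3):
    """Read N from the first command-line argument, falling back to a default."""
    return int(argv[1]) if len(argv) > 1 else default

# Usage sketch, assuming hightemp.txt is in the current directory:
# with open('hightemp.txt', encoding='utf-8') as f:
#     print(''.join(tail_lines(f, parse_n(sys.argv))), end='')
```

Because the deque consumes the file object line by line, this version does not need to hold the whole file in memory.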

016. Divide the file into N

Receive a natural number N, for example as a command-line argument, and split the input file into N parts at line boundaries. Achieve the same processing with the split command.

split_016.py


# -*- coding: utf-8 -*-

import codecs
import subprocess
import math

def split(data, N):
    index = 0
    # Calculate the number of files to write (N lines per file)
    page = math.ceil(len(data) / N)
    for i in range(page):
        # Join the next N lines of the list into a single string
        write_data = ''.join(data[index:index + N])
        index += N
        # Name each file after the number of lines consumed so far, e.g. write_data15
        with codecs.open('write_data' + str(index), 'w', 'utf-8') as f:
            f.write(write_data)

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    N = 15
    with codecs.open(filename, 'r', 'utf-8') as f:
        split(f.readlines(), N)
    # Confirm with the split command (writes xaa, xab, ... to the current directory)
    subprocess.check_output(["split", "-l", str(N), basepath + filename])

result


The split function wrote the files write_data15 and write_data30.
write_data15
Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
Gifu Prefecture	Tajimi	40.9	2007-08-16
(Omitted because the result is long)
write_data30
Yamagata Prefecture	Sakata	40.1	1978-08-03
Gifu Prefecture	Mino	40	2007-08-16
Gunma Prefecture	Maebashi	40	2001-07-24
(Omitted because the result is long)

The split command wrote the files xaa and xab.
xaa
Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
Gifu Prefecture	Tajimi	40.9	2007-08-16
(Omitted because the result is long)
xab
Yamagata Prefecture	Sakata	40.1	1978-08-03
Gifu Prefecture	Mino	40	2007-08-16
Gunma Prefecture	Maebashi	40	2001-07-24
(Omitted because the result is long)

Process finished with exit code 0

Impressions: When writing the split function, I struggled with how to calculate the number of output files and how to name the files when each one holds N lines.
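The problem statement asks for N parts, whereas the program above writes files of N lines each, matching split -l N. The following is a rough sketch of the other reading, dividing the lines into N consecutive parts whose sizes differ by at most one. The function name split_into_parts is hypothetical, and writing each part to a file is left out.

```python
def split_into_parts(lines, n):
    """Divide lines into n consecutive parts whose sizes differ by at most one."""
    size, extra = divmod(len(lines), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts each take one additional line
        end = start + size + (1 if i < extra else 0)
        parts.append(lines[start:end])
        start = end
    return parts
```

With the 24 lines of hightemp.txt and n=5, this would give parts of 5, 5, 5, 5, and 4 lines. GNU split offers -n l/N for a similar division, though it balances the chunks by byte size rather than by line count.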

017. Distinct strings in the first column

Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands for confirmation.

sort_uniq_017.py


# -*- coding: utf-8 -*-

import codecs
import subprocess

def sort_uniq(data):
    # Equivalent of cut -f 1: keep only the first tab-separated column
    cut_temp = [line.rstrip('\n').split('\t')[0] for line in data]

    # Equivalent of sort
    sort_temp = sorted(cut_temp)

    # Equivalent of uniq: skip values that have already been seen
    uniq_temp = []
    for temp in sort_temp:
        if temp not in uniq_temp:
            uniq_temp.append(temp)

    for temp in uniq_temp:
        print(temp)

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    with codecs.open(filename, 'r', 'utf-8') as f:
        sort_uniq(f.readlines())
    print('\n')

    # Equivalent of: cut -f 1 hightemp.txt | sort | uniq
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq"], stdin=sort.stdout, stdout=subprocess.PIPE)
    end_of_pipe = uniq.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').rstrip('\n'))

result


Chiba Prefecture
Wakayama Prefecture
Saitama Prefecture
Osaka Prefecture
(Omitted because the result is long)


Chiba Prefecture
Wakayama Prefecture
Saitama Prefecture
Osaka Prefecture
Yamagata Prefecture
(Omitted because the result is long)
Process finished with exit code 0
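The cut, sort, and uniq steps can also be collapsed into a single expression, since a Python set already holds each value only once. This is a minimal sketch, not the method used above, and the function name distinct_first_column is hypothetical.

```python
def distinct_first_column(lines):
    """Return the sorted distinct values of the first tab-separated column."""
    # The set comprehension removes duplicates; sorted() then fixes the order
    return sorted({line.rstrip('\n').split('\t')[0] for line in lines if line.strip()})
```

Note that sorted() orders strings by code point, so the order may differ from the sort command when a locale-specific collation is in effect.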

Impressions: I didn't know how to write a pipe with the subprocess module. In a Linux shell you only need |, but writing it as a program makes it clear which parameters each stage needs.
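The chaining of Popen objects can be wrapped in a small helper so that a pipeline of any length reads like its shell counterpart. The sketch below follows the pattern of connecting each stage's stdin to the previous stage's stdout. The name run_pipeline is hypothetical, and the helper assumes the first command reads its input by itself (from a file argument, for example).

```python
import subprocess

def run_pipeline(commands):
    """Run commands as `cmd1 | cmd2 | ...` and return the last stdout as bytes."""
    procs = []
    for cmd in commands:
        # The first stage inherits stdin; later stages read the previous stdout
        stdin = procs[-1].stdout if procs else None
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Close the parent's copy so the earlier stage sees a broken pipe
            # if the later stage exits first
            procs[-1].stdout.close()
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output

# Usage sketch, assuming hightemp.txt is in the current directory:
# out = run_pipeline([["cut", "-f", "1", "hightemp.txt"], ["sort"], ["uniq"]])
# print(out.decode('utf-8'), end='')
```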

018. Sort

Arrange the rows in descending order of the numeric value in the 3rd column (note: reorder the rows without changing the contents of each row). Use the sort command for confirmation (for this problem, the output does not have to match the result of the command).

r_sort_018.py


# -*- coding: utf-8 -*-
import codecs
import subprocess
import operator

def r_sort(data):
    # Pair each line with its 3rd column converted to a number, so that
    # the comparison is numeric and the line itself stays unchanged
    cut_temp = []
    for temp in data:
        line = temp.rstrip('\n')
        cut_temp.append((float(line.split('\t')[2]), line))

    # Equivalent of sort: order by the numeric key, largest first
    sort_temp = sorted(cut_temp, key=operator.itemgetter(0), reverse=True)

    # Print the original lines in their new order
    for _, line in sort_temp:
        print(line)

if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(filename, 'r', 'utf-8') as f:
        r_sort(f.readlines())
    print('\n')

    # Confirm with the sort command (-k 3 compares as text from the 3rd field onward)
    sort = subprocess.check_output(["sort", "-r", "-k", "3", basepath + filename])
    print(sort.decode('utf-8'))

result


Kochi Prefecture	Ekawasaki	41	2013-08-12
Saitama Prefecture	Kumagaya	40.9	2007-08-16
Gifu Prefecture	Tajimi	40.9	2007-08-16
(Omitted because the result is long)

Kochi Prefecture	Ekawasaki	41	2013-08-12
Gifu Prefecture	Tajimi	40.9	2007-08-16
Saitama Prefecture	Kumagaya	40.9	2007-08-16
(Omitted because the result is long)
Process finished with exit code 0

Impressions: The itemgetter function of the operator module was useful.
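One thing to watch with itemgetter on rows of split text: it returns the field as a string, so the comparison is lexical unless the value is converted. The sketch below contrasts the two on made-up rows. The helper name sort_rows_by_number and the sample values are assumptions for illustration, not data from hightemp.txt.

```python
import operator

def sort_rows_by_number(rows, column, reverse=True):
    """Sort rows by the numeric value of one column; the rows are not modified."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=reverse)

# Hypothetical rows with the temperature in the 3rd column (index 2)
rows = [
    ['A', 'a', '9.5'],
    ['B', 'b', '40'],
    ['C', 'c', '40.1'],
]

# Lexical comparison puts '9.5' first because '9' > '4' as characters
lexical = sorted(rows, key=operator.itemgetter(2), reverse=True)

# Numeric comparison puts 40.1 first
numeric = sort_rows_by_number(rows, 2)
```

For hightemp.txt the two orders happen to agree because every temperature has two digits before the decimal point, but that is a property of the data rather than of the sort.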

019. Find the frequency of appearance of the character string in the first column of each line, and arrange them in descending order of frequency of appearance.

Find the frequency of occurrence of the character string in the first column of each line, and display them in descending order. Use the cut, uniq, and sort commands for confirmation.

frequency_019.py


# -*- coding: utf-8 -*-
import codecs
import subprocess
import collections
import operator

def frequency(data):
    # Equivalent of cut -f 1: keep only the first tab-separated column
    cut_temp = [line.rstrip('\n').split('\t')[0] for line in data]

    # Equivalent of sort
    sort_temp = sorted(cut_temp)

    # Equivalent of uniq -c: count how often each value appears
    count_dict = collections.Counter(sort_temp)

    # Equivalent of sort -r: order the (value, count) pairs by count, largest first
    for value, count in sorted(count_dict.items(), key=operator.itemgetter(1), reverse=True):
        print(count, value)

if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(filename, 'r', 'utf-8') as f:
        frequency(f.readlines())

    print('\n')
    # Equivalent of: cut -f 1 hightemp.txt | sort | uniq -c | sort -r
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort1 = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort1.stdout, stdout=subprocess.PIPE)
    sort2 = subprocess.Popen(["sort", "-r"], stdin=uniq.stdout, stdout=subprocess.PIPE)
    end_of_pipe = sort2.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').lstrip(' ').rstrip('\n'))

result


3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Gunma Prefecture
3 Saitama Prefecture
2 Gifu Prefecture
2 Chiba Prefecture
(Omitted because the result is long)

3 Gunma Prefecture
3 Yamanashi Prefecture
3 Yamagata Prefecture
3 Saitama Prefecture
2 Shizuoka Prefecture
2 Aichi Prefecture
(Omitted because the result is long)
Process finished with exit code 0

Impressions: It was difficult to handle and sort dictionaries.
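Sorting the dictionary by hand can be avoided, because collections.Counter has a most_common() method that returns (value, count) pairs in descending order of count. A minimal sketch follows, with the hypothetical function name first_column_frequency.

```python
from collections import Counter

def first_column_frequency(lines):
    """Return (value, count) pairs for the first tab-separated column, most frequent first."""
    counts = Counter(
        line.rstrip('\n').split('\t')[0] for line in lines if line.strip()
    )
    # most_common() orders by count, largest first
    return counts.most_common()

# Usage sketch, assuming hightemp.txt is in the current directory:
# with open('hightemp.txt', encoding='utf-8') as f:
#     for value, count in first_column_frequency(f):
#         print(count, value)
```

Values with the same count come back in the order they were first seen, so ties may be ordered differently from the output of sort -r.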
