After working through 100 Language Processing Knock 2015, I picked up a lot of basic Python skills: Chapter 1

Introduction

After studying Python for a while, I tried 100 Language Processing Knock 2015. For each problem I first solved it without looking anything up, then rewrote it in a better (smarter) way by referring to how other people had solved the same problems.

The articles I referred to for the better versions are listed at the end.

Chapter 1: Warm-up

00. Reversed string

Get the string obtained by arranging the characters of "stressed" in reverse order (from the end to the beginning).

code


input_str = 'stressed'
result = input_str[::-1]
print(result)

Output result


desserts

Better code

A negative slice step walks the string from the end. I don't think this can be improved much further.
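For readers new to slice syntax, here is a minimal sketch of what the step does. The variable names are mine, not from the original attempt, and `reversed()` is shown only as an equivalent alternative.

```python
text = 'stressed'

# A slice is text[start:stop:step]. With both bounds omitted and a step
# of -1, it walks from the last character back to the first.
by_slice = text[::-1]

# reversed() yields the characters back to front; join() glues them together.
by_reversed = ''.join(reversed(text))

print(by_slice)     # desserts
print(by_reversed)  # desserts
```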

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and get the string formed by concatenating them.

code


input_str = 'パタトクカシーー'
result = ''
for index, s in enumerate(input_str):
    if index % 2 == 0:
        result += s
print(result)

Output result


パトカー

Better code

Slicing handles this one too.

code


input_str = 'パタトクカシーー'
result = input_str[::2]
print(result)
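As a small sketch of how the slice start and step interact, the same string (the original Japanese one from the exercise) can be split into both of its interleaved halves. The variable names here are my own.

```python
text = 'パタトクカシーー'

# Step 2 from index 0 picks the 1st, 3rd, 5th and 7th characters.
odd_positions = text[::2]

# Starting at index 1 picks the 2nd, 4th, 6th and 8th characters instead.
even_positions = text[1::2]

print(odd_positions)   # パトカー
print(even_positions)  # タクシー
```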

02. "パトカー" + "タクシー" = "パタトクカシーー"

Get the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" ("police car") and "タクシー" ("taxi") from the beginning.

code


p = 'パトカー'
t = 'タクシー'
result = ''
for i in range(len(p)):
    result += p[i]
    result += t[i]
print(result)

Output result


パタトクカシーー

Better code

Build the list ['パタ', 'トク', 'カシ', 'ーー'] with zip() and join() it.

code


p = 'パトカー'
t = 'タクシー'
result = ''.join([char1 + char2 for char1, char2 in zip(p, t)])
print(result)
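One thing to keep in mind is that `zip()` stops at the shorter input. The exercise only uses two strings of equal length, so that is fine there, but as a hedged sketch, `itertools.zip_longest` could cover unequal lengths. The function name `interleave` is hypothetical and not part of the original solution.

```python
from itertools import zip_longest


def interleave(a, b):
    """Alternate the characters of a and b, keeping any leftover tail."""
    # fillvalue='' makes the shorter string contribute nothing once it runs out.
    return ''.join(x + y for x, y in zip_longest(a, b, fillvalue=''))


print(interleave('パトカー', 'タクシー'))  # パタトクカシーー
print(interleave('abc', '12345'))        # a1b2c345
```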

03. Pi

Break the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

code


input_str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
result = []
input_str = input_str.replace(',', '').replace('.', '').split(' ')
for s in input_str:
    result.append(len(s))
print(result)

Output result


[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

Better code

split() does not need an argument here: with no argument it splits on any run of whitespace.

code


input_str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
result = []
input_str = input_str.replace(',', '').replace('.', '').split()
for s in input_str:
    result.append(len(s))
print(result)
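To illustrate the difference between `split(' ')` and `split()`, and one possible way to shorten the loop, here is a rough sketch. Using `strip(',.')` to drop the punctuation and the list comprehension are my own choices, not the original author's.

```python
spaced = 'a  b c'  # note the two spaces between 'a' and 'b'

# An explicit ' ' separator yields an empty string for the doubled space.
print(spaced.split(' '))  # ['a', '', 'b', 'c']

# With no argument, runs of whitespace count as a single separator.
print(spaced.split())     # ['a', 'b', 'c']

sentence = ('Now I need a drink, alcoholic of course, after the heavy '
            'lectures involving quantum mechanics.')

# strip(',.') removes commas and periods from both ends of each word.
lengths = [len(word.strip(',.')) for word in sentence.split()]
print(lengths)
```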

04. Element symbol

Break the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, and create an associative array (dictionary or map type) from each extracted string to the position of its word (its number counting from the beginning).

code


input_str = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
single_ary = [1, 5, 6, 7, 8, 9, 15, 16, 19]
result = {}
input_str = input_str.replace('.', '').split(' ')
for index, s in enumerate(input_str):
    if index + 1 in single_ary:
        result[s[0]] = index
    else:
        result[s[0] + s[1]] = index
print(result)

Output result


{'H': 0, 'He': 1, 'Li': 2, 'Be': 3, 'B': 4, 'C': 5, 'N': 6, 'O': 7, 'F': 8, 'Ne': 9, 'Na': 10, 'Mi': 11, 'Al': 12, 'Si': 13, 'P': 14, 'S': 15, 'Cl': 16, 'Ar': 17, 'K': 18, 'Ca': 19}

Better code

Replace the if statement with a conditional (ternary) expression. Calling a list "ary" is apparently poor naming, so I renamed it.

code


input_str = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
single_list = [1, 5, 6, 7, 8, 9, 15, 16, 19]
result = {}
input_str = input_str.split()
for index, s in enumerate(input_str):
    l = 1 if index + 1 in single_list else 2
    result[s[:l]] = index
print(result)
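As a further sketch, the same mapping can be written as a dict comprehension. Note two assumptions of mine that differ from the code above: the positions are 1-based (via `enumerate(..., start=1)`), so the values differ by one from the article's output, and the positions are kept in a set named `single_positions`.

```python
sentence = ('Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations '
            'Might Also Sign Peace Security Clause. Arthur King Can.')

# Word positions (1-based) whose symbol is a single character.
single_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

# Slicing with [:1] or [:2] never touches the trailing periods,
# so the punctuation does not need to be removed first.
symbols = {
    word[:1 if pos in single_positions else 2]: pos
    for pos, word in enumerate(sentence.split(), start=1)
}
print(symbols)
```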

05. n-gram

Create a function that builds n-grams from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams from the sentence "I am an NLPer".

code


input_str = 'I am an NLPer'

def create_word_n_gram(input_str, num):
    str_list = input_str.split(' ')
    results = []
    for i in range(len(str_list) - num + 1):
        ngram = ''
        for j in range(num):
            ngram += str_list[j + i]
        results.append(ngram)
    return results

def create_char_n_gram(input_str, num):
    str_no_space = input_str.replace(' ', '')
    results = []
    for i in range(len(str_no_space) - num + 1):
        ngram = ''
        for j in range(num):
            ngram += str_no_space[j + i]
        results.append(ngram)
    return results

print(create_word_n_gram(input_str, 2))
print(create_char_n_gram(input_str, 2))

Output result


['Iam', 'aman', 'anNLPer']
['Ia', 'am', 'ma', 'an', 'nN', 'NL', 'LP', 'Pe', 'er']

Better code

Character n-grams include spaces; I had misunderstood this all along... I also fixed the word n-gram function because its output format was wrong.

code


input_str = 'I am an NLPer'

def create_word_n_gram(input_str, num):
    str_list = input_str.split(' ')
    results = []
    for i in range(len(str_list) - num + 1):
        results.append(str_list[i:i + num])
    return results

def create_char_n_gram(input_str, num):
    results = []
    for i in range(len(input_str) - num + 1):
        results.append(input_str[i:i + num])
    return results

print(create_word_n_gram(input_str, 2))
print(create_char_n_gram(input_str, 2))

Output result


[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
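Since slicing works the same way on strings and lists, the two functions could arguably be merged into one, which is closer to what the problem statement asks for ("a given sequence"). The following is a minimal sketch under that assumption; the name `ngrams` is mine.

```python
def ngrams(seq, n):
    """Return every run of n consecutive items of seq (a string or a list)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


sentence = 'I am an NLPer'

word_bigrams = ngrams(sentence.split(), 2)  # list input -> list of lists
char_bigrams = ngrams(sentence, 2)          # string input -> list of strings

print(word_bigrams)
print(char_bigrams)
```

The caller decides whether to pass words or characters, so the function itself does not need to know about spaces at all.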

06. Sets

Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y, respectively, and compute the union, intersection, and difference of X and Y. Also check whether the bi-gram 'se' is contained in X and in Y.

code


input_str_x = 'paraparaparadise'
input_str_y = 'paragraph'
word = 'se'

def create_char_n_gram(input_str, num):
    str_no_space = input_str.replace(' ', '')
    results = []
    for i in range(len(str_no_space) - num + 1):
        ngram = ''
        for j in range(num):
            ngram += str_no_space[j + i]
        results.append(ngram)
    return results

def calculate_union(list_x, list_y):
    list_union = list(set(list_x + list_y))
    return list_union

def calculate_intersection(list_x, list_y):
    list_sum = list_x + list_y
    list_intersection = [elem for elem in set(list_sum) if list_sum.count(elem) > 1]
    return list_intersection

def calculate_difference(list_x, list_y):
    list_intersection = calculate_intersection(list_x, list_y)
    list_sum = list_x + list_intersection
    list_difference = [elem for elem in set(list_sum) if list_sum.count(elem) == 1]
    return list_difference

def check_including_word(word_list, word):
    if word in word_list:
        return True
    else:
        return False

x = create_char_n_gram(input_str_x, 2)
y = create_char_n_gram(input_str_y, 2)

print(calculate_union(x, y))
print(calculate_intersection(x, y))
print(calculate_difference(x, y))
print(check_including_word(x, word))
print(check_including_word(y, word))

Output result


['ar', 'ag', 'gr', 'is', 'ph', 'se', 'pa', 'di', 'ap', 'ad', 'ra']
['ar', 'pa', 'ap', 'ra']
['is', 'se', 'di', 'ad']
True
False

Better code

With set objects, the set operations are built in, so I didn't need those functions... The check function is also quite redundant. I should have noticed that while writing it.

code


input_str_x = 'paraparaparadise'
input_str_y = 'paragraph'
word = 'se'

def create_char_n_gram(input_str, num):
    results = []
    for i in range(len(input_str) - num + 1):
        results.append(input_str[i:i + num])
    return results

x = set(create_char_n_gram(input_str_x, 2))
y = set(create_char_n_gram(input_str_y, 2))

print(x | y)
print(x - y)
print(x & y)
print(word in x)
print(word in y)
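Sets print in arbitrary order, which makes the output hard to compare between runs. As a sketch, the results can be sorted for display. The helper name `char_bigrams` and the set comprehension are my own choices.

```python
def char_bigrams(text):
    """Return the set of character bi-grams in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


x = char_bigrams('paraparaparadise')
y = char_bigrams('paragraph')

print(sorted(x | y))  # union
print(sorted(x & y))  # intersection
print(sorted(x - y))  # difference: in X but not in Y
print('se' in x, 'se' in y)
```

A side note on the first attempt: counting occurrences in the concatenated list only happens to work for this input, because an element repeated inside a single list would also have a count above 1. The set operators do not have that problem.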

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

code


x = 12
y = 'temperature'
z = 22.4

def create_str(x, y, z):
    return y + ' at ' + str(x) + ' is ' + str(z)

print(create_str(x, y, z))

Output result


temperature at 12 is 22.4

Better code

There are many ways to build the string, but I don't think any of them would change this much.
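For comparison, here is a hedged sketch of two of those other ways, following the "y at x is z" template from the problem statement. The function names are hypothetical. Both handle the conversion of numbers to text on their own, so the explicit `str()` calls are not needed.

```python
def describe_fstring(x, y, z):
    # f-strings (Python 3.6+) format each value inside the braces.
    return f'{y} at {x} is {z}'


def describe_format(x, y, z):
    # str.format() fills the named placeholders from keyword arguments.
    return '{y} at {x} is {z}'.format(x=x, y=y, z=z)


print(describe_fstring(12, 'temperature', 22.4))  # temperature at 12 is 22.4
print(describe_format(12, 'temperature', 22.4))   # temperature at 12 is 22.4
```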

08. Ciphertext

Implement a function cipher that converts each character of a given string according to the following specification.

Lowercase letters are replaced with the character whose code is (219 - character code).
All other characters are output unchanged.

Use this function to encrypt and decrypt an English message.

code


input_str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'

def cipher(input_str):
    result = list(map(lambda e: chr(219 - ord(e)) if e.islower() else e, input_str))
    return ''.join(result)

print(cipher(input_str))
print(cipher(cipher(input_str)))

Output result


Nld I mvvw z wirmp, zoxlslorx lu xlfihv, zugvi gsv svzeb ovxgfivh rmeloermt jfzmgfn nvxszmrxh.
Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.

Better code

I don't think this can be improved much further. I had thought the correct Japanese term was 復号 rather than 復号化 (both meaning "decryption"), but it seems 復号化 is now in common use as well...
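Why 219? It is `ord('a') + ord('z')` (97 + 122), so the rule maps a to z, b to y, and so on, and applying it twice gives back the original text. As an alternative sketch (not the author's method), the same mapping can be expressed as a translation table. One difference worth noting: `str.islower()` is also true for non-ASCII lowercase letters, whereas this table only touches a to z.

```python
import string

lower = string.ascii_lowercase

# Map each lowercase ASCII letter to its mirror image: a<->z, b<->y, ...
MIRROR_TABLE = str.maketrans(lower, lower[::-1])


def cipher(text):
    """Mirror lowercase ASCII letters; leave every other character as is."""
    return text.translate(MIRROR_TABLE)


message = 'Now I need a drink.'
encrypted = cipher(message)
print(encrypted)          # Nld I mvvw z wirmp.
print(cipher(encrypted))  # Now I need a drink.
```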

09. Typoglycemia

Create a program that, for a string of words separated by spaces, randomly rearranges the letters of each word while leaving the first and last letters in place. Words of four letters or fewer are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the execution result.

code


import random
input_str = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."

def create_typoglycemia(input_str):
    input_str_list = input_str.split(' ')
    result = []
    for word in input_str_list:
        length = len(word)
        if length > 4:
            first_char = word[0]
            last_char = word[length - 1]
            random_str = ''.join(random.sample(list(word[1:length - 1]), length - 2))
            result.append(word[0] + random_str + word[length - 1])
        else:
            result.append(word)
    return ' '.join(result)

print(create_typoglycemia(input_str))

Output result


I cunldo't biveele that I culod aclatluy urdseanntd what I was rdineag : the pehaneomnl pewor of the huamn mind .

Better code

I removed the variable that was defined but used only once, and the variables that were defined but never used.

code


import random
input_str = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."

def create_typoglycemia(input_str):
    result = []
    for word in input_str.split(' '):
        length = len(word)
        if length > 4:
            random_str = ''.join(random.sample(word[1:length - 1], length - 2))
            result.append(word[0] + random_str + word[length - 1])
        else:
            result.append(word)
    return ' '.join(result)

print(create_typoglycemia(input_str))
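Because the output is random, it is awkward to check by eye. The sketch below is one way to make it reproducible, assuming an optional `rng` parameter (my addition, not in the original) that accepts a seeded `random.Random` instance. It also uses `word[1:-1]` and `word[-1]` in place of the length arithmetic.

```python
import random


def typoglycemia(sentence, rng=random):
    """Shuffle the inner letters of every word longer than four characters."""
    words = []
    for word in sentence.split():
        if len(word) > 4:
            middle = list(word[1:-1])  # everything except first and last letter
            rng.shuffle(middle)        # shuffles the list in place
            word = word[0] + ''.join(middle) + word[-1]
        words.append(word)
    return ' '.join(words)


sentence = ("I couldn't believe that I could actually understand what "
            "I was reading : the phenomenal power of the human mind .")

# A seeded generator gives the same shuffle on every run.
shuffled = typoglycemia(sentence, random.Random(0))
print(shuffled)
```

Passing nothing for `rng` falls back to the module-level functions, so the default behaviour stays random.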

In conclusion

For reference, I am collecting here the Qiita articles on 100 Language Processing Knock 2015 (Python) that I drew on.
