Introduction

After studying Python, I worked through the Language Processing 100 Knocks 2015. For each problem I first wrote a solution without looking anything up, then rewrote it in a better (smarter) way with reference to other people's solutions to the 100 Knocks.

The articles I referred to for the better solutions are listed at the end.

Chapter 1: Warm-up

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

`code`


input_str = 'stressed'
result = input_str[::-1]
print(result)

`Output result`


desserts

Better code

A negative slice step walks the string from the end to the beginning. I don't think this can be improved much further.
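As a small sketch of what the negative step does, the slice can be compared with the built-in reversed(). The variable names by_slice and by_reversed are only for illustration.

```python
input_str = 'stressed'

# A step of -1 walks the string from the last character to the first.
by_slice = input_str[::-1]

# reversed() yields the characters in reverse order; join() rebuilds a string.
by_reversed = ''.join(reversed(input_str))

print(by_slice)      # desserts
print(by_reversed)   # desserts
```

Both give the same result; the slice is shorter, which is probably why it is the usual choice.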

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

`code`


input_str = 'パタトクカシーー'
result = ''
for index, s in enumerate(input_str):
    if index % 2 == 0:
        result += s
print(result)

`Output result`


パトカー

Better code

A slice with a step of 2 handles this as well.

`code`


input_str = 'パタトクカシーー'
result = input_str[::2]
print(result)

02. "パトカー" + "タクシー" = "パタトクカシーー"

Get the string "パタトクカシーー" by alternately joining the characters of "パトカー" and "タクシー" from the beginning.

`code`


p = 'パトカー'
t = 'タクシー'
result = ''
for i in range(len(p)):
    result += p[i]
    result += t[i]
print(result)

`Output result`


パタトクカシーー

Better code

Build the list ['パタ', 'トク', 'カシ', 'ーー'] with zip() and pass it to join().

`code`


p = 'パトカー'
t = 'タクシー'
result = ''.join([char1 + char2 for char1, char2 in zip(p, t)])
print(result)
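One thing to keep in mind with zip() is that it stops at the shorter input. The following is a minimal sketch, assuming you also want to handle strings of different lengths; the helper name interleave is hypothetical and not part of the original solution.

```python
from itertools import zip_longest

def interleave(a, b):
    # zip_longest pads the shorter input with '' instead of dropping the tail.
    return ''.join(x + y for x, y in zip_longest(a, b, fillvalue=''))

print(interleave('パトカー', 'タクシー'))  # パタトクカシーー
print(interleave('abc', 'XY'))            # aXbYc
```

For the two equal-length strings in this problem, plain zip() is enough.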

03. Pi

Break the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

`code`


input_str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
result = []
input_str = input_str.replace(',', '').replace('.', '').split(' ')
for s in input_str:
    result.append(len(s))
print(result)

`Output result`


[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

Better code

The argument to split() can be omitted: with no argument it splits on any run of whitespace.

`code`


input_str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
result = []
input_str = input_str.replace(',', '').replace('.', '').split()
for s in input_str:
    result.append(len(s))
print(result)
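The same idea can probably be written more compactly. This is only a sketch: it uses a list comprehension and strip(',.') instead of replace(), and the names sentence and lengths are my own.

```python
sentence = ('Now I need a drink, alcoholic of course, after the heavy '
            'lectures involving quantum mechanics.')

# split() with no argument splits on runs of whitespace;
# strip(',.') removes commas and periods from both ends of each word.
lengths = [len(word.strip(',.')) for word in sentence.split()]
print(lengths)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

# The difference between the two forms of split() shows up with repeated spaces.
print('a  b'.split(' '))  # ['a', '', 'b']
print('a  b'.split())     # ['a', 'b']
```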

04. Element symbol

Break the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of the other words, and create an associative array (dictionary or map type) from each extracted string to the position of the word (its number counting from the beginning).

`code`


input_str = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
single_ary = [1, 5, 6, 7, 8, 9, 15, 16, 19]
result = {}
input_str = input_str.replace('.', '').split(' ')
for index, s in enumerate(input_str):
    if index + 1 in single_ary:
        result[s[0]] = index
    else:
        result[s[0] + s[1]] = index
print(result)

`Output result`


{'H': 0, 'He': 1, 'Li': 2, 'Be': 3, 'B': 4, 'C': 5, 'N': 6, 'O': 7, 'F': 8, 'Ne': 9, 'Na': 10, 'Mi': 11, 'Al': 12, 'Si': 13, 'P': 14, 'S': 15, 'Cl': 16, 'Ar': 17, 'K': 18, 'Ca': 19}

Better code

Replace the if statement with a conditional expression (ternary operator). The name "ary" did not seem right for a list, so I renamed it.

`code`


input_str = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
single_list = [1, 5, 6, 7, 8, 9, 15, 16, 19]
result = {}
input_str = input_str.split()
for index, s in enumerate(input_str):
    l = 1 if index + 1 in single_list else 2
    result[s[:l]] = index
print(result)
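The code above stores 0-based indices. If the "position of the word" is meant to be counted from 1, one possible variant is sketched below; it assumes enumerate(..., start=1) and a dict comprehension, and the names sentence, single_positions, and symbols are hypothetical.

```python
sentence = ('Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations '
            'Might Also Sign Peace Security Clause. Arthur King Can.')
single_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

# enumerate(..., start=1) yields 1-based positions directly,
# so no "index + 1" adjustment is needed.
symbols = {
    word[:1 if pos in single_positions else 2]: pos
    for pos, word in enumerate(sentence.split(), start=1)
}
print(symbols)
```

With this numbering the values line up with atomic numbers, for example symbols['H'] is 1 and symbols['Ca'] is 20.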

05. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams from the sentence "I am an NLPer".

`code`


input_str = 'I am an NLPer'

def create_word_n_gram(input_str, num):
    str_list = input_str.split(' ')
    results = []
    for i in range(len(str_list) - num + 1):
        ngram = ''
        for j in range(num):
            ngram += str_list[j + i]
        results.append(ngram)
    return results

def create_char_n_gram(input_str, num):
    str_no_space = input_str.replace(' ', '')
    results = []
    for i in range(len(str_no_space) - num + 1):
        ngram = ''
        for j in range(num):
            ngram += str_no_space[j + i]
        results.append(ngram)
    return results

print(create_word_n_gram(input_str, 2))
print(create_char_n_gram(input_str, 2))

`Output result`


['Iam', 'aman', 'anNLPer']
['Ia', 'am', 'ma', 'an', 'nN', 'NL', 'LP', 'Pe', 'er']

Better code

A character n-gram includes spaces as well; I had misunderstood this all along. I also fixed the word n-gram, because its output format was wrong.

`code`


input_str = 'I am an NLPer'

def create_word_n_gram(input_str, num):
    str_list = input_str.split(' ')
    results = []
    for i in range(len(str_list) - num + 1):
        results.append(str_list[i:i + num])
    return results

def create_char_n_gram(input_str, num):
    results = []
    for i in range(len(input_str) - num + 1):
        results.append(input_str[i:i + num])
    return results

print(create_word_n_gram(input_str, 2))
print(create_char_n_gram(input_str, 2))

`Output result`


[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
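Since the problem asks for one function that works on any sequence, the two functions could probably be merged. A minimal sketch, assuming only that the input supports len() and slicing; the name n_gram is my own.

```python
def n_gram(seq, n):
    # Slicing works for str, list, and tuple alike, so one function covers
    # both word n-grams (list input) and character n-grams (str input).
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

text = 'I am an NLPer'
print(n_gram(text.split(), 2))  # word bi-grams
print(n_gram(text, 2))          # character bi-grams
```

When n is larger than the sequence, range() is empty and the function returns an empty list.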

06. Set

Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y, respectively, and find the union, intersection, and difference of X and Y. In addition, find out whether the bi-gram 'se' is included in X and in Y.

`code`


input_str_x = 'paraparaparadise'
input_str_y = 'paragraph'
word = 'se'

def create_char_n_gram(input_str, num):
    str_no_space = input_str.replace(' ', '')
    results = []
    for i in range(len(str_no_space) - num + 1):
        ngram = ''
        for j in range(num):
            ngram += str_no_space[j + i]
        results.append(ngram)
    return results

def calculate_union(list_x, list_y):
    list_union = list(set(list_x + list_y))
    return list_union

def calculate_intersection(list_x, list_y):
    # Keep each distinct element of X that also appears in Y
    list_intersection = [elem for elem in set(list_x) if elem in list_y]
    return list_intersection

def calculate_difference(list_x, list_y):
    # Keep each distinct element of X that does not appear in Y
    list_difference = [elem for elem in set(list_x) if elem not in list_y]
    return list_difference

def check_including_word(word_list, word):
    if word in word_list:
        return True
    else:
        return False

x = create_char_n_gram(input_str_x, 2)
y = create_char_n_gram(input_str_y, 2)

print(calculate_union(x, y))
print(calculate_intersection(x, y))
print(calculate_difference(x, y))
print(check_including_word(x, word))
print(check_including_word(y, word))

`Output result`


['ar', 'ag', 'gr', 'is', 'ph', 'se', 'pa', 'di', 'ap', 'ad', 'ra']
['ar', 'pa', 'ap', 'ra']
['is', 'se', 'di', 'ad']
True
False

Better code

Sets support set operations directly, so I didn't need those functions at all. The check function is also quite redundant; I should have noticed that while writing it.

`code`


input_str_x = 'paraparaparadise'
input_str_y = 'paragraph'
word = 'se'

def create_char_n_gram(input_str, num):
    results = []
    for i in range(len(input_str) - num + 1):
        results.append(input_str[i:i + num])
    return results

x = set(create_char_n_gram(input_str_x, 2))
y = set(create_char_n_gram(input_str_y, 2))

print(x | y)
print(x - y)
print(x & y)
print(word in x)
print(word in y)
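As a sketch of how far the set operators go, the bi-grams can be collected with a set comprehension and the difference taken in both directions. The helper name char_bigrams is hypothetical, and sorted() is used only to make the printed order stable.

```python
def char_bigrams(text):
    # A set comprehension removes duplicate bi-grams automatically.
    return {text[i:i + 2] for i in range(len(text) - 1)}

x = char_bigrams('paraparaparadise')
y = char_bigrams('paragraph')

print(sorted(x | y))   # union
print(sorted(x & y))   # intersection
print(sorted(x - y))   # in X but not in Y
print(sorted(y - x))   # in Y but not in X
print('se' in x, 'se' in y)
```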

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "x時のyはz" ("y at x o'clock is z"). Then set x = 12, y = "気温" (temperature), z = 22.4 and check the result.

`code`


x = 12
y = '気温'
z = 22.4

def create_str(x, y, z):
    return str(x) + '時の' + y + 'は' + str(z)

print(create_str(x, y, z))

`Output result`


12時の気温は22.4

Better code

There are many ways to build a string, but I don't think any of them changes this much.
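For comparison, here is a sketch of one of those other ways, an f-string (available from Python 3.6). It uses the problem's original Japanese template 「x時のyはz」 ("y at x o'clock is z").

```python
def create_str(x, y, z):
    # An f-string converts x and z to text automatically, so str() is not needed.
    return f'{x}時の{y}は{z}'

print(create_str(12, '気温', 22.4))  # 12時の気温は22.4
```

str.format() or the + operator would work just as well; the choice is mostly a matter of readability.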

08. Ciphertext

Implement the function cipher that converts each character of the given string according to the following specification.

- Replace each lowercase letter with the character whose code is (219 - character code)
- Output other characters as they are

Use this function to encrypt and decrypt an English message.

`code`


input_str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'

def cipher(input_str):
    result = list(map(lambda e: chr(219 - ord(e)) if e.islower() else e, input_str))
    return ''.join(result)

print(cipher(input_str))
print(cipher(cipher(input_str)))

`Output result`


Nld I mvvw z wirmp, zoxlslorx lu xlfihv, zugvi gsv svzeb ovxgfivh rmeloermt jfzmgfn nvxszmrxh.
Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.

Better code

I don't think this can be improved much further. I had thought that the correct Japanese term for decryption was 「復号」 rather than 「復号化」, but it seems that 「復号化」 is now commonly used as well.

For a moment I wondered how the same operation could both encrypt and decrypt, but if 219 - x = y then naturally 219 - y = x. Any constant has this symmetry, but 219 = ord('a') + ord('z') = 97 + 122 is the one that maps lowercase letters onto lowercase letters, which the islower() check needs for the second pass to undo the first.
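The symmetry can be checked with a short sketch. It restricts the conversion to ASCII a-z, which is an assumption of this sketch: islower() is also true for non-ASCII lowercase letters such as 'é', for which 219 - ord(c) would be negative and chr() would raise an error.

```python
def cipher(text):
    # 219 == ord('a') + ord('z'), so 'a' <-> 'z', 'b' <-> 'y', and so on.
    # Only ASCII a-z is converted here (an assumption of this sketch).
    return ''.join(chr(219 - ord(c)) if 'a' <= c <= 'z' else c for c in text)

print(ord('a') + ord('z'))  # 219

message = 'Now I need a drink.'
encrypted = cipher(message)
print(encrypted)            # Nld I mvvw z wirmp.
print(cipher(encrypted))    # Now I need a drink.
```

Applying cipher twice returns the original text because every lowercase letter is swapped with its mirror image in the alphabet.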

09. Typoglycemia

Create a program that, for a string of words separated by spaces, randomly rearranges the letters of each word while leaving the first and last letters in place. However, words with a length of 4 or less are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.

`code`


import random
input_str = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."

def create_typoglycemia(input_str):
    input_str_list = input_str.split(' ')
    result = []
    for word in input_str_list:
        length = len(word)
        if length > 4:
            first_char = word[0]
            last_char = word[length - 1]
            random_str = ''.join(random.sample(list(word[1:length - 1]), length - 2))
            result.append(word[0] + random_str + word[length - 1])
        else:
            result.append(word)
    return ' '.join(result)

print(create_typoglycemia(input_str))

`Output result`


I cunldo't biveele that I culod aclatluy urdseanntd what I was rdineag : the pehaneomnl pewor of the huamn mind .

Better code

I removed the variables that were defined but used only once, and the variables that were defined but never used.

`code`


import random
input_str = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."

def create_typoglycemia(input_str):
    result = []
    for word in input_str.split(' '):
        length = len(word)
        if length > 4:
            random_str = ''.join(random.sample(word[1:length - 1], length - 2))
            result.append(word[0] + random_str + word[length - 1])
        else:
            result.append(word)
    return ' '.join(result)

print(create_typoglycemia(input_str))
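Because the output is random, it is hard to check by eye. The following is a sketch of a variant that accepts a random number generator so that a run can be reproduced; the rng parameter, the function name typoglycemia, and the seed value 0 are all my own assumptions, and random.shuffle is used in place of random.sample.

```python
import random

def typoglycemia(text, rng=random):
    words = []
    for word in text.split():
        if len(word) > 4:
            middle = list(word[1:-1])
            rng.shuffle(middle)   # shuffle the inner letters in place
            word = word[0] + ''.join(middle) + word[-1]
        words.append(word)
    return ' '.join(words)

sentence = ("I couldn't believe that I could actually understand what I was "
            "reading : the phenomenal power of the human mind .")

rng = random.Random(0)  # fixed seed, chosen arbitrarily, for repeatable output
print(typoglycemia(sentence, rng))
```

Negative indices (word[-1], word[1:-1]) also remove the need for the length variable.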

In conclusion

For reference, here are the Qiita articles on the Language Processing 100 Knocks 2015 (Python) that I consulted.

https://qiita.com/gamma1129/items/37bf660cf4e4b21d4267
https://qiita.com/tanaka0325/items/08831b96b684d7ecb2f7
https://qiita.com/nubilum/items/af0a2ff057b9a6d708ad

After doing 100 language processing knock 2015, I got a lot of basic Python skills Chapter 1
