[Python] 100 Natural Language Processing Knocks, Chapter 1 (Preparatory Movement)

Introduction

The "100 Language Processing Knocks" collection (.tohoku.ac.jp/nlp100/), published by the Inui / Okazaki Laboratory at Tohoku University, is well known as a set of problems for students and working adults studying language processing.

I am a Tohoku University student myself, so although my major is in a different field, I gave the 100 Language Processing Knocks a try.

Chapter 1: Preparatory movement

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

words = "stressed"
print(words[::-1])

[result] desserts

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

words = "パタトクカシーー"
# A step of 2 picks indices 0, 2, 4, 6, i.e. the 1st, 3rd, 5th, and 7th characters.
print(words[::2])

[result] パトカー

02. "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" ("police car") and "タクシー" ("taxi") from the beginning.

word1 = "パトカー"
word2 = "タクシー"
for i in range(4):
    print(word1[i] + word2[i], end="")

[result] パタトクカシーー
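
The index loop above relies on both strings having exactly four characters. As a possible alternative, the same interleaving can be sketched with zip, which pairs the characters up without a hard-coded length. The helper name interleave is my own choice for illustration, and the strings are the Japanese ones from the original problem statement.

```python
def interleave(a, b):
    """Alternate the characters of a and b, stopping at the shorter string."""
    return "".join(x + y for x, y in zip(a, b))

combined = interleave("パトカー", "タクシー")
print(combined)
# Taking every second character recovers the first string (problem 01).
print(combined[::2])
```

Note that zip silently drops the extra characters when the two strings differ in length, which may or may not be what you want.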

03. Pi

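Break down the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of letters in each word, in order of appearance.

sentence = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
# Remove the punctuation first so that it is not counted as part of a word.
words = sentence.replace('.', '').replace(',', '').split()
ans = [len(word) for word in words]
print(ans)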

[result] [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
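
The problem is titled "Pi" because the word lengths spell out the leading digits of pi. Here is a small sketch that checks this against math.pi. The function name word_lengths is hypothetical, and stripping with string.punctuation is my assumption about a reasonable way to drop the commas and the period.

```python
import math
import string

def word_lengths(sentence):
    """Return the length of each word, ignoring leading/trailing punctuation."""
    return [len(word.strip(string.punctuation)) for word in sentence.split()]

sentence = ("Now I need a drink, alcoholic of course, after the heavy "
            "lectures involving quantum mechanics.")
lengths = word_lengths(sentence)
digits = "".join(str(n) for n in lengths)
print(lengths)
# Compare with pi written to 14 decimal places, with the decimal point removed.
print(digits == f"{math.pi:.14f}".replace(".", ""))
```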

04. Element symbol

Break down the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of all other words, and create an associative array (dictionary or map type) from each extracted string to the position of its word (counted from the beginning of the sentence).

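element = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
# Avoid the names dict and list, which would shadow the built-in types.
single_char_positions = [1, 5, 6, 7, 8, 9, 15, 16, 19]
symbols = {}
for i, word in enumerate(element.split(), start=1):
    # Words at the listed positions contribute one character, the others two.
    length = 1 if i in single_char_positions else 2
    # The problem asks for a mapping from the extracted string to the word position.
    symbols[word[:length]] = i
print(symbols)

[result] {'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}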

05. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".

def n_gram(target, n):
    result = []
    for i in range(len(target) - n + 1):
        result.append(target[i:i + n])
    return result

target = "I am an NLPer"
words_target = target.split()

print("[Word bi-gram]")
print(n_gram(words_target, 2))
print("[Character bi-gram]") 
print(n_gram(target, 2))

[result]
[Word bi-gram]
[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
[Character bi-gram]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
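
Because n_gram only uses len and slicing, it should work unchanged for other values of n and for any sequence type. The sketch below is a compact restatement of the function (so that it runs on its own) with a few extra calls; the tri-gram and the oversized n are cases I added for illustration and are not part of the problem.

```python
def n_gram(target, n):
    """Return all contiguous length-n slices of a string or list."""
    return [target[i:i + n] for i in range(len(target) - n + 1)]

text = "I am an NLPer"
print(n_gram(text.split(), 3))   # word tri-grams
print(len(n_gram(text, 2)))      # 13 characters give 12 character bi-grams
print(n_gram(text.split(), 5))   # n longer than the sequence gives []
```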

06. Set

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is included in X and in Y.

def n_gram(target, n):
    result = []
    for i in range(0,len(target) - n + 1):
        result.append(target[i:i + n])
    return result

text_x = "paraparaparadise"
text_y = "paragraph"

set_x = set(n_gram(text_x, 2))
set_y = set(n_gram(text_y, 2))

print("X:",end="")
print(set_x)
print("Y:",end="")
print(set_y)

print("----")

print("[Union]")
# print(set_x | set_y) also works
print(set_x.union(set_y))

print("[Intersection]")
# print(set_x & set_y) also works
print(set_x.intersection(set_y))

print("[Difference set]")
# print(set_x - set_y) also works
print(set_x.difference(set_y))

print("----")

print("se in X: ",end="")
print('se' in set_x)
print("se in Y: ",end="")
print('se' in set_y)

[result]
X:{'ar', 'ap', 'ad', 'di', 'ra', 'pa', 'se', 'is'}
Y:{'ar', 'ap', 'ag', 'gr', 'ph', 'ra', 'pa'}
----
[Union]
{'ar', 'ap', 'ag', 'ad', 'gr', 'ph', 'di', 'ra', 'pa', 'se', 'is'}
[Intersection]
{'pa', 'ar', 'ap', 'ra'}
[Difference set]
{'is', 'ad', 'di', 'se'}
----
se in X: True
se in Y: False
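
Sets are unordered, so the order in which the elements are displayed can differ from run to run and from the listing above. If a stable output is wanted, one option is to sort before printing. This is only a sketch: the helper name char_bigrams is hypothetical, and it uses the operator forms mentioned in the comments above.

```python
def char_bigrams(text):
    """Return the set of character bi-grams contained in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

x = char_bigrams("paraparaparadise")
y = char_bigrams("paragraph")

# sorted() turns each set into a list with a deterministic order.
print(sorted(x | y))   # union
print(sorted(x & y))   # intersection
print(sorted(x - y))   # difference (in X but not in Y)
```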

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

def createSentence(x, y, z):
    # Return the string (rather than printing it), as the problem requires.
    return "{} at {} is {}".format(y, x, z)

print(createSentence(12, "temperature", 22.4))

[result] temperature at 12 is 22.4

08. Ciphertext

Implement a function cipher that converts each character of a given string according to the following specification.

  • Lowercase letters are replaced with the character whose code is (219 - character code)
  • All other characters are output unchanged

Use this function to encrypt and decrypt an English message.

def cipher(s):
    result = ''
    for i in s:
        if i.islower():
            result += chr(219-ord(i))
        else:
            result += i
    return result

text = "Man is but a reed, the most feeble thing in the nature, but he is a thinking reed. "
cipher(text)

[result] 'Mzm rh yfg z ivvw, gsv nlhg uvvyov gsrmt rm gsv mzgfiv, yfg sv rh z gsrmprmt ivvw. '
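
The same function can both encrypt and decrypt because 219 equals ord('a') + ord('z'): each lowercase letter is mirrored within the alphabet (a becomes z, b becomes y, and so on), so applying the mapping twice gives back the original text. Below is a minimal sketch of that round trip. Unlike the version above, it tests for ASCII lowercase explicitly rather than with islower(); that is my own assumption, made so that non-ASCII lowercase letters pass through unchanged instead of producing an invalid character code.

```python
def cipher(s):
    """Mirror lowercase ASCII letters (a<->z, b<->y, ...); keep everything else."""
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in s)

# 219 is the sum of the codes of 'a' and 'z', which is what makes the mirror work.
assert ord("a") + ord("z") == 219

message = "Man is but a reed."
encrypted = cipher(message)
print(encrypted)
# Applying cipher a second time decrypts the message.
print(cipher(encrypted) == message)
```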

09. Typoglycemia

Write a program that, for a string of words separated by spaces, randomly rearranges the letters of each word while leaving its first and last letters in place. Words of four letters or fewer are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the execution result.

import random

def typoglycemia(target):
    res = []
    for s in target.split():
        if len(s) < 5:
            res.append(s)
        else:
            head = s[0]
            tail = s[-1]
            inner = list(s[1:-1])
            random.shuffle(inner)
            res.append(head+"".join(inner)+tail)
    return " ".join(res)

target = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(typoglycemia(target))

[result] I cud'lont bivleee that I colud aaclutly unnsetadrd what I was rindaeg : the phnomeneal pwoer of the human mind .
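
Since the output is random, it will differ from the result above on every run. One way to check it anyway is to test properties that must hold for any shuffle: each word keeps its first and last character and its set of letters, and short words are unchanged. The sketch below repeats the function so that it runs on its own; the seed value is arbitrary and only there to make a single run repeatable.

```python
import random

def typoglycemia(target):
    """Shuffle the inner letters of every word longer than four characters."""
    res = []
    for s in target.split():
        if len(s) < 5:
            res.append(s)
        else:
            inner = list(s[1:-1])
            random.shuffle(inner)
            res.append(s[0] + "".join(inner) + s[-1])
    return " ".join(res)

random.seed(0)  # arbitrary seed, an assumption for repeatability
source = "I couldn't believe that I could actually understand what I was reading"
shuffled = typoglycemia(source)
print(shuffled)

for before, after in zip(source.split(), shuffled.split()):
    # First and last characters stay in place.
    assert before[0] == after[0] and before[-1] == after[-1]
    # The letters are only rearranged, never added or dropped.
    assert sorted(before) == sorted(after)
    # Words of four characters or fewer are left untouched.
    assert len(before) >= 5 or before == after
```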

In closing

If you spot any mistakes, please point them out and I will fix them.
