100 Language Processing Knock Chapter 1 (Python)

Solutions to problems 00 through 09 in Chapter 1 of the 100 Language Processing Knocks: http://www.cl.ecei.tohoku.ac.jp/nlp100/

00. Reverse order of strings

Get the string "stressed" with its characters in reverse order (from the end to the beginning).

python


print('stressed'[::-1])

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

python


print('パタトクカシーー'[::2])

02. "パトカー" + "タクシー" = "パタトクカシーー"

Get the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" (patrol car) and "タクシー" (taxi) from the beginning.

python


print(''.join(x+y for x, y in zip('パトカー', 'タクシー')))

03. Pi

Break the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of alphabetic characters in each word, in order of appearance.

python


import re

s = 'Now I need a drink, alcoholic of course, after the heavy \
lectures involving quantum mechanics.'

s = re.sub(r'[^A-Za-z\ ]+', '', s)
print([len(x) for x in s.split()])

A version suggested in the comments


s = 'Now I need a drink, alcoholic of course, after the heavy \
lectures involving quantum mechanics.'

print([len(w.rstrip(',.')) for w in s.split()])

Another version suggested in the comments


s = 'Now I need a drink, alcoholic of course, after the heavy \
lectures involving quantum mechanics.'

print([sum(c.isalpha() for c in w) for w in s.split()])
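
The section title hints at the expected result: the word lengths spell out the leading digits of pi. The following is a minimal check, assuming the `isalpha` counting approach shown above. The helper name `word_lengths` is introduced here only for illustration and is not part of the original post.

```python
import math

def word_lengths(sentence):
    # Count only alphabetic characters so punctuation is ignored.
    return [sum(c.isalpha() for c in w) for w in sentence.split()]

s = ('Now I need a drink, alcoholic of course, after the heavy '
     'lectures involving quantum mechanics.')

digits = ''.join(str(n) for n in word_lengths(s))
# Drop the decimal point from the float's string form and compare prefixes.
pi_digits = str(math.pi).replace('.', '')
print(digits)
print(pi_digits.startswith(digits))
```

The comparison only covers as many digits as a Python float prints, which happens to be enough for this 15-word sentence.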

04. Element symbol

Break the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, then create an associative array (dictionary or map type) from each extracted string to the position of its word (counting from the beginning of the sentence).

python


import re

s = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might \
Also Sign Peace Security Clause. Arthur King Can.'

s = re.sub(r'[^A-Za-z\ ]+', '', s)
# enumerate(..., 1) already yields 1-based positions, so i is stored as-is.
print(
    {x[:1] if i in [1, 5, 6, 7, 8, 9, 15, 16, 19] else x[:2]: i
        for i, x in enumerate(s.split(' '), 1)}
)

A version suggested in the comments


s = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might \
Also Sign Peace Security Clause. Arthur King Can.'

print({w[:2-(i in (1,5,6,7,8,9,15,16,19))]:i for i,w in enumerate(s.split(),1)})

05. n-gram

Create a function that builds n-grams from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams from the sentence "I am an NLPer".

python


def n(s, size=2):
    # Slide a window of `size` items over the sequence.
    return [s[i:i+size] for i in range(len(s) - size + 1)]

s = 'I am an NLPer'

print(n(s))             # character bi-grams
print(n(s.split(' ')))  # word bi-grams
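
The task asks for a function that handles any n, not only bi-grams. One possible sketch builds the windows with slicing and `zip`. The name `ngrams` and its parameter `n` are hypothetical and do not come from the original post.

```python
def ngrams(seq, n=2):
    # Pair each element with its next n-1 neighbours. zip stops at the
    # shortest slice, so len(seq) - n + 1 windows are produced.
    return list(zip(*(seq[i:] for i in range(n))))

s = 'I am an NLPer'

# Word bi-grams: tuples of adjacent words.
print(ngrams(s.split()))
# Character bi-grams: join each tuple back into a string.
print([''.join(g) for g in ngrams(s)])
```

Because `zip` returns tuples, the same function works for strings and lists alike. Joining is only needed when string output is wanted.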

06. Sets

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively. Find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is included in X and in Y.

python


def n(s, size=2):
    # Slide a window of `size` items over the sequence.
    return [s[i:i+size] for i in range(len(s) - size + 1)]

x = set(n('paraparaparadise'))
y = set(n('paragraph'))

print(x.union(y))
print(x.intersection(y))
print(x.difference(y))

print("se" in x)
print("se" in y)
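
Python sets also support operator forms of these methods. The following is a short sketch, assuming a bi-gram helper that returns a set directly. The name `bigrams` is introduced only for this example.

```python
def bigrams(s):
    # Every pair of adjacent characters, collected into a set.
    return {s[i:i+2] for i in range(len(s) - 1)}

x = bigrams('paraparaparadise')
y = bigrams('paragraph')

print(x | y)  # union
print(x & y)  # intersection
print(x - y)  # difference: bi-grams in X but not in Y
print(y - x)  # difference in the other direction
```

The difference is not symmetric, so it can be worth printing both directions.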

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

python


def f(x, y, z):
    return '%s at %s is %s' % (y, x, z)

print(f(12, 'temperature', 22.4))

08. Ciphertext

Implement a function cipher that converts each character of a given string according to the following specification.

If the character is a lowercase letter, replace it with the character whose code is (219 - character code). Output all other characters unchanged. Use this function to encrypt and decrypt an English message.

python


def cipher(s):
    r = ''
    for x in s:
        if 97 <= ord(x) <= 122:
            r += chr(219 - ord(x))
        else:
            r += x
    return r

s = "I couldn't believe that I could actually understand what I was reading : \
the phenomenal power of the human mind ."

print(cipher(s))
print(cipher(cipher(s)))

A version suggested in the comments


def cipher(s):
    return ''.join(c.islower() and chr(219-ord(c)) or c for c in s)

s = "I couldn't believe that I could actually understand what I was reading : \
the phenomenal power of the human mind ."

print(cipher(s))
print(cipher(cipher(s)))
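
Since 219 = ord('a') + ord('z'), the rule maps 'a' to 'z', 'b' to 'y', and so on, which is a fixed substitution over the 26 lowercase ASCII letters. It can therefore also be expressed with a translation table. This is an alternative sketch, not the approach used above. `TABLE` and `cipher_translate` are hypothetical names.

```python
import string

# Map a->z, b->y, ..., z->a. Characters not in the table are left unchanged.
TABLE = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

def cipher_translate(s):
    return s.translate(TABLE)

msg = 'Hello, World!'
enc = cipher_translate(msg)
print(enc)
print(cipher_translate(enc))  # applying the cipher twice restores the input
```

Unlike a check based on `str.islower()`, the table only touches ASCII letters, so accented lowercase characters pass through untouched.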

09. Typoglycemia

For a string of words separated by spaces, create a program that keeps the first and last letters of each word and randomly shuffles the order of the remaining letters. Words of length 4 or less are left unchanged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.

python


import random

s = "I couldn't believe that I could actually understand what I was \
reading : the phenomenal power of the human mind ."

s = s.split(' ')
for i, x in enumerate(s):
    if len(x) > 4:
        r = x[1:-1]
        s[i] = x[0] + ''.join(random.sample(r, len(r))) + x[-1]

print(' '.join(s))
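
Wrapping the loop in a function makes the behaviour easier to check. The following is a sketch under the same rules. The name `typoglycemia` and the optional `rng` parameter are assumptions added for illustration, so that a seeded `random.Random` can make runs repeatable.

```python
import random

def typoglycemia(text, rng=random):
    words = text.split(' ')
    for i, w in enumerate(words):
        if len(w) > 4:
            middle = list(w[1:-1])
            rng.shuffle(middle)  # shuffle the inner letters in place
            words[i] = w[0] + ''.join(middle) + w[-1]
    return ' '.join(words)

s = "I couldn't believe that I could actually understand what I was reading"
out = typoglycemia(s, random.Random(0))
print(out)
```

Each output word keeps its first and last letters and contains the same letters as the input word. Only the inner order can change.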
