[PYTHON] [Language processing 100 knocks 2020] Chapter 1: Preparatory movement

Introduction

2020 version of 100 knocks of language processing, which is famous as a collection of problems of natural language processing, has been released. This article summarizes the results of solving Chapter 1: Preparatory Movement out of the following Chapters 1 to 10. ..

--Chapter 1: Preparatory movement -Chapter 2: UNIX Commands -Chapter 3: Regular Expressions -Chapter 4: Morphological analysis -Chapter 5: Dependency Analysis -Chapter 6: Machine Learning --Chapter 7: Word Vector --Chapter 8: Neural Net --Chapter 9: RNN, CNN --Chapter 10: Machine Translation

Advance preparation

We use Google Colaboratory for answers. For details on how to set up and use Google Colaboratory, see this article. The notebook containing the execution results of the following answers is available on github.

Chapter 1: Preparatory movement

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

str = 'stressed'
ans = str[::-1]

print(ans)

output


desserts

Extract strings with Python

01. "Patatokukashi"

Take out the 1st, 3rd, 5th, and 7th characters of the character string "Patatokukashi" and get the concatenated character string.

str = 'Patatoku Kashii'
ans = str[::2]

print(ans)

output


Police car

02. "Police car" + "Taxi" = "Patatokukashi"

Obtain the character string "Patatokukashi" by alternately connecting the characters "Police car" + "Taxi" from the beginning.

str1 = 'Police car'
str2 = 'taxi'
ans = ''.join([i + j for i, j in zip(str1, str2)])

print(ans)

output


Patatoku Kashii

How to use Python and zip functions: Get multiple list elements at once How to use Python list comprehension Concatenate and combine strings with Python

03. Pi

Break down the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." Into words, and create a list of the number of characters (in the alphabet) of each word in order of appearance.

import re

str = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
str = re.sub('[,\.]', '', str)  # ,When.Remove
splits = str.split()  #Create a word-by-word list separated by spaces
ans = [len(i) for i in splits]

print(ans)

output


[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

Replace strings with Python Split strings in Python Get the size of objects of various types with Python's len function

04. Element symbol

Break down the sentence “Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.” Into words 1, 5, 6, 7, 8, 9, 15, 16, The 19th word is the first character, and the other words are the first two characters, and the associative array (dictionary type or map type) from the extracted character string to the word position (what number of words from the beginning) Create.

str = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
splits = str.split()
one_ch = [1, 5, 6, 7, 8, 9, 15, 16, 19]  #Number list of words to extract one letter
ans = {}
for i, word in enumerate(splits):
  if i + 1 in one_ch:
    ans[word[:1]] = i + 1  #Get 1 character if in list
  else:
    ans[word[:2]] = i + 1  #If not, get 2 characters
    
print(ans)

output


{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}

Loop processing by Python for statement How to use Python, enumerate: Get list elements and indexes How to write conditional branch by if statement in Python Dictionary creation in Python dict () and curly braces, dictionary comprehension

  1. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".

def ngram(n, lst):
    return set(zip(*[lst[i:] for i in range(n)]))

str = 'I am an NLPer'
words_bi_gram = ngram(2, str.split())
chars_bi_gram = ngram(2, str)

print('Word bi-gram:', words_bi_gram)
print('Character bi-gram:', chars_bi_gram)

output


Word bi-gram: {('am', 'an'), ('I', 'am'), ('an', 'NLPer')}
Character bi-gram: {('I', ' '), (' ', 'N'), ('e', 'r'), ('a', 'm'), (' ', 'a'), ('n', ' '), ('L', 'P'), ('m', ' '), ('P', 'e'), ('N', 'L'), ('a', 'n')}

Define / call a function in Python Python, set operation with set type

06. Meeting

Find the set of characters bi-grams contained in "paraparaparadise" and "paragraph" as X and Y, respectively, and find the union, intersection, and complement of X and Y, respectively. In addition, find out if the bi-gram'se'is included in X and Y.

Here, the function ngram created in 05 is reused.

str1 = 'paraparaparadise'
str2 = 'paragraph'
X = ngram(2, str1)
Y = ngram(2, str2)
union = X | Y
intersection = X & Y
difference = X - Y

print('X:', X)
print('Y:', Y)
print('Union:', union)
print('Intersection:', intersection)
print('Difference set:', difference)
print('Does X contain se:', {('s', 'e')} <= X)
print('Does Y contain se:', {('s', 'e')} <= Y)

output


X: {('a', 'r'), ('a', 'p'), ('s', 'e'), ('p', 'a'), ('r', 'a'), ('i', 's'), ('d', 'i'), ('a', 'd')}
Y: {('p', 'h'), ('a', 'r'), ('a', 'p'), ('p', 'a'), ('g', 'r'), ('r', 'a'), ('a', 'g')}
Union: {('p', 'h'), ('a', 'r'), ('a', 'p'), ('s', 'e'), ('p', 'a'), ('g', 'r'), ('r', 'a'), ('i', 's'), ('a', 'g'), ('d', 'i'), ('a', 'd')}
Intersection: {('p', 'a'), ('r', 'a'), ('a', 'r'), ('a', 'p')}
Difference set: {('d', 'i'), ('i', 's'), ('a', 'd'), ('s', 'e')}
Does X contain se: True
Does Y contain se: False

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = ”temperature”, z = 22.4, and check the execution result.

def generate_sentence(x, y, z):
  print('{}At the time{}Is{}'.format(x, y, z))

generate_sentence(12, 'temperature', 22.4)

output


At 12 o'clock the temperature is 22.4

Format conversion with Python, format

08. Ciphertext

Implement the function cipher that converts each character of the given character string according to the following specifications. Replace with (219 --character code) characters in lowercase letters Output other characters as they are Use this function to encrypt / decrypt English messages.

def cipher(str):
  rep = [chr(219 - ord(x)) if x.islower() else x for x in str]
  
  return ''.join(rep)

message = 'the quick brown fox jumps over the lazy dog'
message = cipher(message)
print('encryption:', message)
message = cipher(message)
print('Decryption:', message)

output


encryption: gsv jfrxp yildm ulc qfnkh levi gsv ozab wlt
Decryption: the quick brown fox jumps over the lazy dog

Translate Unicode code points and characters with Python List of string methods for manipulating case in Python

  1. Typoglycemia

Create a program that randomly rearranges the order of the other letters, leaving the first and last letters of each word for the word string separated by spaces. However, words with a length of 4 or less are not rearranged. Give an appropriate English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") And check the execution result.

import random

def shuffle(words):
  splits = words.split()
  if len(splits) > 4:
    splits = splits[:1] + random.sample(splits[1:-1], len(splits) - 2) + splits[-1:]

  return ' '.join(splits)

words = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind."
ans = shuffle(words)

print(ans)

output


I what could I reading human : phenomenal the couldn't of understand that believe was the power I actually mind.

Shuffle list elements in Python

in conclusion

100 Language Processing Knock is designed so that you can learn not only natural language processing itself, but also basic data processing and general-purpose machine learning. Even those who are studying machine learning in online courses will be able to practice very good output, so please try it.

Recommended Posts

[Language processing 100 knocks 2020] Chapter 1: Preparatory movement
100 Language Processing Knock: Chapter 1 Preparatory Movement
100 Language Processing Knock 2020 Chapter 1: Preparatory Movement
100 natural language processing knocks Chapter 1 Preparatory movement (second half)
100 natural language processing knocks Chapter 1 Preparatory movement (first half)
100 language processing knocks Chapter 2 (10 ~ 19)
100 Knocking Natural Language Processing Chapter 1 (Preparatory Movement)
100 language processing knocks 03 ~ 05
100 language processing knocks (2020): 32
100 language processing knocks (2020): 35
[Language processing 100 knocks 2020] Chapter 3: Regular expressions
100 language processing knocks (2020): 47
100 language processing knocks (2020): 39
100 natural language processing knocks Chapter 4 Commentary
[Language processing 100 knocks 2020] Chapter 6: Machine learning
100 language processing knocks (2020): 22
100 language processing knocks (2020): 26
100 language processing knocks (2020): 34
100 language processing knocks (2020): 42
100 language processing knocks (2020): 29
100 language processing knocks (2020): 49
100 language processing knocks 06 ~ 09
100 language processing knocks (2020): 43
100 language processing knocks (2020): 24
100 language processing knocks (2020): 45
100 language processing knocks (2020): 10-19
[Language processing 100 knocks 2020] Chapter 7: Word vector
100 language processing knocks (2020): 30
100 language processing knocks (2020): 00-09
100 language processing knocks 2020: Chapter 3 (regular expression)
100 language processing knocks (2020): 31
[Language processing 100 knocks 2020] Chapter 8: Neural network
100 language processing knocks (2020): 48
[Language processing 100 knocks 2020] Chapter 2: UNIX commands
100 language processing knocks (2020): 44
100 language processing knocks (2020): 41
100 language processing knocks (2020): 37
[Language processing 100 knocks 2020] Chapter 9: RNN, CNN
100 language processing knocks (2020): 25
100 language processing knocks (2020): 23
100 language processing knocks (2020): 33
100 language processing knocks (2020): 20
100 language processing knocks (2020): 27
[Language processing 100 knocks 2020] Chapter 4: Morphological analysis
100 language processing knocks (2020): 46
100 language processing knocks (2020): 21
100 language processing knocks (2020): 36
I tried to solve the 2020 version of 100 language processing knocks [Chapter 1: Preparatory movement 00-04]
I tried to solve the 2020 version of 100 language processing knocks [Chapter 1: Preparatory movement 05-09]
100 amateur language processing knocks: 41
100 amateur language processing knocks: 71
100 amateur language processing knocks: 56
100 amateur language processing knocks: 24
100 amateur language processing knocks: 50
100 amateur language processing knocks: 59
100 amateur language processing knocks: 70
NLP100: Chapter 1 Preparatory Movement
100 amateur language processing knocks: 62
100 amateur language processing knocks: 60
100 Language Processing Knock 2020 Chapter 1
100 amateur language processing knocks: 92