Introduction

While browsing the web, I came across a site called "Language Processing 100 Knock 2020". I had been wanting to try natural language processing, but I am a new programmer who has only done a little competitive programming. It caught my interest, so I'm giving it a try. At the time of writing I have finished only about half of the problems, but I'm writing this up as a record. I will stop if I lose heart. If no later article appears, please assume that is what happened.

Environment and stance

Environment

OS : macOS Catalina 10.15.3
Python : 3.7.6

Stance

I don't put much effort into the implementation.
I don't know the conventions.
I don't think about safety that much.
I do my best to make the code readable for others.
I want to write as gently as possible for Python beginners (that is the hope, anyway).

I will try to explain as much as I can, but if something catches your interest, I recommend looking it up yourself.

Solve "Chapter 1: Warm-up"

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

`00.py`


print("stressed"[::-1])

`Terminal`


desserts

This makes use of Python slices. I often see slices in competitive programming. A slice is written as [start:stop:step].
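As a small illustration of the [start:stop:step] form, here is a sketch that applies a few slices to the same string. The variable names are mine and are not part of the original solution.

```python
word = "stressed"

# Omitting start and stop selects the whole string; step -1 walks it backwards.
reversed_word = word[::-1]

# start=1, stop=5 selects indices 1, 2, 3, 4 (stop is exclusive).
middle = word[1:5]

# step=2 takes every other character, starting from index 0.
every_other = word[::2]

print(reversed_word)  # desserts
print(middle)         # tres
print(every_other)    # srse
```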

01. "パタトクカシーー"

Take out the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and get the concatenated string. (The string is "パトカー", police car, interleaved with "タクシー", taxi.)

`01.py`


print("パタトクカシーー"[::2])

`Terminal`


パトカー

Extracting every other character from the first character is easy with slices.

02. "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" (police car) and "タクシー" (taxi) from the beginning.

`02.py`


print("".join([ i + j for i, j in zip("パトカー", "タクシー")]))

`Terminal`


パタトクカシーー

I'm shortening the code by using join(), which concatenates a list of strings into one string, a list comprehension, and zip(), which walks through multiple sequences in parallel (why, though).
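To see what each piece contributes, the one-liner can be unrolled into steps. This is only a sketch with made-up ASCII inputs (`first` and `second` are my own names), not the problem's data.

```python
first = "ace"
second = "bdf"

# zip() pairs up elements position by position.
pairs = list(zip(first, second))

# The list comprehension glues each pair into a two-character string.
chunks = [i + j for i, j in pairs]

# "".join() concatenates the list of strings with no separator.
interleaved = "".join(chunks)

print(pairs)        # [('a', 'b'), ('c', 'd'), ('e', 'f')]
print(chunks)       # ['ab', 'cd', 'ef']
print(interleaved)  # abcdef
```

Note that zip() stops at the shorter input, so this only interleaves everything when both strings have the same length.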

03. Pi

Break down the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

Pointlessly shortened version

`03.py`


print(*(map(lambda x: len(x),"Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.".translate(str.maketrans({",":"",".":""})).split())))

`Terminal`


3 1 4 1 5 9 2 6 5 3 5 8 9 7 9

It still fits on one line ... (effort pointed in the wrong direction). The map function applies a function to each element of a list and returns a map object. Here the function is defined by a lambda expression that returns the length of the string it is given. translate() replaces characters according to the conversion table created by str.maketrans(). split() breaks the string at whitespace and returns a list.

A probably more decent version
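The chain in the one-liner can be pulled apart to inspect each intermediate value. The sketch below uses a shortened sentence of my own choosing so the results are easy to check by eye.

```python
sentence = "Now I need a drink, alcoholic of course."

# str.maketrans() with a dict builds a table mapping characters to replacements;
# mapping a character to "" deletes it.
table = str.maketrans({",": "", ".": ""})
cleaned = sentence.translate(table)

# split() with no argument splits on runs of whitespace.
words = cleaned.split()

# map() is lazy and returns a map object, so wrap it in list() to see the values.
lengths = list(map(lambda w: len(w), words))

print(cleaned)  # Now I need a drink alcoholic of course
print(words)
print(lengths)  # [3, 1, 4, 1, 5, 9, 2, 6]
```

Since the lambda only forwards its argument to len, `map(len, words)` would give the same result.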

`03.py`


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
l = s.translate(str.maketrans({",": "", ".": ""})).split()
a = []
for i in l:
    a.append(len(i))
print(*a)

This does the same thing as the shorter version. The only change is that the work done by the map function is now a for statement. `append()` adds an element to the end of the list.

The * in the print call unpacks the list, so its elements are printed separated by spaces instead of as a list.
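A quick way to see what the * does is to compare the calls side by side. The helper `count_args` is hypothetical and exists only to show how many arguments actually arrive.

```python
lengths = [3, 1, 4]

print(lengths)   # [3, 1, 4]  (one argument: the list itself)
print(*lengths)  # 3 1 4      (three arguments)
print(3, 1, 4)   # 3 1 4      (what the line above expands to)


def count_args(*args):
    # Reports how many positional arguments the call received.
    return len(args)


print(count_args(lengths))   # 1
print(count_args(*lengths))  # 3
```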

04. Element symbol

Break down the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of the other words, and create an associative array (dictionary or map type) from each extracted string to the position of its word (counted from the beginning of the sentence).

`04.py`


s="Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.".split()
l=[1,5,6,7,8,9,15,16,19]  # 1-based positions of the words that keep only one character
dic={}
for i in range(len(s)):
    if i+1 in l:  # i is 0-based, so compare the 1-based position i+1
        dic[s[i][0]]=i+1
    else:
        dic[s[i][:2]]=i+1
print(dic)

`Terminal`


{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}

This time I gave in and wrote it on multiple lines. The positions in `l` are 1-based while `i` is 0-based, so the check uses `i+1`: if `i+1` is in `l`, the first character becomes the key; otherwise the first two characters do.
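For comparison, here is a variant sketch that avoids juggling 0-based and 1-based indices by letting enumerate() count from 1. It runs on a shortened sentence, and `one_char_positions` and `symbols` are names I made up for the illustration.

```python
sentence = "Hi He Lied Because Boron"
one_char_positions = {1, 5}  # hypothetical subset of the positions in the task

symbols = {}
# enumerate(..., start=1) yields (1, "Hi"), (2, "He"), ... so position is 1-based.
for position, word in enumerate(sentence.split(), start=1):
    length = 1 if position in one_char_positions else 2
    symbols[word[:length]] = position

print(symbols)  # {'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5}
```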

05. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-gram and the letter bi-gram from the sentence "I am an NLPer".

An n-gram is a run of n consecutive elements (characters or words) taken from a sequence. Please refer to here for details.

`05.py`


def N_gram(s, n=1):
    return [s[i:i+n] for i in range(len(s)-n+1)]


s = "I am an NLPer"
print(*(N_gram(s, 2)))
print(*(N_gram(s.split(), 2)))

`Terminal`


I   a am m   a an n   N NL LP Pe er
['I', 'am'] ['am', 'an'] ['an', 'NLPer']

The implementation of N_gram turned out quite compact. The first line of output looks odd because some of the character bi-grams contain a space ... range() yields the integers from 0 up to, but not including, the specified number, in order. You can also specify a starting value other than 0. The n=1 in the function definition is a default argument: if n is not specified, it is 1.
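The two ideas in that paragraph, range() and default arguments, can be checked on tiny inputs. The sketch below uses a lowercase copy of the function (`n_gram`, my own naming) so that it stands on its own.

```python
def n_gram(seq, n=1):  # n=1 is used whenever the caller omits n
    # Slide a window of width n over seq; works for strings and lists alike.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


print(list(range(3)))     # [0, 1, 2]
print(list(range(1, 4)))  # [1, 2, 3]  (explicit start value)

print(n_gram("abc"))      # ['a', 'b', 'c']  (default n=1)
print(n_gram("abc", 2))   # ['ab', 'bc']
print(n_gram(["I", "am", "an", "NLPer"], 2))
# [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
```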

06. Sets

Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y, respectively, and find the union, intersection, and difference of X and Y. In addition, find out whether the bi-gram 'se' is included in X and in Y.

Incidentally, ParaParaParadise is apparently a dance game.

`06.py`


def N_gram(s, n=1):
    return {s[i:i + n] for i in range(len(s) - n + 1)}


s1 = "paraparaparadise"
s2 = "paragraph"

X = N_gram(s1, 2)
Y = N_gram(s2, 2)

s_union = X | Y
s_intersection = X & Y
s_difference = X - Y

print(*s_union)
print(*s_intersection)
print(*s_difference)

if "se" in X:
    print("\"se\" is in X")

if "se" not in Y:
    print("\"se\" is not in Y")

`Terminal`


pa ar ad ap is se di ag ph gr ra
ar pa ra ap
is ad se di
"se" is in X
"se" is not in Y

The membership checks are written as if I already knew the answer. You can use set.union(), set.intersection(), and set.difference(), but personally I find |, &, and - easier to use, so I went with those.
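The operators and the methods give the same results on sets, which the following sketch checks on two small hand-made sets (not the full bi-gram sets from the problem).

```python
X = {"pa", "ar", "ra", "se"}
Y = {"pa", "ar", "ag"}

print(X | Y == X.union(Y))         # True
print(X & Y == X.intersection(Y))  # True
print(X - Y == X.difference(Y))    # True

# Sets are unordered, so sort before printing to get a stable result.
print(sorted(X & Y))  # ['ar', 'pa']
print(sorted(X - Y))  # ['ra', 'se']

# One difference: the methods accept any iterable, the operators need sets.
extended = X.union(["gr"])
print("gr" in extended)  # True
```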

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

`07.py`


def temp(x=12, y="temperature", z=22.4):
    # Build the sentence "y at x is z"
    return str(y) + " at " + str(x) + " is " + str(z)


print(temp())

`Terminal`


temperature at 12 is 22.4

I'm using the default arguments mentioned in 05. n-gram. If you write an assignment for a parameter in the function definition, the function runs with that value when no argument is given at call time.

08. Ciphertext

Implement the function cipher that converts each character of the given string according to the following specification.
・Replace each lowercase letter with the character whose code is (219 - character code)
・Output all other characters as they are
Use this function to encrypt and decrypt an English message.

`08.py`


def cipher(s):
    return "".join(c.islower()*chr(219-ord(c))+(not c.islower())*c for c in s)


print(cipher("The quick brown fox jumps over the lazy dog."))
print(cipher(cipher("The quick brown fox jumps over the lazy dog.")))

`Terminal`


Tsv jfrxp yildm ulc qfnkh levi gsv ozab wlt.
The quick brown fox jumps over the lazy dog.

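I did my best to squeeze it into a one-liner (there was no need to). This time I take advantage of the fact that Python's bool type is a subclass of int: multiplying a string by True keeps it and multiplying by False gives an empty string. islower() determines whether a character is lowercase. Using 219 - character code works because ord("a") + ord("z") = 219, so applying the conversion twice returns the original character.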
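Both tricks can be checked in isolation. The sketch below also shows a more conventional spelling of the same conversion using a conditional expression. `cipher_explicit` is my own name, and it assumes ASCII input: it tests `"a" <= c <= "z"` rather than islower(), so non-ASCII lowercase letters are left alone.

```python
# bool is a subclass of int, so True acts as 1 and False as 0.
print(True * "x" + False * "y")  # x
print(ord("a") + ord("z"))       # 219


def cipher_explicit(s):
    # Mirror a..z onto z..a; leave every other character untouched.
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in s)


message = "Hello, World!"
encrypted = cipher_explicit(message)
print(encrypted)                   # Hvool, Wliow!
print(cipher_explicit(encrypted))  # Hello, World!
```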

09. Typoglycemia

Create a program that, for a sequence of words separated by spaces, randomly rearranges the letters of each word while leaving its first and last letters in place. However, words with a length of 4 or less are not rearranged. Give an appropriate English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") and check the execution result.

Typoglycemia is a phenomenon in which the words in a sentence can still be read correctly even if the order of the letters other than the first and last is changed (though it is an urban legend / internet meme).

`09.py`


import random


def typoglycemia(s):
    # Words of 4 characters or fewer are returned unchanged, as the problem requires.
    return s if len(s) <= 4 else s[0] + "".join(random.sample([i for i in s[1: -1]], len(s)-2)) + s[-1]


s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .".split()
print(" ".join(map(lambda x: typoglycemia(x), s)))

I also put the function body on one line (why, though). If the word has 4 characters or fewer it is returned as is; otherwise the characters other than the first and last are shuffled, joined back together, and returned. Unlike random.shuffle(), random.sample() accepts an immutable (non-modifiable) sequence as its first argument. Also, random.shuffle() shuffles in place and has no return value, whereas random.sample() returns a new list.
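The difference between the two functions is easy to observe directly. This is a minimal sketch with a made-up string; the results are random, so only properties that always hold are shown in the comments.

```python
import random

letters = "abcdef"  # str is immutable

# random.sample() leaves its input untouched and returns a new list.
sampled = random.sample(letters, len(letters))
print(type(sampled).__name__)  # list
print(letters)                 # abcdef

# random.shuffle() needs a mutable sequence, reorders it in place, and returns None.
cards = list(letters)
result = random.shuffle(cards)
print(result)                            # None
print(sorted(cards) == sorted(sampled))  # True (same letters, some order)
```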

In conclusion

That covers the problems in Chapter 1. How was it? I think there were a lot of odd implementations, but that was just me having fun. Please forgive me this once, since in the later chapters I will have to implement things properly whether I like it or not. From the next chapter on, I hope to increase the amount of commentary.

Please leave a comment if you have a suggestion such as "this would shorten the code" or "this would be better".

See you in the article in Chapter 2.

[PYTHON] [Programmer newcomer "100 language processing knock 2020"] Solve Chapter 1
