[PYTHON] 100 Language Processing Knock Chapter 1

I'm not sure how far I will get. (cf. 100 Language Processing Knock 2015)

00. Reverse order of strings

# coding: utf-8

s = "stressed"

print(s[::-1])

01. "パタトクカシーー"

# coding: utf-8

s = "パタトクカシーー"

# Take every other character, starting from the first
print(s[::2])

02. "パトカー" + "タクシー" = "パタトクカシーー"

# coding: utf-8

s1 = 'パトカー'
s2 = 'タクシー'

# Interleave the characters of the two strings
s = ''.join([i+j for i, j in zip(s1, s2)])
print(s)

zip() combines multiple iterables into a single iterator of tuples.
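
As a small sketch of how zip() behaves (the sample strings and variable names are only for illustration), note that it stops at the shorter input. If the inputs might differ in length, itertools.zip_longest from the standard library pads the shorter one instead:

```python
from itertools import zip_longest

# zip() pairs items position by position and stops at the shorter input.
pairs = list(zip("abc", "12"))

# zip_longest() keeps going and pads the shorter input with fillvalue.
padded = list(zip_longest("abc", "12", fillvalue=""))

print(pairs)   # [('a', '1'), ('b', '2')]
print(padded)  # [('a', '1'), ('b', '2'), ('c', '')]
```

The two strings in this problem have the same length, so plain zip() is enough here.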

Postscript

# coding: utf-8

s1 = 'パトカー'
s2 = 'タクシー'

s = ''.join(i+j for i, j in zip(s1, s2))
print(s)

Using a generator expression instead of a list comprehension avoids building an unnecessary intermediate list.

03. Pi

# coding: utf-8
import re

s = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'

# Remove commas and periods, then split into a list of words
s = re.sub('[,.]', '', s)
s = s.split()

# Count the characters in each word and collect the counts in a list
result = []
for w in s:
    result.append(len(w))
    
print(result)

Postscript

# coding: utf-8
s = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'

# Split into a list of words, strip ',' and '.' from each word, and count the characters
result = [len(w.rstrip(',.')) for w in s.split()]

print(result)

The pattern of initializing an empty list and appending to it in a for loop is rewritten as a list comprehension. rstrip() removes the specified characters from the right end of a string, and several characters can be specified at once.
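
A minimal sketch of that rstrip() behavior (the sample words are made up for illustration): the argument is treated as a set of characters to strip, not as a literal suffix, and only the right end is affected.

```python
samples = ["course,", "mechanics.", "wait...,", "e.g."]

# Every trailing ',' or '.' is removed, in any order and any number;
# characters inside the word are left alone.
cleaned = [w.rstrip(",.") for w in samples]

print(cleaned)  # ['course', 'mechanics', 'wait', 'e.g']
```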

04. Element symbol

# coding: utf-8
s = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'.split()

target = [1, 5, 6, 7, 8, 9, 15, 16, 19]

result = {}

for i in range(len(s)):
    if i + 1 in target:
        result[i+1] = s[i][:1]
    else:
        result[i+1] = s[i][:2]
        
print(result)

Postscript

# coding: utf-8
s = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'

target = 1, 5, 6, 7, 8, 9, 15, 16, 19

result = [w[: 1 if i in target else 2] for i, w in enumerate(s.split(), 1)]

print(result)

At first I wondered whether else could even be used inside the [: 1 if i in target else 2] part of the comprehension, but it is simply a combination of the slice [:x] and the ternary operator a if cond else b.
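
To make that combination easier to see, here is a small sketch that first computes the slice end separately and then uses the one-line form. The names words, one_letter and end are hypothetical and only serve the illustration:

```python
words = ["Hi", "He", "Lied"]
one_letter = {1}  # hypothetical: positions that keep a single letter

for i, w in enumerate(words, 1):
    # The conditional expression picks the end index of the slice.
    end = 1 if i in one_letter else 2
    # The one-line form is the same expression written inside the slice.
    assert w[:end] == w[: 1 if i in one_letter else 2]

symbols = [w[: 1 if i in one_letter else 2] for i, w in enumerate(words, 1)]
print(symbols)  # ['H', 'He', 'Li']
```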

05. n-gram

# coding: utf-8

def n_gram(n, s):
    result = []
    for i in range(0, len(s)-n+1):
        result.append(s[i:i+n])
    return result

print(n_gram(2, 'I am an NLPer'))

Postscript

# coding: utf-8

def n_gram(n, s):
    return [s[i:i+n] for i in range(0, len(s)-n+1)]

print(n_gram(2, 'I am an NLPer'))

This is another case of appending to an initialized list in a for loop, so it can be rewritten as a list comprehension. I'm not used to comprehensions yet, so I still end up writing the for loop first and then rewriting it.
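
As a side note, the same function should also work for word n-grams, because slicing behaves the same way on lists as on strings. A minimal sketch, assuming the words are obtained with a plain split():

```python
def n_gram(n, s):
    # Works for any sliceable sequence: a str gives character n-grams,
    # a list of words gives word n-grams.
    return [s[i:i+n] for i in range(len(s) - n + 1)]

text = 'I am an NLPer'
char_bigrams = n_gram(2, text)
word_bigrams = n_gram(2, text.split())

print(word_bigrams)  # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
```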

06. Sets

# coding: utf-8
#Union, intersection, and difference of two bigrams

def bi_gram(s):
    result = []
    for i in range(0, len(s)-1):
        result.append(s[i:i+2])
    return result
    
s1 = 'paraparaparadise'
s2 = 'paragraph'

X = set(bi_gram(s1))
Y = set(bi_gram(s2))
print("X = ", X)
print("Y = ", Y)
print("union: ", X | Y) # union
print("intersection: ", X & Y) # intersection
print("difference: ", X - Y) # difference

if "se" in X:
    print("X contains 'se'.")
else:
    print("X doesn't contain 'se'.")

if "se" in Y:
    print("Y contains 'se'.")
else:
    print("Y doesn't contain 'se'.")

07. Sentence generation by template

# coding: utf-8

def gen_sentence(x, y, z):
    return "At {} o'clock, the {} is {}".format(x, y, z)

x = 12
y = 'temperature'
z = 22.4
print(gen_sentence(x, y, z))

08. Ciphertext

# coding: utf-8

def cipher(S):
    result = []
    for i in range(len(S)):
        if(S[i].islower()):
            result.append(chr(219 - ord(S[i])))
        else:
            result.append(S[i])
    return ''.join(result)
    
S = "abcDe"
print(cipher(S))
print(cipher(cipher(S)))

ord() converts a character to its Unicode code point as an integer. The inverse function is chr().
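
A short sketch of why 219 works as the constant here (the sample string is only for illustration): ord('a') + ord('z') is 97 + 122 = 219, so subtracting a lowercase letter's code point from 219 mirrors it within the alphabet.

```python
# ord() and chr() are inverses of each other.
assert ord('a') == 97 and chr(97) == 'a'
assert ord('a') + ord('z') == 219

# c -> chr(219 - ord(c)) maps a <-> z, b <-> y, c <-> x, ...
mirrored = ''.join(chr(219 - ord(c)) for c in 'abcxyz')
print(mirrored)  # zyxcba
```

Applying the same mapping twice returns the original letter, which is why the cipher also works as its own decoder.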

Postscript

# coding: utf-8

def cipher(S):
    return ''.join(chr(219 - ord(c)) if c.islower() else c for c in S)

S = "abcDe"
print(cipher(S))
print(cipher(cipher(S)))

A for loop that conditionally appends to an initialized list can be replaced by a comprehension combined with the ternary operator. Once you know the pattern, it becomes easy to both read and write.

09. Typoglycemia

# coding: utf-8

import numpy.random as rd

def gen_typo(S):
    if len(S) <= 4:
        return S
    else:
        idx = [0]
        idx.extend(rd.choice(range(1, len(S)-1), len(S)-2, replace=False))
        idx.append(len(S)-1)
        result = []
        for i in range(len(S)):
            result.append(S[idx[i]])
        return ''.join(result)

s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
s = s.split()
print(' '.join([gen_typo(i) for i in s]))

There are various ways to do random sampling; numpy.random.choice lets you choose between sampling with replacement and sampling without replacement through its replace argument.

Postscript

# coding: utf-8

import random

def gen_typo(S):
    return ' '.join(
        s 
        # Return words of length 4 or less unchanged
        if len(s) <= 4 
        # Shuffle the middle of words of length 5 or more, keeping the first and last letters
        else s[0] + ''.join(random.sample(s[1:-1], len(s)-2)) + s[-1] 
        for s in S.split()
        )

S = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(gen_typo(S))

This is also rewritten with comprehensions and ternary operators.

Also, random.sample(population, k) draws k elements from population without replacement. The population must be a sequence; Python versions before 3.11 also accepted a set.
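
For comparison, here is a minimal standard-library sketch of the two kinds of sampling: random.sample draws without replacement, while random.choices draws with replacement. The seed and the sample list are assumptions made only so that the example is repeatable.

```python
import random

random.seed(0)  # fixed seed only so that repeated runs match
letters = list("abcde")

# Without replacement: every element appears at most once,
# so sampling all 5 elements gives a permutation of the input.
without = random.sample(letters, k=5)

# With replacement: the same element may be drawn more than once.
with_repl = random.choices(letters, k=5)

print(sorted(without) == letters)  # True
print(with_repl)
```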
