[PYTHON] 100 Natural Language Processing Knocks, Chapter 1: Warm-up (second half)

A record of solving the problems in the second half of Chapter 1.

05. n-gram

Create a function that builds n-grams from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams of the sentence "I am an NLPer".

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

def ngram(data, n):
    # Slide a window of length n over the sequence (string or list).
    res = []
    for i in xrange(len(data) - n + 1):
        res.append(data[i:i + n])
    return res

string = 'I am an NLPer'
print u'Word bi-gram:'
print ngram(string.split(), 2)
print u'Character bi-gram:'
print ngram(string, 2)
#=> Word bi-gram:
#=> [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
#=> Character bi-gram:
#=> ['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']

In the character bi-grams, a space also counts as one character.
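For values of n other than 2, a sequence of length L has L - n + 1 windows, so the loop bound has to depend on n. The following is a minimal sketch in Python 3 syntax (an assumption; the listing above is Python 2) that takes this into account and also handles sequences shorter than n.

```python
def ngram(seq, n):
    """Return all length-n windows of seq (works for str, list, tuple)."""
    # A sequence of length L has L - n + 1 windows of length n.
    # range() is empty when that count is zero or negative.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


sentence = "I am an NLPer"
word_bigrams = ngram(sentence.split(), 2)
char_trigrams = ngram(sentence, 3)

print(word_bigrams)       # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
print(char_trigrams[:3])  # ['I a', ' am', 'am ']
print(ngram("ab", 3))     # []
```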

06. Sets

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is included in X and in Y.

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

def ngram(data, n):
    # Slide a window of length n over the sequence (string or list).
    res = []
    for i in xrange(len(data) - n + 1):
        res.append(data[i:i + n])
    return res

string1 = 'paraparaparadise'
string2 = 'paragraph'

X = ngram(string1, 2)
Y = ngram(string2, 2)

print u"Union:"
print list(set(X).union(set(Y)))
print u"Intersection:"
print list(set(X).intersection(set(Y)))
print u"Difference set:"
print list(set(X).difference(set(Y)))

print u"Is 'se' included in X?"
print "se" in X
print u"Is 'se' included in Y?"
print "se" in Y
#=> Union:
#=> ['ad', 'ag', 'di', 'is', 'ap', 'pa', 'ra', 'ph', 'ar', 'se', 'gr']
#=> Intersection:
#=> ['ap', 'pa', 'ar', 'ra']
#=> Difference set:
#=> ['is', 'ad', 'se', 'di']
#=> Is 'se' included in X?
#=> True
#=> Is 'se' included in Y?
#=> False

The union, intersection, and difference of the bi-gram sets are computed with the union, intersection, and difference methods of Python's set type.
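The same three operations can also be written with the set operators |, & and -. The sketch below uses Python 3 syntax (an assumption; the listing above is Python 2), and the helper name char_bigrams is hypothetical. It builds the sets directly with a set comprehension and sorts the results, since the iteration order of a set is not guaranteed.

```python
def char_bigrams(text):
    """Return the set of character bi-grams in text (hypothetical helper)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


X = char_bigrams("paraparaparadise")
Y = char_bigrams("paragraph")

# sorted() gives a stable, readable order for printing.
print(sorted(X | Y))  # union
print(sorted(X & Y))  # intersection: ['ap', 'ar', 'pa', 'ra']
print(sorted(X - Y))  # difference:   ['ad', 'di', 'is', 'se']
print("se" in X, "se" in Y)  # True False
```

Membership tests such as "se" in X run against a set here, which avoids scanning a list element by element.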

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x o'clock is z". Then set x = 12, y = "temperature", z = 22.4, and check the result.

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

def func(x, y, z):
    # Fill the template "y at x o'clock is z".
    return u"%s at %s o'clock is %s" % (y, x, z)

x = 12
y = u"temperature"
z = 22.4
print func(x, y, z)
#=> temperature at 12 o'clock is 22.4

The template is filled in with the % string-formatting operator.
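As an alternative to the % operator, the same template can be filled with str.format or an f-string. This is a minimal sketch in Python 3 syntax (an assumption; the listing above is Python 2). The function names describe and describe_f are hypothetical, and the English wording of the template is assumed from the problem statement.

```python
def describe(x, y, z):
    """Fill the template with named placeholders via str.format."""
    return "{y} at {x} o'clock is {z}".format(x=x, y=y, z=z)


def describe_f(x, y, z):
    """Same template written as an f-string (Python 3.6+)."""
    return f"{y} at {x} o'clock is {z}"


print(describe(12, "temperature", 22.4))    # temperature at 12 o'clock is 22.4
print(describe_f(12, "temperature", 22.4))  # temperature at 12 o'clock is 22.4
```

Named placeholders keep the order of the arguments independent of the order in which they appear in the sentence.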

08. Ciphertext

Implement a function cipher that converts each character of a given string according to the following specification: a lowercase letter is replaced by the character whose code is (219 - character code), and every other character is output unchanged. Use this function to encrypt and decrypt an English message.

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

def cipher(data):
    res = ""
    for s in data:
        if s.islower():
            res += chr(219-ord(s))
        else:
            res += s
    return res

string = "re1"
print u'Encryption:'
print cipher(string)
print u'Decryption:'
print cipher(cipher(string))
#=> Encryption:
#=> iv1
#=> Decryption:
#=> re1

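Whether a character is lowercase is determined with the islower method. No separate decryption function is needed, because applying the cipher a second time restores the original text: 219 - (219 - c) = c. Since 219 = ord('a') + ord('z'), the conversion maps 'a' to 'z', 'b' to 'y', and so on, so lowercase letters stay lowercase letters.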
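The self-inverse property can be checked directly. The sketch below uses Python 3 syntax (an assumption; the listing above is Python 2) and tests membership in string.ascii_lowercase instead of calling islower. That is a design choice of this sketch, not the article's method: in Python 3, islower is also true for non-ASCII lowercase letters such as 'é', whose code is above 219, so chr(219 - ord(ch)) would raise ValueError for them.

```python
import string


def cipher(text):
    """Map each ASCII lowercase letter c to chr(219 - ord(c)); keep the rest."""
    return "".join(
        chr(219 - ord(ch)) if ch in string.ascii_lowercase else ch
        for ch in text
    )


message = "Hello, World! 123"
encrypted = cipher(message)
decrypted = cipher(encrypted)

print(encrypted)  # Hvool, Wliow! 123
print(decrypted)  # Hello, World! 123
print(ord("a") + ord("z"))  # 219
```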

09. Typoglycemia

Create a program that, for a string of words separated by spaces, keeps the first and last letters of each word in place and randomly rearranges the letters in between. Words of length 4 or less are left unchanged. Give it an appropriate English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") and check the result.

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

import random

def typoglycemia(data):
    res = []
    for d in data.split():
        if len(d) > 4:
            pre = d[0]
            suf = d[-1]
            word = list(d[1:-1])
            random.shuffle(word)
            res.append(pre + "".join(word) + suf)
        else:
            res.append(d)
    return " ".join(res)

sentence = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print typoglycemia(sentence)
#=> I cuodn'lt beevile that I cuold alluacty usdtanrend what I was reidang : the pmeenhnaol peowr of the huamn mind .

The interior letters of each word are shuffled with random.shuffle. Because the shuffle is random, the output differs on every run.
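Because the output changes on every run, it is easier to check the invariants than a fixed string: the first and last letters stay in place, the letters of each word are preserved, and short words are untouched. The sketch below uses Python 3 syntax (an assumption; the listing above is Python 2). The rng parameter is a hypothetical addition that lets a seeded random.Random instance be passed in for reproducible runs.

```python
import random


def typoglycemia(text, rng=random):
    """Shuffle the interior letters of every word longer than 4 characters."""
    words = []
    for word in text.split():
        if len(word) > 4:
            middle = list(word[1:-1])
            rng.shuffle(middle)  # in-place shuffle of the interior letters
            word = word[0] + "".join(middle) + word[-1]
        words.append(word)
    return " ".join(words)


sentence = "I couldn't believe that I could actually understand what I was reading"
result = typoglycemia(sentence, random.Random(0))  # seeded: repeatable output
print(result)

# Check the invariants word by word.
for before, after in zip(sentence.split(), result.split()):
    assert before[0] == after[0] and before[-1] == after[-1]
    assert sorted(before) == sorted(after)
    if len(before) <= 4:
        assert before == after
```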
