100 Language Processing Knocks, Chapter 1, in Python

I recently needed to learn Python, so I worked through the 100 Language Processing Knocks (NLP 100 exercises). I started with Chapter 1: Warm-up.

Language Processing 100 Knocks

http://www.cl.ecei.tohoku.ac.jp/nlp100/

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

q00='stressed'
print(q00[::-1])
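
The slice `[::-1]` walks the string with a step of -1. As a cross-check, here is a minimal sketch comparing it with the built-in `reversed()`; the variable names are only for illustration.

```python
text = "stressed"

# Slicing with a step of -1 walks the string from the last character to the first.
by_slice = text[::-1]

# reversed() yields the characters back to front; join() reassembles them.
by_reversed = "".join(reversed(text))

print(by_slice)
print(by_slice == by_reversed)
```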

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

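q01='パタトクカシーー'
# Positions in the task are 1-based, so the 1st, 3rd, 5th, and 7th
# characters are at indices 0, 2, 4, and 6.
#print(q01[0]+q01[2]+q01[4]+q01[6])

# ->updated version
print(q01[::2])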
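
The task counts characters from 1, while Python indexes from 0, so it is easy to be off by one here. The sketch below assumes the exercise's original Japanese string "パタトクカシーー" and shows what each starting offset selects; the variable names are only for illustration.

```python
text = "パタトクカシーー"

# The 1st, 3rd, 5th, and 7th characters (1-based) are indices 0, 2, 4, and 6.
odd_positions = text[::2]

# Starting the slice at index 1 selects the 2nd, 4th, 6th, and 8th characters instead.
even_positions = text[1::2]

print(odd_positions)   # パトカー
print(even_positions)  # タクシー
```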

02. "パトカー" + "タクシー" = "パタトクカシーー"

Get the string "パタトクカシーー" by alternately joining the characters of "パトカー" (patrol car) and "タクシー" (taxi) from the beginning.

Solution 1

q021='パトカー'
q022='タクシー'

length=min(len(q021),len(q022))

ansq02=''
for i in range(length):
    temp=q021[i]+q022[i]
    ansq02+=temp

print(ansq02)

Solution 2

q021='パトカー'
q022='タクシー'

ansq022="".join(i+j for i,j in zip(q021,q022))

print(ansq022)
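
Both solutions stop at the end of the shorter string, which is fine here because the two words have the same length. If the inputs could differ in length, one possible sketch uses `itertools.zip_longest`; the helper name `interleave` is hypothetical.

```python
from itertools import zip_longest

def interleave(a, b):
    """Alternate characters of a and b; leftovers of the longer string are appended."""
    # fillvalue="" pads the shorter string so no characters are dropped.
    return "".join(x + y for x, y in zip_longest(a, b, fillvalue=""))

print(interleave("パトカー", "タクシー"))
print(interleave("abc", "12345"))
```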

03. Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of letters in each word, in order of appearance.

q03="Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

ansq03=[len(i.strip(",.")) for i in q03.split()]

print(ansq03)
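
Stripping "," and "." works for this sentence. As an alternative sketch, a regular expression can pick out runs of letters directly, so no punctuation needs to be listed; this assumes the words contain only ASCII letters.

```python
import re

sentence = ("Now I need a drink, alcoholic of course, "
            "after the heavy lectures involving quantum mechanics.")

# [A-Za-z]+ matches each run of letters, so punctuation never reaches len().
lengths = [len(word) for word in re.findall(r"[A-Za-z]+", sentence)]
print(lengths)
```

The resulting list spells out the first 15 digits of pi, which is where the exercise gets its name.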

04. Element symbol

Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of words 1, 5, 6, 7, 8, 9, 15, 16, and 19 and the first two characters of the other words, then create an associative array (dictionary or map type) from each extracted string to the position of its word (counting from 1 at the beginning of the sentence).

q04="Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

# Named "symbols" so that the built-in dict is not shadowed
symbols={}

q04_list=[i.strip(",.") for i in q04.split()]
print(q04_list)

# 1-based positions of the words that keep only their first character
q04_listNum=[1, 5, 6, 7, 8, 9, 15, 16, 19]

for idx,val in enumerate(q04_list, start=1):
    if idx in q04_listNum:
        symbols[val[0]] = idx
    else:
        symbols[val[:2]] = idx

print(symbols)
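
The same mapping can also be built in one pass with a dictionary comprehension. This is only a sketch: the names `one_letter` and `symbols` are illustrative, and punctuation is not stripped because only the first one or two characters of each word are used.

```python
sentence = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations "
            "Might Also Sign Peace Security Clause. Arthur King Can.")

# 1-based positions of the words that keep only their first character.
one_letter = {1, 5, 6, 7, 8, 9, 15, 16, 19}

symbols = {
    word[:1 if pos in one_letter else 2]: pos
    for pos, word in enumerate(sentence.split(), start=1)
}
print(symbols)
```

Note that word 12, "Might", yields "Mi" rather than the real element symbol "Mg"; that is a property of the sentence in the exercise, not of the code.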

05. n-gram

Create a function that creates an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams from the sentence "I am an NLPer".

q05="I am an NLPer"

# bi-gram for char
char_bigram=[q05[i:i+2] for i in range(len(q05)-1)]
print(char_bigram)

# bi-gram for words
words=[(i.strip(".,")) for i in q05.split()]
words_bigram=["-".join(words[i:i+2]) for i in range(len(words)-1)]
print(words_bigram)
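
The task asks for a reusable function that works on any sequence. A minimal sketch, assuming a hypothetical name `n_gram` and a default of n = 2, could look like this; slicing works the same way on strings and lists, so one function covers both cases.

```python
def n_gram(seq, n=2):
    """Return every run of n consecutive items in seq (a string or a list)."""
    # A sequence of length L contains L - n + 1 windows of size n.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

sentence = "I am an NLPer"
word_bigrams = n_gram(sentence.split())  # list in, list slices out
char_bigrams = n_gram(sentence)          # string in, string slices out

print(word_bigrams)
print(char_bigrams)
```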

06. Sets

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively, and find the union, intersection, and difference of X and Y. In addition, find out whether the bi-gram 'se' is included in X and in Y.

def bigram(a):
    # Return the list of character bi-grams of the string a
    return [a[i:i+2] for i in range(len(a)-1)]

q061="paraparaparadise"
q062="paragraph"

# Converting the bi-gram lists to sets removes duplicates
bigramX_set=set(bigram(q061))
bigramY_set=set(bigram(q062))
print('bigramX_set =', bigramX_set)
print('bigramY_set =', bigramY_set)

#Union
print('Union= ', bigramX_set | bigramY_set)
#Difference (elements of X that are not in Y)
print('Difference set= ', bigramX_set - bigramY_set)
#Intersection
print('Intersection= ', bigramX_set & bigramY_set)
#Search: check X and Y separately
print('se in X= ', 'se' in bigramX_set)
print('se in Y= ', 'se' in bigramY_set)
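
Set difference depends on the order of its operands, so X - Y and Y - X give different results. The sketch below shows both directions plus the symmetric difference; the helper name `bigram_set` is hypothetical.

```python
def bigram_set(text):
    """Return the set of character bi-grams in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

X = bigram_set("paraparaparadise")
Y = bigram_set("paragraph")

only_in_x = X - Y         # bi-grams that appear in X but not in Y
only_in_y = Y - X         # bi-grams that appear in Y but not in X
in_exactly_one = X ^ Y    # symmetric difference: in one set but not both

print(sorted(only_in_x))
print(sorted(only_in_y))
print(sorted(in_exactly_one))
```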

07. Sentence generation by template

Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.

def maketext(x=1,y='Anko',z=10):
    # Build the sentence "y at x is z"
    return f"{y} at {x} is {z}"

x,y,z=12,'temperature',22.4

print(maketext(x,y,z)) #=> 'temperature at 12 is 22.4'
#print(maketext())

08. Ciphertext

Implement the function cipher that converts each character of the given string according to the following specification.

Replace each lowercase letter with the character whose code is (219 - character code). Output all other characters unchanged.

Use this function to encrypt and decrypt English messages.

#Q08
def cipher(a):
    ciptex_list=[]
    for i in a:
        texCode=ord(i)
        # Lowercase ASCII letters have codes 97 ('a') to 122 ('z')
        if 96 < texCode < 123:
            updtexCode=chr(219-texCode)
        else:
            updtexCode=i

        ciptex_list.append(updtexCode)

    return "".join(ciptex_list)

print(cipher('abcdef')) #=> 'zyxwvu'
print(cipher(cipher('abcdef'))) #=> 'abcdef'
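
Because 219 - (219 - c) = c, applying the conversion twice restores the original text, so the same function both encrypts and decrypts. As an alternative sketch, the mapping can be precomputed once with `str.maketrans`; the names `TABLE` and `cipher_translate` are hypothetical.

```python
import string

# Map each lowercase ASCII letter c to chr(219 - ord(c)); 'a' (97) <-> 'z' (122).
TABLE = str.maketrans({c: chr(219 - ord(c)) for c in string.ascii_lowercase})

def cipher_translate(text):
    """Mirror lowercase ASCII letters; leave every other character unchanged."""
    return text.translate(TABLE)

message = "Hello, World!"
encrypted = cipher_translate(message)
decrypted = cipher_translate(encrypted)

print(encrypted)
print(decrypted)
```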

09. Typoglycemia

Create a program that, for a string of words separated by spaces, keeps the first and last letters of each word and randomly rearranges the letters in between. However, words with a length of 4 or less are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") and check the execution result.

import random

def randsort(a):
    result = []
    listA = [i.strip(',.') for i in a.split()]
    # Shuffle the characters of x and join them back into a string
    randchar = lambda x: ''.join(random.sample(x,len(x)))

    for i in listA:
        if len(i) > 4:
            # Keep the first and last characters, shuffle the ones in between
            temp_word=i[0]+randchar(i[1:-1])+i[-1]
            result.append(temp_word)
        else:
            result.append(i)
    return result

q09="I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."

print(randsort(q09))
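
Since the output is random, it helps to check the properties that must always hold rather than a specific result. The sketch below is one possible variant, not the code above: the name `typoglycemia` and the `rng` parameter are hypothetical, and a seeded `random.Random` is used only to make a run repeatable.

```python
import random

def typoglycemia(sentence, rng=random):
    """Shuffle the inner letters of every word longer than 4 characters."""
    words = []
    for word in sentence.split():
        if len(word) > 4:
            middle = list(word[1:-1])
            rng.shuffle(middle)  # in-place shuffle of the inner letters
            word = word[0] + "".join(middle) + word[-1]
        words.append(word)
    return " ".join(words)

original = "I couldn't believe that I could actually understand what I was reading"
shuffled = typoglycemia(original, random.Random(0))
print(shuffled)

# Every word keeps its length, its first and last characters, and its letters.
for before, after in zip(original.split(), shuffled.split()):
    assert len(before) == len(after)
    assert before[0] == after[0] and before[-1] == after[-1]
    assert sorted(before) == sorted(after)
```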

This was my first time writing Python on my own, and I learned a lot from everything I had to look up along the way. There are probably more efficient ways to solve these problems, but I will leave it here for now.

References

"Getting the code value of a character" / "Getting a character from a code value" in Python http://d.hatena.ne.jp/flying-foozy/20111204/1323009984

Unicode HOWTO https://docs.python.jp/3/howto/unicode.html

Python: Compare two list elements with a set type set operation http://www.yukun.info/blog/2008/08/python-set-list-comparison.html

3.7 set type --set, frozenset http://docs.python.jp/2.5/lib/types-set.html