[PYTHON] 100 Natural Language Processing Knocks, Chapter 1: Warm-up (first half)

I am working through "Natural Language Processing 100 Knock 2015", published on the Tohoku University Inui-Okazaki Laboratory web page, so I will summarize my solutions here as a record.

00. Reverse order of a string

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

All we need to do is display the given string in reverse order. The keyword is "Extended Slices".

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

string = "stressed"
print string[::-1]
#=> desserts

The key is [::-1]. This is the idiom to reach for whenever you need to reverse a string. The relevant part of the official Python documentation explains why it works, so I will omit the explanation here.
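As a small illustration of the slice notation (a sketch for this article, not taken from the official documentation): a slice has the form s[start:stop:step], and when start and stop are omitted and the step is negative, the slice walks from the last character back to the first. The variable names below are made up for illustration, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
s = "stressed"

# s[start:stop:step]: with start and stop omitted and step = -1,
# the slice covers the whole string from the end to the beginning.
reversed_by_slice = s[::-1]

# The same result spelled out with the built-in reversed() iterator.
reversed_by_join = "".join(reversed(s))

print(reversed_by_slice)  # desserts
```

Both forms give the same string; the slice is simply the shorter way to write it.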

01. "パタトクカシーー" ("Patatokukashi")

Take out the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" (romanized "Patatokukashi") and get the string formed by concatenating them.

This can also be solved with "Extended Slices".

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

string = u"パタトクカシーー"  # "Patatokukashi"
print string[::2]
#=> パトカー ("police car")

Slicing from the beginning to the end of the string with a step of 2 picks every other character, that is, the 1st, 3rd, 5th, and 7th.
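To make the role of the step more concrete, here is a minimal sketch (the variable names are my own, not part of the original solution) that also uses a start offset. Starting at index 0 picks the odd-numbered characters and starting at index 1 picks the even-numbered ones. It assumes the original Japanese string from the problem statement and should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
string = u"パタトクカシーー"

# Start at index 0, step 2: the 1st, 3rd, 5th, and 7th characters.
odd_positions = string[0::2]

# Start at index 1, step 2: the 2nd, 4th, 6th, and 8th characters.
even_positions = string[1::2]

print(odd_positions)   # パトカー
print(even_positions)  # タクシー
```

The two halves are exactly the two words that problem 02 interleaves again.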

02. "パトカー" (police car) + "タクシー" (taxi) = "パタトクカシーー"

Obtain the string "パタトクカシーー" ("Patatokukashi") by alternately concatenating the characters of "パトカー" (police car) and "タクシー" (taxi) from the beginning.

This problem uses the "zip" function.

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

string1 = u"パトカー"  # "police car"
string2 = u"タクシー"  # "taxi"
ans = u""

for a, b in zip(string1, string2):
    ans += a + b
print ans
#=> パタトクカシーー

Because zip returns tuples that pair up the i-th elements of each sequence, it lets you process two sequence objects in parallel.
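The pairing that zip produces can be inspected directly. The following is a minimal sketch with hypothetical variable names, not the original solution: it lists the tuples and then builds the result with str.join instead of repeated +=. Note that zip stops at the shorter sequence, which does not matter here because both words have four characters. It should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
string1 = u"パトカー"
string2 = u"タクシー"

# Each tuple holds the i-th character of string1 and of string2.
pairs = list(zip(string1, string2))

# Join the characters of every pair in order to interleave the two words.
interleaved = u"".join(a + b for a, b in pairs)

print(interleaved)  # パタトクカシーー
```

Using join avoids creating a new intermediate string on every loop iteration, although for strings this short the difference is negligible.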

03. Pi

Break down the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

string = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
ans = []

for s in string.split():
    s = s.replace(",", "").replace(".", "")
    ans.append(len(s))
print ans
#=> [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

The split() method splits the string on the specified delimiter; if none is given, it splits on whitespace. Because "," and "." would inflate the counts, remove them before counting the characters.
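The same idea can be written more compactly. This is a sketch of one possible variant, not the original solution: it assumes that punctuation only appears at the ends of words, so str.strip(",.") is enough, and it uses a list comprehension. The names sentence and lengths are made up for illustration.

```python
# -*- coding: utf-8 -*-
sentence = ("Now I need a drink, alcoholic of course, after the heavy "
            "lectures involving quantum mechanics.")

# split() with no argument splits on runs of whitespace;
# strip(",.") removes leading/trailing commas and periods from each word.
lengths = [len(word.strip(",.")) for word in sentence.split()]

print(lengths)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

The resulting digits are the first fifteen digits of pi, which is where the problem gets its title.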

04. Element symbols

Break down the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, and create an associative array (dictionary type or map type) from each extracted string to the position of its word (its number counted from the beginning).

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

string = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
index = [1, 5, 6, 7, 8, 9, 15, 16, 19]
ans = {}
for i, s in enumerate(string.split(), 1):
    if i in index:
        ans[s[0]] = i
    else:
        ans[s[0:2]] = i
print ans
#=> {'Be': 4, 'C': 6, 'B': 5, 'Ca': 20, 'F': 9, 'S': 16, 'H': 1, 'K': 19, 'Al': 13, 'Mi': 12, 'Ne': 10, 'O': 8, 'Li': 3, 'P': 15, 'Si': 14, 'Ar': 18, 'Na': 11, 'N': 7, 'Cl': 17, 'He': 2}

For the words at the specified positions the first character is taken, and for all other words the first two characters are extracted by slicing. The second argument of the enumerate function sets the start value of the counter (this seems to be available from Python 2.6; see the [reference](http://docs.python.jp/2/library/functions.html#enumerate)).
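As a sketch of how the start argument behaves, and of one more compact way to build the mapping, the snippet below uses a set for the positions and a dict comprehension. These are my own choices, not the original solution, and the names single_char_positions, numbered, and symbols are hypothetical. It assumes Python 2.7 or later.

```python
# -*- coding: utf-8 -*-
sentence = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations "
            "Might Also Sign Peace Security Clause. Arthur King Can.")
single_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

# enumerate(iterable, 1) starts counting at 1 instead of the default 0.
numbered = list(enumerate(["Hi", "He"], 1))  # [(1, 'Hi'), (2, 'He')]

# Key: first character for the listed positions, first two otherwise.
# Value: the 1-based position of the word in the sentence.
symbols = {
    (word[:1] if i in single_char_positions else word[:2]): i
    for i, word in enumerate(sentence.split(), 1)
}

print(symbols["Mi"])  # 12
```

Because only the first one or two characters are kept, the trailing periods on "Fluorine.", "Clause.", and "Can." never reach the keys and need no special handling.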
