100 Language Processing Knock 2015 with Python

I'm going to work through "100 Language Processing Knock 2015" to study programming and Python! http://www.cl.ecei.tohoku.ac.jp/nlp100/

For the time being, I plan to update this post one problem at a time (it may be quite slow ...), and I will push the code to GitHub as I go.

tanaka0325/nlp100

Rules

First, I post the code I wrote myself. After that, I look things up and post any way of writing it that I thought was better.

Chapter 1: Warm-up

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

My code

I suspect there is a shorter way to write this, but I was too inexperienced to think of one, so I went with a very clumsy approach ...

`knock000.py`


s = "stressed"

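def reverse(s: str) -> str:
    chars = []  # not named `list`, which would shadow the built-in type

    for char in s:
        # Inserting at index 0 puts each new character in front of the earlier ones.
        chars.insert(0, char)

    return "".join(chars)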

print(reverse(s))

The code I looked up

`python`


s = "stressed"
print(s[::-1])

So it can be written this briefly ...

Review of string slices

`python`


text = "I have a dream!"

# Basic form: text[start:end] (end is exclusive)
text[0:6]
#=> 'I have'

# Either bound can be omitted: a missing start means the beginning, a missing end means the end
text[:10]
#=> 'I have a d'

text[3:]
#=> 'ave a dream!'

# A step can also be given: text[start:end:step]
text[0:12:3]
#=> 'Ia d'

`python`


s[::-1]

With the start and end omitted and a step of -1, the slice walks the whole string backward one character at a time, which gives the reverse order.
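As a quick check of that reading, here is a minimal sketch comparing the slice with the built-in `reversed`. The variable names `by_slice` and `by_reversed` are made up for illustration and do not appear in the original code.

```python
s = "stressed"

# Omitting start and end with a step of -1 walks the whole string backward.
by_slice = s[::-1]

# reversed() yields the characters from last to first; join rebuilds a string.
by_reversed = "".join(reversed(s))

print(by_slice)     # desserts
print(by_reversed)  # desserts
```

Both give the same result; the slice is simply the shorter spelling.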

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

My code

I specified the indices directly, though.

`knock001.py`


s = "パタトクカシーー"
print(s[0] + s[2] + s[4] + s[6])

The indices are all even, so would it also be reasonable to loop over the string and keep only the characters at even indices?

The code I looked up

None this time

Postscript: I'm adding this because it was pointed out in the comments. As in 00., you can specify a step in the slice and write it as follows.

`python`


s = "パタトクカシーー"
print(s[::2])
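For comparison, the loop idea raised earlier (keep the characters at even indices) could be sketched roughly as follows. It assumes the task string is "パタトクカシーー", and the name `picked` is only illustrative.

```python
s = "パタトクカシーー"

# Keep the characters whose 0-based index is even,
# i.e. the 1st, 3rd, 5th, and 7th characters.
picked = "".join(char for i, char in enumerate(s) if i % 2 == 0)

print(picked)
print(picked == s[::2])  # True
```

The slice `s[::2]` expresses the same selection without an explicit index test.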

02. "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" and "タクシー" from the beginning.

My code

`knock002.py`


s1 = "パトカー"
s2 = "タクシー"
s3 = ""

for i in range(len(s1)):
    s3 = s3 + s1[i] + s2[i]

print(s3)

Passing len(s1) to range does not feel elegant. It works here only because the two given strings have the same number of characters.

The code I looked up

None this time. (Always including this heading doesn't seem useful, so from now on I'll add it only when I actually looked something up.)

Postscript: It seems you can use the zip function instead of range(len(s1)).

`python`


for char1, char2 in zip(s1, s2):
    s3 = s3 + char1 + char2

The zip function lets a loop iterate over several iterables in parallel, taking one element from each on every pass and stopping at the shortest.

Second postscript: This was pointed out in the comments.
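To see what zip hands to the loop, here is a small sketch with two made-up strings of different lengths (`first` and `second` are not from the original post).

```python
first = "abc"
second = "12345"  # deliberately longer than `first`

# zip yields one tuple per position, pairing the items of both strings.
pairs = list(zip(first, second))
print(pairs)  # [('a', '1'), ('b', '2'), ('c', '3')]

# Indexing `first` with range(len(second)) would raise IndexError;
# zip simply stops when the shorter input is exhausted.
```

When the inputs are the same length, as in this problem, nothing is dropped.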

`python`


for char1, char2 in zip(s1, s2):
    s3 = s3 + char1 + char2

Concatenating strings as above can be slow, because each pass through the loop allocates a new string. It seems better to build a list of strings and combine them with join at the end.

Here is what I was taught:

`python`


print(''.join([char1 + char2 for char1, char2 in zip(s1, s2)]))
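Spelled out step by step, the "build a list, then join" idea might look like the sketch below. It assumes the task strings "パトカー" and "タクシー", and the list name `pieces` is only illustrative.

```python
s1 = "パトカー"
s2 = "タクシー"

# Collect the pieces in a list instead of growing a string inside the loop.
pieces = []
for char1, char2 in zip(s1, s2):
    pieces.append(char1)
    pieces.append(char2)

# Join once at the end to build the final string.
s3 = "".join(pieces)
print(s3)  # パタトクカシーー
```

The one-line list comprehension does the same work; this form just makes the two stages visible.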

03. Pi

Break the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of alphabetic characters in each word, in order of appearance.

My code

`python`


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

words = s.split(" ")
char_count = []

for word in words:
    char_count.append(len(word))

print(char_count)

Postscript: This was pointed out in the comments. It seems the code can be written more concisely with a list comprehension. Also, split separates on whitespace by default, so there is no need to pass the space explicitly. In addition, it seems the commas and periods have to be removed ...

Below is what I was told in the comments. However, the behavior of translate seems to have changed in Python 3.4, so I rewrote it as follows.

`python`


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

table = str.maketrans("", "", ".,")  # translation table that deletes '.' and ','
print([len(word.translate(table)) for word in s.split()])

I couldn't write that version very cleanly, so another approach is to use a regular expression to replace (here, delete) the punctuation.

`python`


import re

s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

pat = re.compile('[.,]')
print([len(pat.sub('', word)) for word in s.split()])
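Since the task counts only alphabetic characters, yet another option is to count them directly instead of deleting punctuation first. This is a minimal sketch using `str.isalpha`, not something taken from the comments.

```python
s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

# Count only alphabetic characters, so any punctuation is ignored.
char_count = [sum(char.isalpha() for char in word) for word in s.split()]

print(char_count)
# [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

The result reads off the leading digits of pi, which is where the problem title comes from.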

04. Element symbol

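Break the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words, and the first two characters of every other word. Create a dictionary that maps each extracted string to the position of its word (counting from the beginning of the sentence).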

My code

`python`


s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

s = s.replace(".", "")
words = s.split(" ")
words_index = {}

for i, word in enumerate(words):
    if i in [1, 5, 6, 7, 8, 9, 15, 16, 19]:
        words_index[word[:1]] = i
    else:
        words_index[word[:2]] = i

print(words_index)

Postscript: I received a comment pointing out that this was wrong. The index from enumerate starts at 0, while the word positions in the problem start at 1.

The corrected version is below.

`python`


s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

s = s.replace(".", "")
words = s.split(" ")
words_index = {}

for i, word in enumerate(words):
    n = i + 1
    if n in [1, 5, 6, 7, 8, 9, 15, 16, 19]:
        words_index[word[:1]] = n
    else:
        words_index[word[:2]] = n

print(words_index)

Also, the dictionary assignment appears in both branches of the if, which is redundant, so the following version given in the comments is more concise.

`python`


for i, word in enumerate(words):
    n = i + 1
    # Take one character for the listed positions, two for all others.
    length = 1 if n in (1, 5, 6, 7, 8, 9, 15, 16, 19) else 2
    words_index[word[:length]] = n
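Putting the pieces together, one possible complete version is sketched below. It uses `enumerate(..., start=1)` so the 1-based position comes out directly; the name `single_char_positions` is made up for this example.

```python
s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

# 1-based positions of the words that contribute only their first character.
single_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

words = s.replace(".", "").split()

# enumerate(..., start=1) yields 1-based positions, so no `i + 1` is needed.
words_index = {
    word[: 1 if n in single_char_positions else 2]: n
    for n, word in enumerate(words, start=1)
}

print(words_index)
```

The keys come out as H, He, Li, Be, B, ... which mostly match element symbols, with each value giving the word's position.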