100 Language Processing Knock 2015 with Python

I'm going to work through "100 Language Processing Knock 2015" to study programming and Python! http://www.cl.ecei.tohoku.ac.jp/nlp100/

For now, I plan to post my answers one at a time. (It may be quite slow...) I will also put them on GitHub one by one.

Rules

First, I post the code I wrote myself. After that, I look things up and post a style that I thought was better.

Chapter 1: Warm-up

00. Reverse order of strings

Get a string in which the characters of the string "stressed" are arranged in reverse (from the end to the beginning).

My code

I suspect there is a shorter way to write this, but I'm too inexperienced to think of one, so I went with a very clumsy approach...

knock000.py


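s = "stressed"

def reverse(s: str) -> str:
    # Avoid naming the variable "list", which would shadow the built-in type.
    chars = []

    # Insert each character at the front so the order ends up reversed.
    for char in s:
        chars.insert(0, char)

    return "".join(chars)

print(reverse(s))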

The code I looked up

python


s = "stressed"
print(s[::-1])

So it can be written this briefly...

Review of string slices

python


text = "I have a dream!"

# Basic form: text[start:stop]
text[0:6]
#=> 'I have'

# Either bound can be omitted (an omitted start means the beginning, an omitted stop means the end)
text[:10]
#=> 'I have a d'

text[3:]
#=> 'ave a dream!'

# You can also specify a step: text[start:stop:step]
text[0:12:3]
#=> 'Ia d'

So

python


s[::-1]

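In other words, with start and stop omitted and a step of -1, the slice walks through the whole string one character at a time from the end to the beginning, which gives the reversed string.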
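To check that reading of the slice, here is a small sketch comparing it with two more explicit spellings. The variable names are my own, and `reversed` is Python's built-in function; this is just one way to convince yourself, not the only one.

```python
s = "stressed"

# A slice is s[start:stop:step]. With step -1 and start/stop omitted,
# Python starts at the last character and walks back to the first.
by_slice = s[::-1]

# The same result using the built-in reversed(), which yields the
# characters from the end to the beginning.
by_reversed = "".join(reversed(s))

# And with explicit indices: len(s) - 1 down to 0.
by_index = "".join(s[i] for i in range(len(s) - 1, -1, -1))

print(by_slice)  # desserts
print(by_slice == by_reversed == by_index)  # True
```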

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" (romanized "Patatokukashii") and concatenate them into a new string.

My code

I hard-coded the indices, though.

knock001.py


s = "パタトクカシーー"
print(s[0] + s[2] + s[4] + s[6])

The indices are all even, so would it be better to loop over the string and pick out the characters at even indices?

The code I looked up

None this time

Postscript: A commenter pointed this out, so I'm adding it. As in 00., you can specify a step in the slice and write it as follows.

python


s = "パタトクカシーー"
print(s[::2])
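As for the question of looping over the even indices: that works too, and a rough sketch of it next to the slice version might look like this. It assumes the task's original Japanese string "パタトクカシーー", and the names `picked`, `by_loop`, `by_slice`, and `skipped` are only for illustration.

```python
# Assumption: the original Japanese string from the task.
s = "パタトクカシーー"

# Loop version: keep the characters whose 0-based index is even
# (that is, the 1st, 3rd, 5th, and 7th characters).
picked = []
for i, char in enumerate(s):
    if i % 2 == 0:
        picked.append(char)
by_loop = "".join(picked)

# Slice version: start at index 0 and take every second character.
by_slice = s[::2]

# Starting at index 1 instead gives the characters that were skipped.
skipped = s[1::2]

print(by_loop, by_slice, skipped)
```

Both versions give the same string, so the slice is simply the shorter way to say "every second character".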

02. "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" (patrol car) and "タクシー" (taxi), starting from the beginning.

My code

knock002.py


s1 = "パトカー"
s2 = "タクシー"
s3 = ""

for i in range(len(s1)):
    s3 = s3 + s1[i] + s2[i]

print(s3)

Passing len(s1) to range doesn't feel very elegant. The two given strings have the same number of characters, so it works here.

The code I looked up

None this time (I feel this approach isn't great, so from now on I'll include this section only when I actually looked something up.)

Postscript: It seems you can use the zip function instead of range(len(s1)).

python


for char1, char2 in zip(s1, s2):
    s3 = s3 + char1 + char2

The zip function lets you loop over several iterables at the same time, yielding one tuple of corresponding elements per iteration.
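One thing worth knowing is what zip does when the inputs have different lengths. The sketch below uses made-up inputs (`letters` and `digits` are hypothetical, not from the task) and compares zip with `itertools.zip_longest` from the standard library.

```python
from itertools import zip_longest

# Hypothetical inputs of different lengths, just for illustration.
letters = "abcd"
digits = "12"

# zip yields one tuple per position and stops at the shorter input.
pairs = list(zip(letters, digits))
print(pairs)  # [('a', '1'), ('b', '2')]

# itertools.zip_longest keeps going and pads the shorter input
# with fillvalue.
padded = list(zip_longest(letters, digits, fillvalue=""))
print(padded)  # [('a', '1'), ('b', '2'), ('c', ''), ('d', '')]
```

So with zip, strings of unequal length would be silently truncated rather than raising an IndexError as the index-based loop would.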

Second postscript: Another point was raised in the comments.

python


for char1, char2 in zip(s1, s2):
    s3 = s3 + char1 + char2

If you concatenate strings as above, it can be slow because a new string is allocated on every iteration. So it seems better to build a list of strings and join them once at the end.

Here is what I was taught:

python


print(''.join([char1 + char2 for char1, char2 in zip(s1, s2)]))
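As a possible variation on the same idea, the pairs from zip can be flattened with `itertools.chain.from_iterable` and joined once. The helper name `interleave` is my own, and the inputs assume the task's original Japanese strings.

```python
from itertools import chain


def interleave(a: str, b: str) -> str:
    """Alternate the characters of a and b, stopping at the shorter one."""
    # zip pairs the characters up, chain.from_iterable flattens the
    # pairs into one stream of characters, and join builds the result
    # string in a single pass.
    return "".join(chain.from_iterable(zip(a, b)))


# Assumption: the original Japanese strings from the task.
print(interleave("パトカー", "タクシー"))  # パタトクカシーー
```

This avoids both the repeated concatenation and the intermediate `char1 + char2` strings, though for inputs this small the difference hardly matters.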

03. Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

My code

python


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

words = s.split(" ")
char_count = []

for word in words:
    char_count.append(len(word))

print(char_count)

Postscript: This was pointed out in the comments. It can be written more concisely with a list comprehension, and since split separates on whitespace by default, there is no need to pass the space explicitly. In addition, the commas and periods have to be removed before counting.

Below is what I was told in the comments. However, translate seems to behave differently in Python 3.4, so I rewrote it as follows.

python


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

print([len(word.translate(word.maketrans({".":None,",":None}))) for word in s.split()])

That doesn't read very well, so another approach is to use a regular expression to replace the punctuation (here, delete it).

python


import re

s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

pat = re.compile('[.,]')
print([len(pat.sub('', word)) for word in s.split()])
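For comparison, here are two other ways the punctuation could be removed using only built-in string methods. This is a sketch, not what the comments suggested: the three-argument form of `str.maketrans` and `str.strip` with an argument are standard, while the names `table`, `by_translate`, and `by_strip` are mine.

```python
s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

# Three-argument form of str.maketrans: the third argument lists the
# characters to delete.
table = str.maketrans("", "", ".,")
by_translate = [len(word.translate(table)) for word in s.split()]

# str.strip with an argument removes those characters from both ends
# of each word, which is enough here because the punctuation only
# appears at the end of a word.
by_strip = [len(word.strip(".,")) for word in s.split()]

print(by_translate)
# [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

The lengths spell out the leading digits of pi, which is a handy way to check the answer.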

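04. Element symbols

Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of the other words, and create an associative array (dictionary or map) from each extracted string to the position of its word (counting from the beginning).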

My code

python


s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

s = s.replace(".", "")
words = s.split(" ")
words_index = {}

for i, word in enumerate(words):
    if i in [1, 5, 6, 7, 8, 9, 15, 16, 19]:
        words_index[word[:1]] = i
    else:
        words_index[word[:2]] = i

print(words_index)

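Postscript: I received a comment pointing out that this was wrong. The task counts word positions from 1, but enumerate starts at 0, so every position was off by one.

Here is the corrected version: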

python


s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

s = s.replace(".", "")
words = s.split(" ")
words_index = {}

for i, word in enumerate(words):
    n = i + 1
    if n in [1, 5, 6, 7, 8, 9, 15, 16, 19]:
        words_index[word[:1]] = n
    else:
        words_index[word[:2]] = n

print(words_index)

Also, the dictionary assignment is repeated in both branches of the if, so the following version from the comments is more concise.

python


for i, word in enumerate(words):
    n = i + 1
    l = 1 if n in (1, 5, 6, 7, 8, 9, 15, 16, 19) else 2
    words_index[word[:l]] = n
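Going one step further, the `n = i + 1` line can be dropped by letting enumerate count from 1. The following is only a sketch of that idea as a dict comprehension; the name `single` is my own.

```python
s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

# 1-based positions whose word contributes only its first character.
single = {1, 5, 6, 7, 8, 9, 15, 16, 19}

words = s.replace(".", "").split()

# enumerate(words, start=1) counts from 1, so no "i + 1" is needed.
words_index = {
    word[: (1 if n in single else 2)]: n
    for n, word in enumerate(words, start=1)
}

print(words_index)
```

For example, `words_index["H"]` is 1 and `words_index["Ca"]` is 20. Note that the 12th word "Might" yields "Mi" rather than the real element symbol "Mg", which follows from the sentence given in the task.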
