I'm not sure how far I'll get. (cf. 100 Language Processing Knocks 2015)
# coding: utf-8
s = "stressed"
print(s[::-1])
# coding: utf-8
s = "Patatoku Kashii"
print(s[::2])
# coding: utf-8
s1 = 'Police car'
s2 = 'taxi'
s = ''.join([i+j for i, j in zip(s1, s2)])
print(s)
zip() builds a single iterator from multiple iterables, yielding one tuple of corresponding elements at a time.
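As a small illustration (not part of the original answer), the sketch below shows what zip() yields for two strings, and that it stops as soon as the shorter input runs out. The sample strings are made up for this example.

```python
# Pair up characters from two strings; zip() stops at the shorter one.
pairs = list(zip('abc', 'XY'))
print(pairs)  # [('a', 'X'), ('b', 'Y')]

# Joining each pair interleaves the two strings character by character.
interleaved = ''.join(a + b for a, b in zip('abc', 'XYZ'))
print(interleaved)  # aXbYcZ
```

Because of the truncation, characters left over in the longer string are silently dropped.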
Postscript
# coding: utf-8
s1 = 'Police car'
s2 = 'taxi'
s = ''.join(i+j for i, j in zip(s1, s2))
print(s)
Using a generator expression instead of a list comprehension avoids building an unnecessary intermediate list.
# coding: utf-8
import re
s = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
# Remove commas and periods, then split into a list of words
s = re.sub('[,.]', '', s)
s = s.split()
# Count the characters in each word and collect the counts in a list
result = []
for w in s:
    result.append(len(w))
print(result)
Postscript
# coding: utf-8
s = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
# Split into words, strip trailing ',' and '.', then count the characters
result = [len(w.rstrip(',.')) for w in s.split()]
print(result)
The pattern of initializing an empty list and filling it in a for loop is rewritten as a list comprehension. rstrip() removes the specified characters from the right end of a string. Several characters can be specified at once.
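A minimal sketch of how rstrip() treats its argument: it is a set of characters to strip, not a suffix, and only the right end is affected. The sample strings are invented for illustration.

```python
# The argument ',.' means "strip any trailing ',' or '.' characters".
one = 'drink,'.rstrip(',.')
print(one)  # drink

# All trailing characters from the set are removed, in any order.
many = 'end.,.'.rstrip(',.')
print(many)  # end

# Characters at the start or in the middle are left alone.
inner = ',mid,dle.'.rstrip(',.')
print(inner)  # ,mid,dle
```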
# coding: utf-8
s = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'.split()
target = [1, 5, 6, 7, 8, 9, 15, 16, 19]
result = {}
for i in range(len(s)):
    if i + 1 in target:
        result[i+1] = s[i][:1]
    else:
        result[i+1] = s[i][:2]
print(result)
Postscript
# coding: utf-8
s = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
target = 1, 5, 6, 7, 8, 9, 15, 16, 19
result = [w[: 1 if i in target else 2] for i, w in enumerate(s.split(), 1)]
print(result)
At first I wondered whether else could be used inside a comprehension, as in the [: 1 if i in target else 2] part, but this is simply the slice [:x] combined with the ternary operator a if cond else b.
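To make that reading concrete, here is a short sketch that writes the same logic twice: once with the conditional expression inside the slice, and once with the slice end pulled out into a variable. The word list and the name `end` are only for this example.

```python
words = ['Hi', 'He', 'Lied']
target = (1,)

# The conditional expression is evaluated first and gives the slice end.
prefixes = [w[:1 if i in target else 2] for i, w in enumerate(words, 1)]
print(prefixes)  # ['H', 'He', 'Li']

# Equivalent loop with the slice end named explicitly.
explicit = []
for i, w in enumerate(words, 1):
    end = 1 if i in target else 2
    explicit.append(w[:end])
print(explicit == prefixes)  # True
```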
# coding: utf-8
def n_gram(n, s):
    result = []
    for i in range(0, len(s)-n+1):
        result.append(s[i:i+n])
    return result
print(n_gram(2, 'I am an NLPer'))
Postscript
# coding: utf-8
def n_gram(n, s):
    return [s[i:i+n] for i in range(0, len(s)-n+1)]
print(n_gram(2, 'I am an NLPer'))
This is another case of appending to an initialized list inside a for loop, so it can be rewritten as a list comprehension. I'm not used to comprehensions yet, so for now the best I can do is write the for loop first and then rewrite it.
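Since n_gram only relies on len() and slicing, the same function should also work on a list of words, which gives word n-grams instead of character n-grams. The sketch below repeats the function so that it runs on its own. Calling it with s.split() is my addition, not something the code above does.

```python
def n_gram(n, s):
    # Works for any sequence that supports len() and slicing.
    return [s[i:i+n] for i in range(len(s) - n + 1)]

# Character bigrams from a string.
char_bigrams = n_gram(2, 'NLPer')
print(char_bigrams)  # ['NL', 'LP', 'Pe', 'er']

# Word bigrams from a list of words.
word_bigrams = n_gram(2, 'I am an NLPer'.split())
print(word_bigrams)  # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
```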
# coding: utf-8
#Union, intersection, and difference of two bigrams
def bi_gram(s):
    result = []
    for i in range(0, len(s)-1):
        result.append(s[i:i+2])
    return result
s1 = 'paraparaparadise'
s2 = 'paragraph'
X = set(bi_gram(s1))
Y = set(bi_gram(s2))
print("X = ", X)
print("Y = ", Y)
print("union: ", X | Y) # union
print("intersection: ", X & Y) # intersection
print("difference: ", X - Y) # difference
if "se" in X:
    print("X contains 'se'.")
else:
    print("X doesn't contain 'se'.")
if "se" in Y:
    print("Y contains 'se'.")
else:
    print("Y doesn't contain 'se'.")
# coding: utf-8
def gen_sentence(x, y, z):
    return "{1} at {0} o'clock is {2}".format(x, y, z)
x = 12
y = 'temperature'
z = 22.4
print(gen_sentence(x, y, z))
# coding: utf-8
def cipher(S):
    result = []
    for i in range(len(S)):
        if S[i].islower():
            result.append(chr(219 - ord(S[i])))
        else:
            result.append(S[i])
    return ''.join(result)
S = "abcDe"
print(cipher(S))
print(cipher(cipher(S)))
ord() converts a character to its Unicode code point as an integer. The inverse function is chr().
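The constant 219 in the cipher is the sum of the code points of 'a' and 'z', so 219 - ord(c) mirrors a lowercase letter within the alphabet. A quick check of that arithmetic (the helper name `mirror` is made up for this sketch):

```python
# 'a' is 97 and 'z' is 122, so their sum is 219.
total = ord('a') + ord('z')
print(total)  # 219

def mirror(c):
    # Map a lowercase ASCII letter to its mirror image: a<->z, b<->y, ...
    return chr(219 - ord(c))

print(mirror('a'), mirror('b'), mirror('z'))  # z y a

# Applying the mapping twice returns the original letter.
print(mirror(mirror('b')))  # b
```

This is why running cipher twice on the same string restores the original text.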
Postscript
# coding: utf-8
def cipher(S):
    return ''.join(chr(219 - ord(c)) if c.islower() else c for c in S)
S = "abcDe"
print(cipher(S))
print(cipher(cipher(S)))
Code that initializes a list and then appends to it in a for loop with a conditional branch can be replaced by a comprehension with a ternary operator. Once you know the pattern, it becomes possible to both read and write it.
# coding: utf-8
import numpy.random as rd
def gen_typo(S):
    if len(S) <= 4:
        return S
    else:
        # Keep the first index, shuffle the middle indices, keep the last index
        idx = [0]
        idx.extend(rd.choice(range(1, len(S)-1), len(S)-2, replace=False))
        idx.append(len(S)-1)
        result = []
        for i in range(len(S)):
            result.append(S[idx[i]])
        return ''.join(result)
s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
s = s.split()
print(' '.join([gen_typo(i) for i in s]))
There are various ways to sample randomly. With numpy.random.choice, the replace argument lets you choose between sampling with replacement and sampling without replacement.
Postscript
# coding: utf-8
import random
def gen_typo(S):
    return ' '.join(
        s
        # Return words of 4 characters or fewer unchanged
        if len(s) <= 4
        # Shuffle the middle of longer words, keeping the first and last letters
        else s[0] + ''.join(random.sample(s[1:-1], len(s)-2)) + s[-1]
        for s in S.split()
    )
S = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(gen_typo(S))
This is also rewritten with a comprehension and the ternary operator.
Also, random.sample(population, k) draws k elements from population without replacement. The population must be a sequence; sets were also accepted before Python 3.11.
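For comparison, a minimal sketch of the two sampling modes using only the standard library: random.sample draws without replacement, while random.choices draws with replacement. The fixed seed and variable names are assumptions made only so the example is repeatable.

```python
import random

letters = 'abcde'
rng = random.Random(0)  # seeded only to make runs repeatable

# Without replacement: every element appears at most once,
# so sampling len(letters) items gives a permutation.
without_replacement = rng.sample(letters, len(letters))
print(sorted(without_replacement) == sorted(letters))  # True

# With replacement: the same element may be drawn more than once.
with_replacement = rng.choices(letters, k=len(letters))
print(len(with_replacement))  # 5
```

For shuffling the inner letters of a word, sampling without replacement is the appropriate choice, since every letter has to be used exactly once.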