This is Chapter 1 of the 100 Language Processing Knocks.
The environment is Windows 10 and Python 3.6.0. I referred to this page.
Get the string obtained by reversing the order of the characters in "stressed".
# coding: utf-8
target = "stressed"
new_target = target[::-1]
print(new_target)
desserts
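In a slice target[start:stop:step], when step is positive the defaults are start = 0 and stop = len(target) (here 8). When step is negative, start defaults to the last index (here 7) and stop defaults to "before the first character", so index 0 is included. Writing stop = 0 explicitly would leave out the first character.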
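To check these defaults directly, slice.indices(length) returns the concrete (start, stop, step) that a slice resolves to for a sequence of that length. A small sketch:

```python
target = "stressed"  # 8 characters, indices 0..7

# slice.indices() resolves the default start/stop for a given length.
forward = slice(None, None, 1).indices(len(target))
backward = slice(None, None, -1).indices(len(target))
print(forward)   # (0, 8, 1)
print(backward)  # (7, -1, -1): a range() bound, so range(7, -1, -1) visits 7, 6, ..., 0

print(target[7:None:-1])  # desserts (same as target[::-1])
print(target[7:0:-1])     # dessert (an explicit stop of 0 excludes index 0)
print("".join(target[i] for i in range(*backward)))  # desserts
```

The -1 returned as stop is a range() bound, not a negative index; used directly inside a slice, -1 would mean "the last character".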
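Take out the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" (patatokukashii) and get the string formed by concatenating them. The answer, "パトカー", is Japanese for "police car".
# coding: utf-8
word = u"パタトクカシーー"
new_word = word[::2]
print(new_word)
パトカー
Note the u prefix. In Python 3 every str literal is already Unicode, so u"..." is accepted but has no effect; it only mattered in Python 2.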
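Assuming Python 3 (the environment here is 3.6), the effect of the u prefix can be checked directly: u"..." and "..." build the same str. The string below is the katakana original used by the exercise:

```python
# In Python 3 the u prefix is accepted (again, since 3.3) but changes nothing.
word = u"パタトクカシーー"
plain = "パタトクカシーー"
print(word == plain, type(word) is str)  # True True
print(word[::2])  # パトカー: the characters at indices 0, 2, 4, 6
```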
Get the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" (police car) and "タクシー" (taxi), starting from the beginning.
# coding: utf-8
word1 = u"パトカー"
word2 = u"タクシー"
mix_word = ""
for w1, w2 in zip(word1, word2):
    mix_word += w1 + w2
print(mix_word)
パタトクカシーー
--Zip up to the length of the longer sequence with itertools.zip_longest
import itertools
target1 = '12345'
target2 = 'abc'
zipped = itertools.zip_longest(target1,target2)
print(list(zipped))
[('1', 'a'), ('2', 'b'), ('3', 'c'), ('4', None), ('5', None)]
--Fill with a value other than None (the fillvalue argument)
import itertools
target1 = '12345'
target2 = 'abc'
zipped = itertools.zip_longest(target1, target2, fillvalue=False)
print(list(zipped))
[('1', 'a'), ('2', 'b'), ('3', 'c'), ('4', False), ('5', False)]
--Zipping the result again transposes it back into the original sequences (the fill values remain)
import itertools
target1 = '12345'
target2 = 'abc'
zipped = itertools.zip_longest(target1, target2, fillvalue=False)
zipped_list = list(zipped)
zizipped = zip(zipped_list[0], zipped_list[1], zipped_list[2], zipped_list[3], zipped_list[4])
print(list(zizipped))
[('1', '2', '3', '4', '5'), ('a', 'b', 'c', False, False)]
--Version using * to unpack the list into zip's arguments
import itertools
target1 = '12345'
target2 = 'abc'
zipped = itertools.zip_longest(target1, target2, fillvalue=False)
zipped_list = list(zipped)
zizipped = zip(*zipped_list)
print(list(zizipped))
[('1', '2', '3', '4', '5'), ('a', 'b', 'c', False, False)]
Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of alphabetic characters in each word, in order of appearance.
words = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
result = []
new_words = words.translate(str.maketrans("","",",."))
for word in new_words.split(' '):
    word_length = len(word)
    result.append(word_length)
print(result)
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
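str.strip(chars) removes any of the characters in chars from both ends of a string; chars is treated as a set of characters, not as a substring.
str.translate(str.maketrans("", "", ".,"))
str.maketrans(x, y, z): each character in the 1st argument x is mapped to the character at the same position in the 2nd argument y, and every character in the 3rd argument z is deleted.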
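A short sketch of the three-argument form, showing both the mapping and the deletion, and what the translation table actually contains:

```python
# "a" -> "x" and "b" -> "y"; the third argument lists characters to delete.
table = str.maketrans("ab", "xy", ",.")
print("a, b.".translate(table))  # x y

# maketrans returns a dict keyed by code points; deleted characters map to None.
print(str.maketrans("", "", ",."))  # {44: None, 46: None}

print("drink, alcoholic".translate(str.maketrans("", "", ",.")))  # drink alcoholic
```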
--A more elegant answer
words = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
result = [len(word.strip(",.")) for word in words.split(" ")]
print(result)
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
import re
words = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
result = [len(word) for word in (re.sub(r"[,.]","",words).split(" "))]
print(result)
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
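Since the problem asks for the number of alphabetic characters, another option (a swapped-in technique, not one of the answers above) is to count only the characters for which str.isalpha() is True, so punctuation never needs to be removed:

```python
sentence = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
# True counts as 1 in sum(), so this counts the letters in each word.
result = [sum(c.isalpha() for c in word) for word in sentence.split()]
print(result)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```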
Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, and create an associative array (a dictionary or map) that maps each extracted string to the position of its word (its number counted from the beginning).
sentence = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
words = [word.strip(',.') for word in sentence.split()]
dic = {word[0]:words.index(word) + 1 for word in words if words.index(word) in (0,4,5,6,7,8,14,15,18)}
dic.update({word[:2]:words.index(word) + 1 for word in words if words.index(word) not in (0,4,5,6,7,8,14,15,18)})
print(dic)
{'H': 1, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'P': 15, 'S': 16, 'K': 19, 'He': 2, 'Li': 3, 'Be': 4, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'Cl': 17, 'Ar': 18, 'Ca': 20}
--Another solution 1
sentence = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
words = [word.strip(',.') for word in sentence.split()]
link = {}
for i, v in enumerate(words, 1):
    length = 1 if i in [1, 5, 6, 7, 8, 9, 15, 16, 19] else 2
    link.update({v[:length]: i})
print(link)
{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}
--Another solution 2
sentence ="Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
# Map the first character (or first two characters) of each word to its 1-based position
link = {w[:2-(i in (1,5,6,7,8,9,15,16,19))]:i for i,w in enumerate(sentence.split(),1)}
print(link)
{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}
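This uses the fact that bool is a subclass of int, with True == 1 and False == 0: 2 - (i in (...)) is 1 for the listed positions and 2 for all others.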
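A minimal sketch of that trick in isolation:

```python
positions = (1, 5, 6, 7, 8, 9, 15, 16, 19)
print(isinstance(True, int), True == 1, False == 0)  # True True True

# 2 - True == 1 and 2 - False == 2, so membership decides the slice length.
for i, word in [(2, "He"), (5, "Boron")]:
    length = 2 - (i in positions)
    print(i, word[:length])  # prints "2 He", then "5 B"
```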
import re
sentence_string = "I am an NLPer"
sentence_list = sentence_string.split()
def n_gram(sequence, n):
    u"""Treat a string as a sequence of characters (character n-grams) and a list as a sequence of words (word n-grams).
    """
    result = []
    if isinstance(sequence, str):
        # Drop commas, periods, and spaces so only the letters remain
        sequence = list(re.sub("[,. ]", "", sequence))
    for i in range(len(sequence) - n + 1):
        result.append('-'.join(sequence[i:i+n]))
    return result
print(n_gram(sentence_string,2))
print(n_gram(sentence_list,2))
['I-a', 'a-m', 'm-a', 'a-n', 'n-N', 'N-L', 'L-P', 'P-e', 'e-r']
['I-am', 'am-an', 'an-NLPer']
--List comprehension version (from shiracamus)
import re
sentence_string = "I am an NLPer"
sentence_list = sentence_string.split()
def n_gram(sequence, n):
    u"""Treat a string as a sequence of characters (character n-grams) and a list as a sequence of words (word n-grams).
    """
    if isinstance(sequence, str):
        sequence = list(re.sub("[,. ]", "", sequence))
    return ['-'.join(sequence[i:i+n])
            for i in range(len(sequence) - n + 1)]
print(n_gram(sentence_string, 2))
print(n_gram(sentence_list, 2))
The loop can be replaced with a list comprehension.
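For comparison, another common way to build n-grams is to zip n shifted copies of the sequence. This is a swapped-in technique, not the article's function: it returns tuples instead of '-'-joined strings and, unlike the versions above, it does not strip spaces or punctuation from strings.

```python
def n_gram(sequence, n):
    """Return the n-grams of a string or list as a list of tuples."""
    # sequence[i:] starts i positions later; zip stops at the shortest copy.
    return list(zip(*(sequence[i:] for i in range(n))))

sentence = "I am an NLPer"
print(n_gram(sentence.split(), 2))  # [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]
print(n_gram(sentence, 2)[:3])      # [('I', ' '), (' ', 'a'), ('a', 'm')]
```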
Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph", X and Y respectively, and compute the union, intersection, and difference of X and Y. Also check whether the bi-gram "se" is included in X and in Y.
import re
X = "paraparaparadise"
Y = "paragraph"
def n_gram(sequence, n):
    result = []
    if isinstance(sequence, str):
        sequence = list(re.sub("[,. ]", "", sequence))
    for i in range(len(sequence) - n + 1):
        result.append('-'.join(sequence[i:i+n]))
    return result
X = set(n_gram(X, 2))
Y = set(n_gram(Y, 2))
print("X:", X)
print("Y:", Y)
print("Union:", X | Y)
print("Intersection:", X & Y)
print("Difference set 1:", X - Y)
print("Difference set 2:", Y - X)
if 's-e' in X:
    print('se is included in X')
if 's-e' in Y:
    print('se is included in Y')
X: {'a-d', 'a-r', 'r-a', 'i-s', 's-e', 'd-i', 'p-a', 'a-p'}
Y: {'a-r', 'r-a', 'p-h', 'g-r', 'a-g', 'p-a', 'a-p'}
Union: {'a-r', 'i-s', 'p-a', 'a-p', 'a-d', 'r-a', 'p-h', 'g-r', 's-e', 'd-i', 'a-g'}
Intersection: {'p-a', 'a-p', 'a-r', 'r-a'}
Difference set 1: {'a-d', 'd-i', 's-e', 'i-s'}
Difference set 2: {'a-g', 'g-r', 'p-h'}
se is included in X
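Because string hashing is randomized per process (the default since Python 3.3), the printed order of a set of strings can differ from run to run; sorting gives reproducible output. A sketch under that assumption, using a hypothetical helper char_bigrams that keeps bi-grams as plain two-letter strings such as "se" rather than the '-'-joined form:

```python
def char_bigrams(text):
    """Return the set of adjacent character pairs in text (hypothetical helper)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

X = char_bigrams("paraparaparadise")
Y = char_bigrams("paragraph")
print("Union:", sorted(X | Y))
print("Intersection:", sorted(X & Y))       # ['ap', 'ar', 'pa', 'ra']
print("Difference set 1:", sorted(X - Y))   # ['ad', 'di', 'is', 'se']
print("Difference set 2:", sorted(Y - X))   # ['ag', 'gr', 'ph']
print("se" in X, "se" in Y)                 # True False
```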
Implement a function that takes arguments x, y, z and returns the string "y at x is z". Then set x = 12, y = "temperature", z = 22.4 and check the result.
def make_sentence(x, y, z):
    # Return the sentence (rather than printing it), as the problem asks
    return "{1} at {0} is {2}".format(x, y, z)
print(make_sentence(x=12, y="temperature", z=22.4))
temperature at 12 is 22.4
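Since the environment is Python 3.6, an f-string can express the same template with the names in reading order. A minimal sketch that returns the string:

```python
def make_sentence(x, y, z):
    # f-strings (Python 3.6+) evaluate the names inside the braces directly.
    return f"{y} at {x} is {z}"

print(make_sentence(12, "temperature", 22.4))  # temperature at 12 is 22.4
```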
Implement the function cipher, which converts each character of a given string according to the following specification:
--Replace each lowercase letter with the character whose code is (219 - character code)
--Output all other characters unchanged
Use this function to encrypt and decrypt English messages.
import re
pat = re.compile(u"[a-z]")
def cipher(string):
    return ''.join(chr(219-ord(c)) if pat.match(c) else c for c in string)
if __name__ == "__main__":
    sentence = u"Hello world!"
    ciphertext = cipher(sentence)
    print(sentence)
    print(ciphertext)
    print(cipher(ciphertext))
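re.compile('[a-z]') builds a pattern that matches a single lowercase ASCII letter, so pat.match(c) succeeds only for a to z.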
Hello world!
Hvool dliow!
Hello world!
--Version without regular expressions (from shiracamus)
def cipher(string):
    return ''.join(chr(219 - ord(c)) if c.islower() else c for c in string)
if __name__ == "__main__":
    sentence = u"Hello world!"
    ciphertext = cipher(sentence)
    print(sentence)
    print(ciphertext)
    print(cipher(ciphertext))
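str.islower() can be used instead. Note that str.islower() returns True as long as the string contains at least one cased character and all of its cased characters are lowercase, so it is True even for strings that also contain characters without case, such as "abc1".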
chr(i): Returns the string representing a character whose Unicode code point is the integer i. For example, chr(97) returns the string 'a' and chr(8364) returns the string '€'. This is the inverse of ord().
The valid range of arguments is 0 to 1,114,111 (0x10FFFF in hexadecimal). ValueError is raised if i is out of range.
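The number 219 is ord("a") + ord("z") (97 + 122), which is why the cipher swaps a with z, b with y, and so on, and why applying it twice restores the text. That same range limit is a reason to prefer an explicit a to z check over c.islower(): islower() is also True for non-ASCII lowercase letters such as "é" (code point 233), where 219 - 233 is negative and chr() raises ValueError. A sketch that restricts the check to ASCII with a plain comparison:

```python
def cipher(string):
    # ord("a") + ord("z") == 97 + 122 == 219, so a<->z, b<->y, ... swap places.
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in string)

print(cipher("Hello world!"))          # Hvool dliow!
print(cipher(cipher("Hello world!")))  # Hello world!
print(cipher("café"))                  # xzué: "é" is left unchanged

print("é".islower(), 219 - ord("é"))   # True -14
try:
    chr(219 - ord("é"))                # out of the valid range for chr()
except ValueError as error:
    print("ValueError:", error)
```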
from random import shuffle
def change_order(sentence):
    produced_word_list = []
    word_list = sentence.split(' ')
    for word in word_list:
        if len(word) <= 4:
            produced_word_list.append(word)
        else:
            middle = list(word[1:-1])
            shuffle(middle)
            produced_word = word[0] + ''.join(middle) + word[-1]
            produced_word_list.append(produced_word)
    return ' '.join(produced_word_list)
sentence = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(change_order(sentence))
I cnud'olt bvieele that I cluod aucltaly uestradnnd what I was reading : the pnemonaehl power of the huamn mind .
--Another solution
import random
def change_order(sentence):
    produced_word_list = []
    word_list = sentence.split(' ')
    for word in word_list:
        if len(word) <= 4:
            produced_word_list.append(word)
        else:
            middle_list = list(word[1:-1])
            new_middle = ''
            while len(middle_list) > 0:
                rnd = random.randint(0, len(middle_list) - 1)
                new_middle += middle_list.pop(rnd)
            new_word = word[0] + new_middle + word[-1]
            produced_word_list.append(new_word)
    return ' '.join(produced_word_list)
sentence = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(change_order(sentence))
I cl'oundt beevile that I culod aauctlly unnetdarsd what I was rdeaing : the pneoenhaml peowr of the haumn mind .
--Generator and random.shuffle version (from shiracamus)
import random
def change_order(sentence):
    def produced_words():
        word_list = sentence.split()
        for word in word_list:
            if len(word) <= 4:
                yield word
            else:
                middle = list(word[1:-1])
                random.shuffle(middle)
                yield word[0] + ''.join(middle) + word[-1]
    return ' '.join(produced_words())
sentence = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(change_order(sentence))
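Because the output is random, it is hard to check by eye. One option, sketched below with a hypothetical rng parameter that is not part of the exercise, is to pass a seeded random.Random instance so the shuffle is reproducible and the invariants can be asserted: words of 4 characters or fewer are untouched, and longer words keep their first and last letters and the same set of letters.

```python
import random

def change_order(sentence, rng=random):
    """Shuffle the inner letters of each word longer than 4 characters."""
    # rng is a hypothetical parameter: anything with a shuffle() method works,
    # e.g. the random module itself or a seeded random.Random instance.
    def scramble(word):
        if len(word) <= 4:
            return word
        middle = list(word[1:-1])
        rng.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]
    return " ".join(scramble(word) for word in sentence.split())

sentence = ("I couldn't believe that I could actually understand what I was "
            "reading : the phenomenal power of the human mind .")
a = change_order(sentence, random.Random(0))
b = change_order(sentence, random.Random(0))
print(a == b)  # True: the same seed gives the same shuffle
print(a)
```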
I wasn't very comfortable with generators, so while I was at it I tried to understand them with reference to "Classes and Iterators" (Dive Into Python 3, Japanese version) and "Python iterator".
Is it correct to think of a generator like this: a for statement uses iter() and next(), which call the __iter__ and __next__ methods of the generator object (?)
An iterator object has two methods: the __iter__ method returns the object itself, and the __next__ method returns the next element.
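As a rough check of that understanding: a for loop calls iter() on the object once and then calls next() repeatedly until StopIteration is raised. A generator object provides __iter__ (returning itself) and __next__ automatically, so a generator function never defines them. The sketch below unrolls a for loop by hand; count_up is a made-up example function.

```python
def count_up(limit):
    """Hypothetical generator function yielding 0, 1, ..., limit - 1."""
    n = 0
    while n < limit:
        yield n
        n += 1

gen = count_up(3)
print(iter(gen) is gen)  # True: a generator's __iter__ returns the generator itself

# Roughly what `for value in count_up(3): collected.append(value)` does:
collected = []
it = iter(count_up(3))        # calls __iter__()
while True:
    try:
        value = next(it)      # calls __next__()
    except StopIteration:     # raised when the generator function finishes
        break
    collected.append(value)
print(collected)  # [0, 1, 2]
```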
Notes from "Classes and Iterators" (Dive Into Python 3, Japanese version)
>>> import plural6
>>> r1 = plural6.LazyRules()
>>> r2 = plural6.LazyRules()
>>> r1.rules_filename ①
'plural6-rules.txt'
>>> r2.rules_filename
'plural6-rules.txt'
>>> r2.rules_filename = 'r2-override.txt' ②
>>> r2.rules_filename
'r2-override.txt'
>>> r1.rules_filename
'plural6-rules.txt'
>>> r2.__class__.rules_filename ③
'plural6-rules.txt'
>>> r2.__class__.rules_filename = 'papayawhip.txt' ④
>>> r1.rules_filename
'papayawhip.txt'
>>> r2.rules_filename ⑤
'r2-override.txt'
① Each instance of this class inherits the attribute rules_filename, with the value defined in the class.
② Changing the attribute value on one instance does not affect the attribute values of other instances...
③ ...nor does it change the class attribute. You can refer to the class attribute (as opposed to an individual instance's attribute) by using the special attribute __class__ to access the class itself.
④ When the class attribute is changed, instances that still inherit its value (here, r1) are affected.
⑤ Instances that have overridden the attribute (here, r2) are not affected.
In other words, an attribute set on an individual instance is separate from the class attribute of the same name.
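The same behavior can be reproduced without the book's plural6 module. Rules below is a hypothetical stand-in for plural6.LazyRules that keeps only the class attribute; vars() shows where each value actually lives.

```python
class Rules:
    """Hypothetical stand-in for plural6.LazyRules."""
    rules_filename = "plural6-rules.txt"  # class attribute, shared by all instances

r1 = Rules()
r2 = Rules()
r2.rules_filename = "r2-override.txt"   # creates an instance attribute on r2 only
Rules.rules_filename = "papayawhip.txt"  # changes the class attribute itself

print(r1.rules_filename)  # papayawhip.txt: r1 still falls back to the class
print(r2.rules_filename)  # r2-override.txt: the instance attribute shadows the class one
print(vars(r1), vars(r2))  # {} {'rules_filename': 'r2-override.txt'}
print(r2.__class__.rules_filename)  # papayawhip.txt
```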
sample1.py
class MyIterator(object):
    def __init__(self, *numbers):
        self._numbers = numbers
        self._i = 0
    def __iter__(self):
        # This object implements next() itself, so __iter__ returns self as is
        return self
    def next(self):
        if self._i == len(self._numbers):
            raise StopIteration()
        value = self._numbers[self._i]
        self._i += 1
        return value
    # Python 3 looks up __next__ instead of next; this alias lets the class run on both
    __next__ = next
my_iterator = MyIterator(10, 20, 30)
for num in my_iterator:
    print('hello %d' % num)
sample2.py
class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        self.a = 0
        self.b = 1
        return self
    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib
Why is next defined as the special method __next__ in one class and as an ordinary method next in the other?
Is it that __next__ in a class is called from outside through the built-in next(), while next in a class is only executed within the class (?)
I don't understand.
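For what it's worth, the difference appears to be the Python version rather than anything about the two classes: in Python 2 the iterator method is called next(), and Python 3 renamed it to __next__() (PEP 3114). In both versions the built-in function next(it) is the usual way to call it from outside. sample1 follows the Python 2 naming, while sample2, from Dive Into Python 3, uses the Python 3 naming. A sketch with a hypothetical CountDown class that defines __next__ and aliases next to it:

```python
class CountDown:
    """Hypothetical iterator counting down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # the name Python 3 looks up
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # the name Python 2 looks up; the alias serves both

it = CountDown(3)
print(next(it))  # 3: the built-in next() calls it.__next__() in Python 3
print(list(it))  # [2, 1]: list() keeps calling __next__ until StopIteration
```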