This is a continuation of my previous post. Where the code stays readable, I have written it as compactly as I could. Please point out any mistakes or possible improvements.
(Addendum) Following feedback on how the code was written, I have added revised versions of 05, 06, and 09.
Create a function that builds an n-gram from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the character bi-grams of the sentence "I am an NLPer".
05.py
def word_ngram(seq, n):
    return ["-".join(seq.split()[i:i+n]) for i in range(len(seq.split())-n+1)]

def char_ngram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    seq = "I am an NLPer"
    word_2gram_list, char_2gram_list = word_ngram(seq, 2), char_ngram(seq, 2)
    print(word_2gram_list)
    print(char_2gram_list)

if __name__ == '__main__':
    main()
['I-am', 'am-an', 'an-NLPer']
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
The words of each word bi-gram are joined with a hyphen, while the character bi-grams are shown as they are, with spaces counted as characters.
(After editing ↓) word_ngram was rewritten so that seq.split() is called only once.
05.py
def word_ngram(seq, n):
    words = seq.split()
    return ["-".join(words[i:i+n]) for i in range(len(words)-n+1)]

def char_ngram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    seq = "I am an NLPer"
    word_2gram_list, char_2gram_list = word_ngram(seq, 2), char_ngram(seq, 2)
    print(word_2gram_list)
    print(char_2gram_list)

if __name__ == '__main__':
    main()
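The problem statement asks for a function that accepts any sequence, not only strings. As a minimal sketch (the name ngram is my own and is not part of 05.py), slicing alone is enough to handle both cases, because a slice of a list is a list and a slice of a string is a string:

```python
def ngram(seq, n):
    """Return every contiguous length-n slice of seq (str, list, or tuple)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


sentence = "I am an NLPer"
word_bigrams = ngram(sentence.split(), 2)  # slices of a list of words
char_bigrams = ngram(sentence, 2)          # slices of the string itself
print(word_bigrams)
print(char_bigrams)
```

With this version the word bi-grams come back as lists of words rather than hyphen-joined strings, so any joining is left to the caller.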
Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y, respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is included in X and in Y.
06.py
# coding:utf-8
def n_gram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    X, Y = set(n_gram("paraparaparadise", 2)), set(n_gram("paragraph", 2))
    print("X: " + str(X))
    print("Y: " + str(Y))
    print("Union: " + str(X.union(Y)))
    print("Intersection: " + str(X.intersection(Y)))
    print("Difference set: " + str(X.difference(Y)))
    print("Does X contain se?: " + str("se" in X))
    print("Does Y include se?: " + str("se" in Y))

if __name__ == '__main__':
    main()
X: {'pa', 'ar', 'di', 'se', 'ad', 'ap', 'is', 'ra'}
Y: {'pa', 'ar', 'ph', 'ap', 'ag', 'gr', 'ra'}
Union: {'di', 'se', 'ap', 'ag', 'pa', 'ar', 'ph', 'ad', 'is', 'gr', 'ra'}
Intersection: {'ap', 'ar', 'ra', 'pa'}
Difference set: {'is', 'ad', 'di', 'se'}
Does X contain se?: True
Does Y include se?: False
For n_gram, I reused char_ngram from 05.py unchanged.
(After editing ↓) The print calls were rewritten to pass the values as separate arguments and to use the set operators |, &, and -.
06.py
# coding:utf-8
def n_gram(seq, n):
    return ["".join(seq[i:i+n]) for i in range(len(seq)-n+1)]

def main():
    X, Y = set(n_gram("paraparaparadise", 2)), set(n_gram("paragraph", 2))
    print("X: ", X)
    print("Y: ", Y)
    print("Union: ", X | Y)
    print("Intersection: ", X & Y)
    print("Difference set: ", X - Y)
    print("Does X contain se?: ", "se" in X)
    print("Does Y include se?: ", "se" in Y)

if __name__ == '__main__':
    main()
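Because sets are unordered, the elements can be printed in a different order from run to run. The following sketch (not part of the original 06.py) sorts each result before printing so the output is stable, and also shows that the difference depends on the order of the operands:

```python
def n_gram(seq, n):
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


X = set(n_gram("paraparaparadise", 2))
Y = set(n_gram("paragraph", 2))

# sorted() turns each set into a list with a fixed, alphabetical order.
print("Union:", sorted(X | Y))
print("Intersection:", sorted(X & Y))
print("X - Y:", sorted(X - Y))
print("Y - X:", sorted(Y - X))  # not the same as X - Y
```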
Implement a function that takes arguments x, y, z and returns the string "y at x is z". Furthermore, set x = 12, y = "temperature", z = 22.4, and check the execution result.
07.py
# coding:utf-8
def ans(x, y, z):
    # Build "y at x is z" as required by the problem statement
    return '{} at {} is {}'.format(y, x, z)

def main():
    x, y, z = 12, 'temperature', 22.4
    print(ans(x, y, z))

if __name__ == '__main__':
    main()
This one wasn't difficult. It is straightforward as long as you know str.format.
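On Python 3.6 or later, the same string can also be built with an f-string. This is only an alternative sketch; the function name describe is made up for illustration and takes the same arguments as ans in 07.py:

```python
def describe(x, y, z):
    # Hypothetical name; produces "y at x is z" like the problem statement.
    return f"{y} at {x} is {z}"


print(describe(12, "temperature", 22.4))
```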
Implement the function cipher that converts each character of a given string according to the following specification.
・ If the character is a lowercase letter, replace it with the character whose code is (219 - character code)
・ Output all other characters unchanged
Use this function to encrypt and decrypt an English message.
08.py
# coding:utf-8
def cipher(seq):
    return ''.join(chr(219-ord(i)) if i.islower() else i for i in seq)

def main():
    seq = 'Is the order a rabbit?'
    print("encryption: " + str(cipher(seq)))
    print("Decryption: " + str(cipher(cipher(seq))))

if __name__ == '__main__':
    main()
encryption: Ih gsv liwvi z izyyrg?
Decryption: Is the order a rabbit?
I used islower() to check whether each character is lowercase, ord to convert a character to its character code, and chr to convert a code back to a character. I couldn't think of a particularly good English sentence, so I just picked one arbitrarily. ~~I like the person behind Cocoa~~
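Since 219 equals ord('a') + ord('z'), the conversion mirrors the alphabet (a becomes z, b becomes y, and so on), which is why applying it twice restores the original text. One caveat: str.islower() is also true for non-ASCII lowercase letters such as 'é', whose code is larger than 219, so chr would raise a ValueError on them. The sketch below uses a hypothetical variant, cipher_ascii, that only touches ASCII a-z:

```python
def cipher_ascii(text):
    # Hypothetical variant of cipher: only ASCII 'a'..'z' is mirrored,
    # every other character (including 'é') is passed through unchanged.
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in text)


message = "Is the order a rabbit?"
encrypted = cipher_ascii(message)
print(encrypted)
print(cipher_ascii(encrypted))  # applying it again restores the message
```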
For a string of words separated by spaces, create a program that keeps the first and last letters of each word and randomly rearranges the order of the remaining letters. However, words with a length of 4 or less are not rearranged. Give a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind.") and check the execution result.
09.py
import random

def Typoglycemia(seq):
    return " ".join((x[0] + "".join(random.sample(x[1:-1], len(x[1:-1]))) + x[-1]) if len(x) > 4 else x for i,x in enumerate(seq.split()))

def main():
    s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind."
    print(Typoglycemia(s))

if __name__ == '__main__':
    main()
I coludn't bevelie that I cuold aultalcy urnteasndd what I was riednag : the pmenneaohl peowr of the hamun mdni.
The characters between the first and last character of each word are rearranged with random.sample, which picks every one of them exactly once in random order.
(After editing ↓) Typoglycemia was rewritten using lambda expressions.
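To make the role of random.sample concrete, here is a small sketch for a single word. The helper name shuffle_middle and the rng parameter are my own additions for illustration; passing a seeded random.Random instance makes the result repeatable, which helps when checking the behavior:

```python
import random


def shuffle_middle(word, rng=random):
    # Hypothetical helper: keep the first and last characters, permute the rest.
    if len(word) <= 4:
        return word
    # Sampling len(word) - 2 items from the middle yields a full permutation.
    middle = rng.sample(word[1:-1], len(word) - 2)
    return word[0] + "".join(middle) + word[-1]


rng = random.Random(0)  # fixed seed so repeated runs give the same shuffle
word = "phenomenal"
result = shuffle_middle(word, rng)
print(result)
```

Whatever order comes out, the result always contains the same letters as the input, with the first and last letters in place.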
09.py
import random

def Typoglycemia(seq):
    shuffle = lambda x: "".join(random.sample(x, len(x)))
    typo = lambda x: x[0] + shuffle(x[1:-1]) + x[-1]
    return " ".join(typo(x) if len(x) > 4 else x for x in seq.split())

def main():
    s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind."
    print(Typoglycemia(s))

if __name__ == '__main__':
    main()
Lambda expressions are convenient. I will use them from now on wherever they seem to fit.
I will write the next part when I feel like it.