A Python beginner works through the 100 Language Processing Knock, little by little

100 Language Processing Knock 2015

http://www.cl.ecei.tohoku.ac.jp/nlp100/

I'm a beginner working through these problems with Python (3.x). There are probably many similar articles already, but I'm writing this as a personal memo. If you have any advice or suggestions, please leave a comment!

The source code is also available on GitHub: https://github.com/hbkr/nlp100

Chapter 1

00. Reversing a string


000.py


s = "stressed"
print(s[::-1])
Output:
desserts

s[i:j:k] is the slice of s from index i up to (but not including) index j with step k, so s[::-1] walks backwards one character at a time from the end to the beginning.
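To make the s[i:j:k] rule concrete, here are a few other slices of the same string (a small illustration; the indices are chosen arbitrarily):

```python
s = "stressed"

# s[i:j:k]: start at index i, stop before index j, move k characters each step.
a = s[1:4]     # indices 1, 2, 3                  -> "tre"
b = s[::2]     # every other character            -> "srse"
c = s[-3:]     # the last three characters        -> "sed"
d = s[6:2:-1]  # index 6 down to (not incl.) 2    -> "esse"
e = s[::-1]    # the whole string, backwards      -> "desserts"
print(a, b, c, d, e)  # tre srse sed esse desserts
```

When i and j are omitted they default to the two ends of the string; with a negative k they default to the end and the beginning respectively, which is why s[::-1] covers the whole string.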

01. "パタトクカシーー" (Patatokukashii)


001.py


# Every other character of "パタトクカシーー" spells "パトカー" (patokā, "police car").
s = "パタトクカシーー"
print(s[::2])
Output:
パトカー

As explained above, s[::2] takes every other character from the beginning to the end, i.e. the 1st, 3rd, 5th and 7th characters.
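For comparison, starting the slice at index 1 picks up the other half. Using the problem's original Japanese string "パタトクカシーー", the two halves turn out to be exactly the two words that problem 02 interleaves (a small sketch; the variable names are arbitrary):

```python
s = "パタトクカシーー"  # the original string from problem 01

# Step 2 from index 0: the 1st, 3rd, 5th and 7th characters.
odd = s[::2]
# Step 2 from index 1: the 2nd, 4th, 6th and 8th characters.
even = s[1::2]

# The same selection as odd, written with explicit 0-based indices.
explicit = "".join(s[i] for i in (0, 2, 4, 6))

print(odd, even, explicit)  # パトカー タクシー パトカー
```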

02. "パトカー" (police car) + "タクシー" (taxi) = "パタトクカシーー"


002.py


# Interleave "パトカー" (police car) and "タクシー" (taxi) character by character.
s = "".join(i+j for i, j in zip("パトカー", "タクシー"))
print(s)
Output:
パタトクカシーー

zip lets you loop over several sequences at the same time. sep.join(seq) concatenates the elements of seq into a single string with sep between them; here the strings produced by the generator expression are joined with an empty string.
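Both of the problem's words, "パトカー" (police car) and "タクシー" (taxi), have four characters, so plain zip is enough. If the lengths differed, zip would silently stop at the shorter input; itertools.zip_longest pads the shorter one instead. A small sketch (interleave and interleave_longest are illustrative names, and "abc"/"12" are made-up inputs):

```python
from itertools import zip_longest

def interleave(a, b):
    # zip pairs items position by position and stops at the shorter input.
    return "".join(i + j for i, j in zip(a, b))

def interleave_longest(a, b):
    # zip_longest keeps going, filling the missing side with fillvalue.
    return "".join(i + j for i, j in zip_longest(a, b, fillvalue=""))

print(interleave("パトカー", "タクシー"))  # パタトクカシーー
print(interleave("abc", "12"))            # a1b2  ("c" is dropped)
print(interleave_longest("abc", "12"))    # a1b2c
```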

03. Pi


003.py


s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
count = [len(i.strip(",.")) for i in s.split()]
print(count)
Output:
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

str.split(sep) splits a string into a list using sep as the delimiter. If sep is omitted, the string is split on runs of whitespace (spaces, tabs, newlines and so on). Each word's length is counted with len() after str.strip(",.") has removed any commas and periods from both ends.
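A short illustration of the two behaviours mentioned above, on a made-up string containing a double space:

```python
s = "Now  I need a drink,"  # note the two spaces after "Now"

# No argument: split on runs of whitespace, never producing empty strings.
words = s.split()
# An explicit " ": split on every single space, so a double space yields "".
pieces = s.split(" ")

print(words)   # ['Now', 'I', 'need', 'a', 'drink,']
print(pieces)  # ['Now', '', 'I', 'need', 'a', 'drink,']

# strip(",.") treats its argument as a set of characters and only touches the ends.
print("course,".strip(",."), "a.b.".strip(",."))  # course a.b
```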

04. Element symbol


004.py


s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
dic = {word[:2-(i in (1,5,6,7,8,9,15,16,19))]:i for i, word in enumerate(s.replace(".", "").split(), 1)}
print(dic)
Output:
{'He': 2, 'K': 19, 'S': 16, 'Ar': 18, 'Si': 14, 'O': 8, 'F': 9, 'P': 15, 'Na': 11, 'Cl': 17, 'B': 5, 'Ca': 20, 'Ne': 10, 'Be': 4, 'N': 7, 'C': 6, 'Mi': 12, 'Li': 3, 'H': 1, 'Al': 13}

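enumerate(seq, start=0) yields each element together with its index; here start=1 so that the index matches the problem's 1-based word numbering. The index is tested with the in operator, and because True and False act as 1 and 0 in arithmetic, 2 - (i in (...)) keeps 1 character for the listed positions and 2 characters for all the others.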
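The bool-arithmetic trick is compact but easy to misread. Here is a sketch of the same mapping spelled out with an explicit conditional (storing the positions in a set is my own choice; the tuple in the one-liner works the same way):

```python
s = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. "
     "New Nations Might Also Sign Peace Security Clause. Arthur King Can.")
one_letter = {1, 5, 6, 7, 8, 9, 15, 16, 19}  # 1-based positions that keep one letter

# True and False act as 1 and 0 in arithmetic, so 2 - (i in ...) is 1 or 2.
print(2 - (1 in one_letter), 2 - (2 in one_letter))  # 1 2

# The same mapping written out with an explicit conditional.
dic = {}
for i, word in enumerate(s.replace(".", "").split(), start=1):
    length = 1 if i in one_letter else 2
    dic[word[:length]] = i

print(dic["H"], dic["He"], dic["Ca"])  # 1 2 20
print(list(dic)[:5])                   # ['H', 'He', 'Li', 'Be', 'B']
```

On Python 3.7 and later, dicts preserve insertion order, so printing dic lists the elements from H to Ca rather than in the arbitrary order shown in the output above.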

05. n-gram

005.py


def n_gram(s, n): return {tuple(s[i:i+n]) for i in range(len(s)-n+1)}

s = "I am an NLPer"
print(n_gram(s, 2))
print(n_gram([t.strip(".,") for t in s.split()], 2))
Output:
{('m', ' '), ('n', ' '), ('e', 'r'), ('N', 'L'), (' ', 'N'), ('a', 'm'), ('a', 'n'), ('L', 'P'), ('I', ' '), (' ', 'a'), ('P', 'e')}
{('an', 'NLPer'), ('I', 'am'), ('am', 'an')}

An n-gram is a run of n consecutive items (characters or words) taken from a sequence; n-gram indexing uses these runs as index headwords. The n_gram(s, n) function slides a window of length n over the sequence s and returns each window as a tuple in a set (tuples are used because set elements must be hashable, and lists are not). Because the result is a set, each n-gram appears only once, and the original order is not kept.
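The set is exactly what problem 06 needs, but it throws away order and frequency. If those matter, a list-returning variant combined with collections.Counter keeps both (n_gram_list below is a hypothetical name, not part of the original solution):

```python
from collections import Counter

def n_gram_list(seq, n):
    """Hypothetical helper: all n-grams of seq as tuples, in order, duplicates kept."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

grams = n_gram_list("paraparaparadise", 2)
counts = Counter(grams)

print(len(grams), len(set(grams)))              # 15 8
print(grams[:4])                                # [('p', 'a'), ('a', 'r'), ('r', 'a'), ('a', 'p')]
print(counts[("p", "a")], counts[("a", "p")])   # 3 2
```

Like the original, it works on any sequence, so a list of words gives word n-grams.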

06. Meeting


006.py


n_gram = lambda s, n: {tuple(s[i:i+n]) for i in range(len(s)-n+1)}

X = n_gram("paraparaparadise", 2)
Y = n_gram("paragraph", 2)

print("X: %s" % X)
print("Y: %s" % Y)
print("union: %s" % str(X|Y))
print("difference: %s" % str(X-Y))
print("intersection: %s" % str(X&Y))

if n_gram("se", 2) <= X: print("'se' is included in X.")
if n_gram("se", 2) <= Y: print("'se' is included in Y.")
Output:
X: {('a', 'd'), ('a', 'p'), ('i', 's'), ('s', 'e'), ('a', 'r'), ('p', 'a'), ('d', 'i'), ('r', 'a')}
Y: {('g', 'r'), ('p', 'h'), ('a', 'p'), ('a', 'r'), ('p', 'a'), ('r', 'a'), ('a', 'g')}
union: {('a', 'd'), ('g', 'r'), ('p', 'h'), ('a', 'p'), ('i', 's'), ('s', 'e'), ('a', 'r'), ('p', 'a'), ('d', 'i'), ('r', 'a'), ('a', 'g')}
difference: {('i', 's'), ('d', 'i'), ('a', 'd'), ('s', 'e')}
intersection: {('a', 'r'), ('p', 'a'), ('a', 'p'), ('r', 'a')}
'se' is included in X.

This reuses the n_gram from 005.py, but written as a lambda expression this time (since the problem doesn't ask for a function). X | Y is the union, X - Y is the difference (elements in X but not in Y), and X & Y is the intersection. A <= B tests whether A is a subset of B.
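Each set operator also has a named method. A small sketch comparing them on the same X and Y, plus the symmetric difference, which the problem doesn't ask for (here n_gram is written with def rather than a lambda):

```python
def n_gram(s, n):
    return {tuple(s[i:i + n]) for i in range(len(s) - n + 1)}

X = n_gram("paraparaparadise", 2)
Y = n_gram("paragraph", 2)

# Each operator has a method equivalent that returns the same set.
print(X | Y == X.union(Y))          # True
print(X - Y == X.difference(Y))     # True
print(X & Y == X.intersection(Y))   # True

# Symmetric difference: bi-grams that appear in exactly one of X and Y.
print(sorted(X ^ Y))
# [('a', 'd'), ('a', 'g'), ('d', 'i'), ('g', 'r'), ('i', 's'), ('p', 'h'), ('s', 'e')]

# <= is the subset test, equivalent to issubset().
print(n_gram("se", 2) <= X, n_gram("se", 2).issubset(Y))  # True False
```

One practical difference: the methods accept any iterable (for example a list of tuples), while the operators require both operands to be sets.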

07. Sentence generation by template


007.py


def f(x, y, z): return "%s time%s is%s" % (x, y, z)

print(f(12, "temperature", 22.4))
The temperature at 12:00 is 22.4

" {1} at {0} is {2} ".format (x, y, z) is fine.

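On Python 3.6 or later, an f-string is a third option. The sketch below assumes the English template "The y at x:00 is z" (an English rendering of the problem's Japanese template, not its exact wording); f_format is just an illustrative name:

```python
def f(x, y, z):
    # f-string (Python 3.6+): the arguments are embedded directly in the literal.
    return f"The {y} at {x}:00 is {z}"

def f_format(x, y, z):
    # str.format with positional indices; the argument order need not match the text.
    return "The {1} at {0}:00 is {2}".format(x, y, z)

print(f(12, "temperature", 22.4))         # The temperature at 12:00 is 22.4
print(f_format(12, "temperature", 22.4))  # The temperature at 12:00 is 22.4
```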
08. Ciphertext


008.py


def cipher(s): return "".join(chr(219-ord(c)) if c.islower() else c for c in s)

s = "Hi He Lied Because Boron Could Not Oxidize Fluorine."
print(cipher(s))
print(cipher(cipher(s)))
Output:
Hr Hv Lrvw Bvxzfhv Blilm Clfow Nlg Ocrwrav Foflirmv.
Hi He Lied Because Boron Could Not Oxidize Fluorine.

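"a" <= c <= "z" could be used instead of c.islower(), and it is actually stricter: islower() also returns True for non-ASCII lowercase letters such as "é", for which 219 - ord(c) is negative, so chr() raises ValueError. I haven't checked whether it is faster.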
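The constant 219 is ord("a") + ord("z") (97 + 122), so the mapping swaps a and z, b and y, and so on, which is why applying it twice restores the text. A small sketch of the stricter comparison-based variant (cipher_ascii is a hypothetical name) and of what the islower() version does with a non-ASCII lowercase letter:

```python
def cipher(s):
    # Version from above: any character with islower() == True is mapped.
    return "".join(chr(219 - ord(c)) if c.islower() else c for c in s)

def cipher_ascii(s):
    # Hypothetical variant: only 'a'..'z' are mapped, everything else passes through.
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in s)

word = "caf\u00e9"            # "café"; \u00e9 is a single code point (233)
print(cipher_ascii(word))     # xzué  (the "é" is left alone)
print(word[-1].islower())     # True, so cipher() computes chr(219 - 233) ...
try:
    cipher(word)
except ValueError as err:     # ... and chr() rejects the negative code point
    print("ValueError:", err)
```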

09. Typoglycemia

009.py


from random import random

typo = lambda s: " ".join(t[0]+"".join(sorted(t[1:-1], key=lambda k:random()))+t[-1] if len(t) > 4 else t for t in s.split())

s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
print(typo(s))
Output (random, so it differs on every run):
I cdnlu'ot blieeve that I culod aclualty uetdnnsard what I was rdeniag : the pnneehmoal pwoer of the huamn mind .

I stubbornly insisted on doing it in one line. I used sorted() with a random key because random.shuffle() shuffles a list in place and returns None, so it cannot be used inside an expression.
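If you don't mind giving up the one-liner, random.sample(population, k) is another option: with k equal to the population size it returns a new shuffled list, so it can be used inside an expression. The sketch below (the helper name typo_sample and its rng parameter are my own additions, for illustration) also accepts a seeded random.Random so the result can be reproduced:

```python
import random

def typo_sample(s, rng=random):
    """Scramble the inner letters of every word longer than 4 characters."""
    def scramble(t):
        if len(t) <= 4:          # short words are left as they are
            return t
        # sample() returns a new shuffled list and leaves t untouched.
        middle = rng.sample(t[1:-1], len(t) - 2)
        return t[0] + "".join(middle) + t[-1]
    return " ".join(scramble(t) for t in s.split())

s = ("I couldn't believe that I could actually understand what I was reading : "
     "the phenomenal power of the human mind .")
print(typo_sample(s, random.Random(0)))  # seeded, so the output is repeatable
```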
