[PYTHON] Jupyter Notebook: 4 banal tips and tricks

Today, December 23, is the Emperor's Birthday, the national holiday celebrating the birthday of Emperor Akihito. Incidentally, this is said to be year 2676 of the imperial calendar (a prime? No: $2^2 \times 3 \times 223$). Just past noon, at def vibes under the elevated Nankai Electric Railway tracks in Namba, with an IPA in my left hand, I'll log some of the usual idle tinkering I do with Jupyter Notebook.

The usual stuff

- Equation editor
- Graph drawing
- Image processing
- Natural language processing

Equation editor

For the past year or so, pushing forty, I've taken up group theory as a new hobby. I've piled up a lot of handwritten notes, so I want to keep clean copies in a repository. For the formulas, set the cell type to Markdown and, as you would expect, write

$LaTeX notation$

and MathJax does the rest.

The other day, an exercise on well-definedness came up: "Suppose fractions were added by adding the numerators and the denominators separately." Put like that, I grinned for a moment, and then the well-definedness of ordinary fraction addition was shown in no time, so I'm making a note of it (key points only).

Suppose

$\begin{eqnarray*}
    \dfrac{a}{b} = \dfrac{a^{'}}{b^{'}} \nonumber \\
    \dfrac{c}{d} = \dfrac{c^{'}}{d^{'}} \nonumber
\end{eqnarray*}$

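Then, cross-multiplying (the denominators $b, b^{'}, d, d^{'}$ are nonzero),

$\begin{eqnarray*}
    {a}{b^{'}} - {a^{'}}{b} = 0 \nonumber \\
    {c}{d^{'}} - {c^{'}}{d} = 0 \nonumber 
\end{eqnarray*}$

and we want to show that

$\begin{eqnarray*}
    \dfrac{ad+bc}{bd} = \dfrac{a^{'}d^{'}+b^{'}c^{'}}{b^{'}d^{'}} \nonumber
\end{eqnarray*}$

Since the denominators are nonzero, it suffices to show that the cross products agree. Expanding, regrouping, and applying the two equations above:
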
$\begin{eqnarray*}
    (ad + bc)b^{'}d^{'} - bd(a^{'}d^{'} + b^{'}c^{'}) & = &  adb^{'}d^{'} + bcb^{'}d^{'} - bda^{'}d^{'} - bdb^{'}c^{'} \nonumber \\
                                                      & = &  dd^{'}(ab^{'} - a^{'}b) + bb^{'}(cd^{'} - c^{'}d)  \nonumber \\
                                                      & = & 0 \nonumber 
\end{eqnarray*}$
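A quick numerical sanity check of the argument above, as a minimal sketch using Python's standard `fractions` module. It contrasts ordinary addition with the exercise's hypothetical "add numerators and denominators separately" rule (the helper names `add_proper` and `add_naive` are illustrative, not from the original notebook), and spot-checks the regrouping identity on random integers.

```python
import random
from fractions import Fraction

def add_proper(a, b, c, d):
    """Ordinary addition: a/b + c/d = (ad + bc) / (bd)."""
    return Fraction(a * d + b * c, b * d)

def add_naive(a, b, c, d):
    """The exercise's rule: add numerators and denominators separately."""
    return Fraction(a + c, b + d)

# 1/2 and 2/4 are the same rational number, so a well-defined sum must not care which is used
print(add_naive(1, 2, 1, 3), add_naive(2, 4, 1, 3))    # 2/5 3/7 -> depends on the representative
print(add_proper(1, 2, 1, 3), add_proper(2, 4, 1, 3))  # 5/6 5/6 -> same result

# Spot-check the expand-and-regroup step of the derivation on random integers
random.seed(0)
for _ in range(1000):
    a, b, c, d, a2, b2, c2, d2 = (random.randint(-20, 20) for _ in range(8))
    lhs = (a * d + b * c) * b2 * d2 - b * d * (a2 * d2 + b2 * c2)
    rhs = d * d2 * (a * b2 - a2 * b) + b * b2 * (c * d2 - c2 * d)
    assert lhs == rhs

# Spot-check well-definedness with scaled representatives (k*a)/(k*b) and (m*c)/(m*d)
for _ in range(1000):
    a, c = random.randint(-20, 20), random.randint(-20, 20)
    b, d = random.randint(1, 20), random.randint(1, 20)
    k, m = random.randint(1, 9), random.randint(1, 9)
    assert add_proper(a, b, c, d) == add_proper(k * a, k * b, m * c, m * d)
```

The naive rule fails because $1/2$ and $2/4$ give different "sums" with $1/3$; the ordinary rule does not have that problem.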

GitHub also renders it reasonably well, but nbviewer [reproduces](https://nbviewer.jupyter.org/github/azukiwasher/math-lessons/blob/master/algebra/well-defined.ipynb) the expressions more faithfully to how they look locally.

Graph drawing

From about this point on, I could feel something that drastically cuts my word count seeping into my system, so I ordered another IPA and drew a lemniscate curve.

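import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Lemniscate of Bernoulli in parametric form; alpha scales the curve
alpha = 1
t = np.linspace(0, 2*np.pi, num=1000)
denom = np.sin(t)**2 + 1
x = alpha * np.sqrt(2) * np.cos(t) / denom  # vectorized over t, no list comprehension needed
y = alpha * np.sqrt(2) * np.cos(t) * np.sin(t) / denom
plt.plot(x, y)
plt.axis('equal')  # equal aspect ratio so the figure-eight is not distorted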

Something that looks the part came out. The explanation of the lemniscate curve here is wonderful.
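The curve drawn above is the lemniscate of Bernoulli, whose implicit form is $(x^2 + y^2)^2 = 2\alpha^2 (x^2 - y^2)$. As a small check (a sketch using only the standard `math` module; the function names are illustrative), every point produced by the same parametrization should satisfy that equation up to floating-point error.

```python
import math

def lemniscate_point(t, alpha=1.0):
    """Point on the lemniscate for parameter t, same formula as the plotting code."""
    denom = math.sin(t) ** 2 + 1
    x = alpha * math.sqrt(2) * math.cos(t) / denom
    y = alpha * math.sqrt(2) * math.cos(t) * math.sin(t) / denom
    return x, y

def implicit_residual(x, y, alpha=1.0):
    """(x^2 + y^2)^2 - 2 alpha^2 (x^2 - y^2), which is zero exactly on the curve."""
    r2 = x * x + y * y
    return r2 * r2 - 2 * alpha ** 2 * (x * x - y * y)

alpha = 1.5
samples = [2 * math.pi * k / 360 for k in range(361)]
max_residual = max(abs(implicit_residual(*lemniscate_point(t, alpha), alpha)) for t in samples)
print(max_residual)  # tiny: floating-point noise only
```

Algebraically, with $c = \cos t$ and $s = \sin t$, both sides reduce to $4\alpha^4 c^4 / (1 + s^2)^2$ because $c^2 = 1 - s^2$.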

Image processing

An example of using OpenCV to detect so-called "human" faces in an image. First, let's see what happens with a certain yuru-chara mascot (maybe not all that yuru). Load the required libraries and the mascot image; everything happens in one place inside Jupyter Notebook.

import cv2
from skimage import io
import matplotlib.pyplot as plt
%matplotlib inline

url = "https://qiita-image-store.s3.amazonaws.com/0/151745/8f4e7214-6c1c-c782-4986-5929a33f5a1b.jpeg"
img = io.imread(url)
plt.imshow(img)

Next, prepare static data (a Haar cascade file) in which the characteristics of human faces have been learned in advance. OpenCV ships such files with its distribution.
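For context: OpenCV's Haar cascades follow the Viola-Jones approach, which scores many rectangle-difference ("Haar-like") features and evaluates each one in constant time using an integral image (summed-area table). The numpy sketch below illustrates only that core idea, with illustrative function names; it is not how OpenCV's `CascadeClassifier` is implemented internally.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a leading row and column of zeros: ii[i, j] = gray[:i, :j].sum()."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Toy 4x4 "image": bright left half, dark right half
gray = np.array([[200, 200, 10, 10]] * 4, dtype=np.uint8)
ii = integral_image(gray)
print(rect_sum(ii, 0, 0, 4, 4))          # 1680 = 4 rows * (200 + 200 + 10 + 10)
print(two_rect_feature(ii, 0, 0, 4, 4))  # 1520 = 1600 - 80
```

A large feature value signals a strong left/right brightness contrast; a cascade combines many such features, rejecting most windows after only a few cheap tests.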

PATH_TO_CASCADE = "/Users/azki/anaconda3/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml"  # adjust to your OpenCV install
img_gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)  # skimage.io.imread returns RGB, not BGR
cascade = cv2.CascadeClassifier(PATH_TO_CASCADE)
faces = cascade.detectMultiScale(img_gray, scaleFactor=1.1, minNeighbors=1, minSize=(1, 1))
new_img = img.copy()  # draw on a copy so the original image stays intact
for x, y, w, h in faces:
  cv2.rectangle(new_img, (x, y), (x+w, y+h), (0, 0, 255), thickness=2)  # (0, 0, 255) is blue in RGB

plt.imshow(new_img)

It seems to detect a "human face" in two places (the blue squares; the Ryoma-like statue slips through entirely), but I'll pretend I didn't see that and try another image.

img2 = cv2.imread('gymnasium.jpg', cv2.IMREAD_COLOR)  # OpenCV loads images in BGR order
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)  # convert to RGB so matplotlib shows the true colors
plt.imshow(img2)

Apply the same cascade as for the yuru-chara.

img2_gray = cv2.cvtColor(img2, cv2.COLOR_RGB2GRAY)
cascade = cv2.CascadeClassifier(PATH_TO_CASCADE)
faces = cascade.detectMultiScale(img2_gray, scaleFactor=1.1, minNeighbors=1, minSize=(1, 1))
new_img2 = img2.copy()  # draw on a copy so img2 stays intact
for x, y, w, h in faces:
  cv2.rectangle(new_img2, (x, y), (x+w, y+h), (0, 0, 255), 2)  # blue box in RGB

plt.imshow(new_img2)

A few false positives crept in, but the result isn't bad. Once faces can be cut out like this, it should be possible to apply a mosaic to them, or to analyze emotions and situations mechanically and at scale.
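As a minimal sketch of the mosaic idea, assuming the image is a numpy array (as returned by `skimage.io.imread` or `cv2.imread`) and `box` is one `(x, y, w, h)` entry from `detectMultiScale`, the hypothetical helper below replaces each `block` x `block` tile inside the box with its mean color. A common alternative is shrinking and re-enlarging the region with `cv2.resize`; plain numpy is used here so the logic is visible.

```python
import numpy as np

def mosaic(img, box, block=8):
    """Return a copy of img whose (x, y, w, h) box is pixelated into block x block tiles."""
    x, y, w, h = box
    out = img.copy()
    region = out[y:y + h, x:x + w]  # a view into the copy, so edits land in `out`
    rh, rw = region.shape[:2]       # clipped size, in case the box touches the border
    for by in range(0, rh, block):
        for bx in range(0, rw, block):
            tile = region[by:by + block, bx:bx + block]
            # Paint the whole tile with its mean value (per channel for color images)
            tile[...] = tile.mean(axis=(0, 1), keepdims=True).astype(img.dtype)
    return out
```

Usage with the detections above might look like `for box in faces: new_img2 = mosaic(new_img2, box, block=10)`; larger blocks hide more detail.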

Natural language processing

Every morning, once Yoshiko had seen her husband off to his office (it was always past ten by then), she finally had her time to herself....

It begins with that passage and ends with this one:

I have deliberately left it off the manuscript, but I would like to give it the title "Ningen Isu" (The Human Chair).
Forgive my rudeness; that is all I ask. In haste.

That is Ranpo Edogawa's "The Human Chair" (Ningen Isu).

Hiding inside a chair he built himself, a man slips uninvited into someone else's home and hones his senses until he becomes a kind of sensor. Then, through the biggest messaging protocol of the day, the postal service, he sends an asynchronous and utterly one-sided stream of "feelings", in letters rather than logs. The recipient, curious and then deadlocked, is terrified by the improbable reality and wonders whether to consult her husband, home from work: "Well, how on earth should I get rid of this disgusting chair?"

There is no way to know how this "human chair" was finally disposed of, and I don't particularly want to know. Instead, let's take this work, a kind of Taisho-era light novel in the same vein as Yumeno Kyusaku's "Bottled Hell", treat it as data made of a list of words, and analyze it with [word2vec](https://code.google.com/archive/p/word2vec/).

The text is first cleaned of noise such as ruby (furigana) annotations and the header/footer, then morphologically analyzed and tokenized with MeCab. With the resulting token list as input, word2vec represents the vocabulary of "The Human Chair" as vectors in a vector space. The closer two words are in that space (the higher their cosine similarity), the more semantically similar they are considered. You can also add and subtract word vectors. Reference.
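To make "closer means more similar" concrete: word2vec models are usually compared by the cosine of the angle between two word vectors, which ignores vector length. A minimal numpy sketch with made-up 3-dimensional vectors (the model in this post uses 200 dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = u.v / (|u| |v|): 1 = same direction, 0 = orthogonal, -1 = opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up toy vectors, for illustration only
u = np.array([1.0, 2.0, 0.0])
v = np.array([2.0, 4.0, 0.0])  # same direction as u, twice as long
w = np.array([0.0, 0.0, 3.0])  # orthogonal to u

print(cosine_similarity(u, v))  # approximately 1.0: length does not matter
print(cosine_similarity(u, w))  # 0.0: unrelated directions
```

With a trained gensim model, `model.wv.similarity(w1, w2)` returns the same quantity for two words in the vocabulary.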

First, fetch the text from Aozora Bunko.

!curl -O http://www.aozora.gr.jp/cards/001779/files/56648_ruby_58198.zip
!unzip 56648_ruby_58198.zip

# Convert the Shift_JIS text from Aozora Bunko to UTF-8
with open('ningen_isu.txt', 'r', encoding='shift_jis') as src, \
     open('ningen-isu.txt', 'w', encoding='utf-8') as dst:
  for line in src:
    dst.write(line)

# Now clean up the text (ruby, header/footer) in an editor of your choice.
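The manual cleanup in an editor can also be scripted. A minimal sketch, assuming the usual Aozora Bunko markup: ruby readings in 《》, the ｜ marker that starts a ruby base, editor's notes in ［＃...］, a header notes block delimited by lines of hyphens, and a footer starting with 底本： (source text). `clean_aozora` is a hypothetical helper; compare its output with the file before relying on it.

```python
import re

RUBY = re.compile(r'《[^》]*》')      # ruby readings, e.g. 椅子《いす》
RUBY_START = re.compile(r'｜')        # marks where a ruby base text starts
NOTE = re.compile(r'［＃[^］]*］')     # editor's notes such as ［＃改ページ］

def clean_aozora(text):
    """Strip the header block, footer, ruby, and notes from an Aozora Bunko text."""
    lines = text.splitlines()
    # Header: drop everything up to the second line made only of '-' characters, if present
    dash_lines = [i for i, l in enumerate(lines) if l.strip() and set(l.strip()) == {'-'}]
    if len(dash_lines) >= 2:
        lines = lines[dash_lines[1] + 1:]
    # Footer: drop everything from the "底本：" line onward
    for i, l in enumerate(lines):
        if l.startswith('底本：'):
            lines = lines[:i]
            break
    body = '\n'.join(lines)
    for pattern in (RUBY, RUBY_START, NOTE):
        body = pattern.sub('', body)
    return body.strip()
```

For example, `clean_aozora(open('ningen-isu.txt', encoding='utf-8').read())` returns the cleaned body, which can then be written back before the MeCab step.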

Next, morphological analysis: keep only the words whose part of speech is noun (名詞).

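import MeCab

# -d points MeCab at the mecab-ipadic-NEologd dictionary
tagger = MeCab.Tagger('-F"%f[6] " -U"%m " -E"\n" -b 50000 -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')
tagger.parse('')  # common workaround for empty node.surface values in some mecab-python3 versions

tokens = []
with open('ningen-isu.txt', encoding='utf-8') as text:
  for line in text:
    node = tagger.parseToNode(line)
    while node:
      word = node.surface
      pos = node.feature.split(',')[0]  # the first feature field is the part of speech
      if pos == '名詞':  # the IPA dictionary labels nouns "名詞"
        tokens.append(word)
      node = node.next

# Write the nouns space-separated (wakati-gaki) for word2vec
with open('ningen-isu-wakati.txt', 'w', encoding='utf-8') as file:
  file.write(" ".join(tokens))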

Represent the words in a vector space, and then visualize part of it.

# Import the required libraries
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
from sklearn.manifold import TSNE
from gensim.models import word2vec
%matplotlib inline

# Word vector generation (gensim >= 4.0: `vector_size` was formerly `size`, and vectors live in `model.wv`)
data = word2vec.LineSentence('ningen-isu-wakati.txt')
model = word2vec.Word2Vec(data, vector_size=200, min_count=1)
model.wv.save_word2vec_format("ningen-isu-w2v", binary=False)

# Japanese font setup for visualization
font_path = '/Library/Fonts/Osaka.ttf'
font_prop = FontProperties(fname=font_path)
matplotlib.rcParams['font.family'] = font_prop.get_name()
matplotlib.rcParams['font.size'] = 10.0

# Load a word2vec text-format file into (vectors, vocabulary) for visualization
def load_embeddings(file_name):
  with open(file_name, encoding='utf-8') as f_in:
    next(f_in)  # skip the "<vocab size> <dimensions>" header line instead of deleting it by hand
    vocabulary, wv = zip(*[line.strip().split(' ', 1) for line in f_in])
  wv = np.loadtxt(list(wv), delimiter=' ', dtype=float)
  return wv, vocabulary

# Graph generation
embeddings_file = "ningen-isu-w2v"
wv, vocabulary = load_embeddings(embeddings_file)
tsne = TSNE(n_components=2, random_state=0)
np.set_printoptions(suppress=True)
Y = tsne.fit_transform(wv[:200, :])  # limited to the first 200 words in the file

plt.figure(figsize=(20, 20))
plt.scatter(Y[:, 0], Y[:, 1])
for label, x, y in zip(vocabulary, Y[:, 0], Y[:, 1]):
  plt.annotate(label, xy=(x, y), xytext=(0, 0), textcoords='offset points', fontproperties=font_prop)

#plt.show()
plt.savefig('ningen-isu.png', bbox_inches='tight')
plt.close()

You can see that words such as "I", "chair", and "feeling" sit close to each other. Makes sense.

Let's sample the distances (cosine similarities) between a few specific words.
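For reference, such queries can be written directly against the `(wv, vocabulary)` pair that `load_embeddings` returns, including the "subtract one word from another" query that follows. The helpers `similarity` and `analogy` below are hypothetical; they follow the same idea as gensim's `model.wv.similarity(w1, w2)` and `model.wv.most_similar(positive=[...], negative=[...])` (normalize the vectors, add the positive ones, subtract the negative ones, rank the rest by cosine), but this is not gensim's code.

```python
import numpy as np

def _unit_rows(wv):
    """Scale every row (word vector) to length 1."""
    return wv / np.linalg.norm(wv, axis=1, keepdims=True)

def similarity(wv, vocabulary, w1, w2):
    """Cosine similarity of two words taken from a (vectors, vocabulary) pair."""
    unit = _unit_rows(wv)
    index = {word: i for i, word in enumerate(vocabulary)}
    return float(unit[index[w1]] @ unit[index[w2]])

def analogy(wv, vocabulary, positive, negative=(), topn=3):
    """Words closest (by cosine) to sum(positive) - sum(negative), excluding the query words."""
    unit = _unit_rows(wv)
    index = {word: i for i, word in enumerate(vocabulary)}
    query = np.zeros(wv.shape[1])
    for word in positive:
        query += unit[index[word]]
    for word in negative:
        query -= unit[index[word]]
    scores = unit @ query / np.linalg.norm(query)  # cosine of every word with the query
    excluded = set(positive) | set(negative)
    ranked = [i for i in np.argsort(-scores) if vocabulary[i] not in excluded]
    return [(vocabulary[i], float(scores[i])) for i in ranked[:topn]]
```

Assuming the tokens 私 ("I") and 椅子 ("chair") are in the vocabulary, the question below would be `analogy(wv, vocabulary, positive=['私'], negative=['椅子'])`.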

Hmm. So what happens if you subtract "chair" from "I"?

"subtle".

In these days, noisy with IoT and blockchain, what kind of new "human chair" would Ranpo have launched if he were alive today? With that subjective impression, which has little to do with the analysis results, I'll conclude.

end.

POSTSCRIPT

#Graph generation
embeddings_file = "ningen-isu-w2v"
wv, vocabulary = load_embeddings(embeddings_file)
tsne = TSNE(n_components=2, random_state=0)
np.set_printoptions(suppress=True)
Y = tsne.fit_transform(wv[:200,:]) #Limited to some words

On the line Y = tsne.fit_transform(wv[:200,:]) above, I reproduced this issue in my local environment (https://github.com/scikit-learn/scikit-learn/issues/6665). As suggested in a comment on the issue, pip install --pre scikit-learn -U fixed it in my environment as well.