[Python] Machine learning beginners try their hand at Naive Bayes (1) -- Theory

About this page

I'm a beginner in both machine learning and Python, so this article is essentially my notes from working through the pages I referred to.

What is conditional probability (Bayes' theorem)?

A little math first. Have you ever heard the term "conditional probability"? I learned a bit about it in class long ago, but could hardly remember it.

P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B)P(A|B)}{P(A)}

P(B|A) is the probability that event B occurs given that event A has occurred.

A well-known example is the two-child problem:

A couple has two children. We learn that at least one of the two is a boy. Find the probability that both children are boys, assuming that each child is a boy or a girl with probability 1/2.

The idea is as follows.

Event A

Event A is "at least one child is a boy". Listing all patterns for two children gives four: boy & boy, boy & girl, girl & boy, girl & girl (birth order matters). Excluding girl & girl, 3 of the 4 patterns satisfy the condition, so P(A) = 3/4.

Event B

Event B is "both children are boys", whose probability is, needless to say, 1/4. Since B implies A, we also have P(A ∩ B) = P(B) = 1/4.

So the probability P(B|A) is

P(B|A) = \frac{\frac{1}{4}}{\frac{3}{4}} = \frac{1}{3}

That is, the answer is 1/3.
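As a quick sanity check, here is a small Monte Carlo simulation (my own illustrative sketch, not from the referenced page) that estimates P(B|A) by random sampling:

```python
import random

# Monte Carlo check of the two-child problem (illustrative sketch).
random.seed(0)
trials = 1_000_000
at_least_one_boy = 0  # families satisfying event A
both_boys = 0         # families satisfying event B (and hence A)

for _ in range(trials):
    children = [random.choice(["boy", "girl"]) for _ in range(2)]
    if "boy" in children:               # event A: at least one boy
        at_least_one_boy += 1
        if children == ["boy", "boy"]:  # event B: both are boys
            both_boys += 1

print(both_boys / at_least_one_boy)     # approximately 1/3
```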

Reference: Beautiful story of high school mathematics

Challenge Naive Bayes

Now for the main subject. Naive Bayes slightly modifies Bayes' theorem and is often used to classify documents into categories.

P(cat|doc) = \frac{P(cat)P(doc|cat)}{P(doc)}\propto P(cat)P(doc|cat)

This is the basic form.

I would like to explain each element.

P(cat|doc) is the probability of a category (cat) given a document (doc). Put more concretely, it means something like the probability that a text belongs to a category such as "IT" when it contains words like "Apple", "iPhone", and "MacBook Pro".

\frac{P(cat)P(doc|cat)}{P(doc)}

Since P(doc) is common to every category, we ignore it and compare only

P(cat)P(doc|cat)

P(cat) is easy.

If there are 100 documents in total and 50 of them belong to the IT category, then

P(IT) = \frac{50}{100} = \frac{1}{2}

It can be calculated like this.
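In code this is just a ratio of counts. A minimal sketch, assuming made-up category names and document counts:

```python
# Prior P(cat) from document counts (made-up numbers for illustration).
docs_per_category = {"IT": 50, "Sports": 30, "Politics": 20}
total_docs = sum(docs_per_category.values())  # 100

priors = {cat: n / total_docs for cat, n in docs_per_category.items()}
print(priors["IT"])  # 0.5, i.e. P(IT) = 50/100 = 1/2
```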

P(doc|cat) may be harder to grasp: the probability of a document (doc) given a category (cat)...? This is the heart of Naive Bayes.

Naive Bayes does not compute the exact P(doc|cat); instead, it assumes independence between word occurrences.

Independence between word occurrences means, for example, that although "machine" and "learning" often appear together in texts about machine learning, we pretend that each word appears independently of the others. With this simplification, we compute

P(doc|cat) = P(word_1 \wedge \dots \wedge word_k|cat) = \prod_{i=1}^k P(word_i|cat)
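As a sketch, assuming hypothetical per-word probabilities, the independence assumption reduces P(doc|cat) to a plain product:

```python
from functools import reduce
from operator import mul

# Hypothetical per-word probabilities P(word_i|cat) for one category.
word_probs = [0.05, 0.01, 0.02]

# Under the naive independence assumption, P(doc|cat) is their product.
p_doc_given_cat = reduce(mul, word_probs, 1.0)
print(p_doc_given_cat)  # 0.05 * 0.01 * 0.02 = 1e-05
```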

\prod is the product counterpart of \Sigma: instead of adding the terms, you multiply them. Now let's calculate P(word_k|cat). Semantically, this is the probability that a word appears given the category (cat), so it is calculated as follows. (The derivation is on the referenced page.)

P(word_k|cat) = \frac{\text{number of occurrences of } word_k \text{ in category } cat}{\text{total number of words appearing in category } cat}

All you have to do now is calculate using these!
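Here is a minimal sketch of that count-based estimate, using made-up training documents for an "IT" category:

```python
from collections import Counter

# Toy training documents for one category (made-up data).
it_docs = [["apple", "iphone", "macbook"], ["python", "apple", "code"]]

word_counts = Counter(w for doc in it_docs for w in doc)
total_words = sum(word_counts.values())  # 6

def p_word_given_cat(word):
    # Occurrences of the word in the category / total words in the category.
    return word_counts[word] / total_words

print(p_word_given_cat("apple"))  # 2/6 = 0.333...
```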

Techniques

Two key techniques appear here: Laplace smoothing and taking logarithms.

The first addresses the zero frequency problem, and it amounts to just adding +1. What is this problem? Recall that P(doc|cat) was expressed as a product of word-appearance probabilities; a product is 0 if even one factor is 0, so the whole probability becomes 0.

For example, suppose you trained the "IT" category and then encounter a new word, "Python", that was not in the training data. Then the numerator of

P(word_k|cat) = \frac{\text{number of occurrences of } word_k \text{ in category } cat}{\text{total number of words appearing in category } cat}

is 0, and we get

P(Python|IT) = 0 

So we add +1 to every word's occurrence count in advance so that the probability never becomes 0. This technique is called Laplace smoothing. Naturally, the denominator changes as well. Writing the formula again:

P(word_k|cat) = \frac{\text{number of occurrences of } word_k \text{ in category } cat + 1}{\text{total number of words appearing in category } cat + \text{number of distinct words (vocabulary size)}}
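Extending the previous sketch with Laplace smoothing (the vocabulary size of 6 below is an assumption for this toy example):

```python
from collections import Counter

# Same toy counts as before (made-up data).
word_counts = Counter({"apple": 2, "iphone": 1, "macbook": 1,
                       "python": 1, "code": 1})
total_words = sum(word_counts.values())  # 6
vocab_size = 6  # assumed number of distinct words in the whole vocabulary

def p_word_given_cat(word):
    # Laplace smoothing: +1 in the numerator, +vocab_size in the denominator.
    return (word_counts[word] + 1) / (total_words + vocab_size)

print(p_word_given_cat("linux"))  # unseen word: 1/12, no longer 0
print(p_word_given_cat("apple"))  # (2 + 1) / (6 + 6) = 0.25
```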

With the sample data it is not a big deal, but in real applications the denominator of P(word|cat) can become very large, because the number of words can be huge. Each word probability then becomes tiny, and multiplying many tiny values together can cause underflow.

Underflow is the opposite of overflow: a value becomes so small that it can no longer be represented as a floating-point number.
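A tiny demonstration of what underflow looks like in practice (the probability 1e-10 and the 200 repetitions are arbitrary choices of mine):

```python
import math

# Multiplying many tiny probabilities underflows to 0.0 ...
p = 1.0
for _ in range(200):
    p *= 1e-10       # each factor stands for a tiny word probability
print(p)             # 0.0 -- the true value 1e-2000 cannot be represented

# ... but the sum of logarithms stays perfectly representable.
log_p = sum(math.log(1e-10) for _ in range(200))
print(log_p)         # about -4605.17
```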

That's where the logarithm comes in. Taking logarithms turns a product into a sum of logs with the same base (log ab = log a + log b), and because the logarithm is monotonically increasing, comparing log-probabilities gives the same ranking as comparing the probabilities themselves.

So we rewrite the product in logarithmic form:

\prod_{i=1}^k P(word_i|cat) \rightarrow \sum_{i=1}^k \log P(word_i|cat)

Putting these together, the final score to compare is:

\log P(cat|doc) \propto \log P(cat) + \sum_{i=1}^k \log P(word_i|cat)
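Putting the whole formula into code, here is a minimal sketch of a scorer (the training data is made up; part 2 covers a proper implementation):

```python
import math
from collections import Counter

# Toy training data: category -> list of documents (lists of words).
training = {
    "IT":     [["apple", "iphone", "macbook"], ["python", "code"]],
    "Sports": [["soccer", "goal"], ["baseball", "bat", "goal"]],
}
vocab = {w for docs in training.values() for doc in docs for w in doc}
total_docs = sum(len(docs) for docs in training.values())

def log_score(cat, words):
    docs = training[cat]
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    score = math.log(len(docs) / total_docs)  # log P(cat)
    for w in words:
        # Laplace-smoothed log P(word|cat)
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

doc = ["apple", "python"]
print(max(training, key=lambda c: log_score(c, doc)))  # -> "IT"
```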

Reference: Logarithm (LOG) calculation and formula! This is perfect! !!

Machine learning beginners try Naive Bayes (2) -- Implementation

In part 2, I implement this in Python. (See the referenced site.)

References

I referred heavily to the following sites. Thank you very much.

