4. Bayesian statistics in Python 1-1. Sentiment classification with naive Bayes [Bayes' theorem]

⑴ Basic knowledge of Bayesian statistics

1. A brief history of Bayesian statistics

2. Bayes' theorem

3. Bayesian updating

4. Stabilization of posterior probabilities by increasing data
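Items 2 to 4 above can be illustrated with a small sketch. It is not taken from the original article: the coin scenario, the probabilities, the observation string, and the names likelihood_heads and bayes_update are all made up for illustration. Each observation is folded in with Bayes' theorem, the resulting posterior is reused as the next prior, and the posterior moves less and less as data accumulates.

```python
# Hypothetical setup (not from the article): a coin is either fair or biased.
likelihood_heads = {"fair": 0.5, "biased": 0.8}  # P(heads | hypothesis)
prior = {"fair": 0.5, "biased": 0.5}             # P(hypothesis) before any data


def bayes_update(current_prior, observation):
    """Return the posterior P(hypothesis | observation) via Bayes' theorem."""
    unnormalized = {}
    for hypothesis, p_hypothesis in current_prior.items():
        p_heads = likelihood_heads[hypothesis]
        p_obs = p_heads if observation == "H" else 1.0 - p_heads
        unnormalized[hypothesis] = p_obs * p_hypothesis  # likelihood x prior
    evidence = sum(unnormalized.values())                # P(observation)
    return {h: value / evidence for h, value in unnormalized.items()}


# Bayesian updating: each posterior becomes the prior for the next observation.
observations = "HHTHHHTHHH" * 3  # made-up data: 24 heads, 6 tails
posterior = dict(prior)
history = []  # P(biased | data so far) after each toss
for obs in observations:
    posterior = bayes_update(posterior, obs)
    history.append(posterior["biased"])

print("after  1 toss  :", round(history[0], 3))
print("after 10 tosses:", round(history[9], 3))
print("after 30 tosses:", round(history[-1], 3))
```

The naive Bayes classifier used later applies the same rule, with the class label ('pos' or 'neg') as the hypothesis and the words of a sentence as the observations.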

⑵ Data set overview

⑶ Loading the data set

# Upload rt-polaritydata.tar.gz from the local machine (works only in Google Colab)
from google.colab import files
files.upload()


!tar -zxvf rt-polaritydata.tar.gz
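The `!tar` line is a notebook shell command. Outside a notebook, the same extraction can be sketched with the standard library's tarfile module. The helper name extract_archive is hypothetical, and the usage line assumes the archive already sits in the working directory.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Usage (assumes the archive has already been placed in the working directory):
# extract_archive("rt-polaritydata.tar.gz")
```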


⑷ Data preprocessing

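1. Import NLTK

import nltk
from nltk.tokenize import word_tokenize

# Download the Punkt tokenizer models that word_tokenize depends on
nltk.download('punkt')
# Recent NLTK releases look for 'punkt_tab' instead; fetching both is harmless
nltk.download('punkt_tab')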

2. Define a function to preprocess data

def format_sentense(sentense):
    # Map every token to True: a bag-of-words "this word is present" feature dict
    return {word: True for word in word_tokenize(sentense)}
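To see what kind of feature dictionary this produces, here is a minimal sketch. It swaps in a plain whitespace split for word_tokenize so that it runs without the NLTK tokenizer data. The name format_sentence_sketch and the sample sentence are made up for illustration.

```python
def format_sentence_sketch(sentence):
    # Assumption: a whitespace split stands in for nltk's word_tokenize here.
    return {word: True for word in sentence.split()}


features = format_sentence_sketch("a good movie , a really good movie")
print(features)
# Repeated words collapse into a single key and word order is not used:
# the classifier only sees which tokens are present.
```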

3. Data preprocessing

# Preprocess the positive reviews: one [features, label] pair per line
pos_data = []
with open('rt-polaritydata/rt-polarity.pos', encoding='latin-1') as f:
    for line in f:
        pos_data.append([format_sentense(line), 'pos'])

# Preprocess the negative reviews in the same way
neg_data = []
with open('rt-polaritydata/rt-polarity.neg', encoding='latin-1') as f:
    for line in f:
        neg_data.append([format_sentense(line), 'neg'])

4. Split into training data and test data

# Training data: the first 4000 sentences of each class
training_data = pos_data[:4000] + neg_data[:4000]

# Test data: the remaining sentences of each class
testing_data = pos_data[4000:] + neg_data[4000:]
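The split above takes the first 4000 lines of each file as they appear. If the file order were not random, a shuffled split would be one possible variation. The sketch below is that variation, not the article's method. The helper split_data, the seed, and the toy lists are hypothetical stand-ins for pos_data and neg_data.

```python
import random


def split_data(labeled_items, n_train, seed=0):
    """Shuffle a copy of labeled_items and split it into (train, test)."""
    items = list(labeled_items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    return items[:n_train], items[n_train:]


# Toy stand-ins for pos_data / neg_data (10 items per class)
toy_pos = [[{"good": True, f"w{i}": True}, "pos"] for i in range(10)]
toy_neg = [[{"bad": True, f"w{i}": True}, "neg"] for i in range(10)]

pos_train, pos_test = split_data(toy_pos, n_train=8)
neg_train, neg_test = split_data(toy_neg, n_train=8)
training_sketch = pos_train + neg_train
testing_sketch = pos_test + neg_test
print(len(training_sketch), len(testing_sketch))
```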

⑸ Training the model on the training data

1. Train the model

from nltk.classify import NaiveBayesClassifier

# Train a naive Bayes classifier on the [features, label] pairs
model = NaiveBayesClassifier.train(training_data)
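To make the training step less of a black box, the following is a rough pure-Python sketch of a naive Bayes classifier over word-presence features. It is not a reimplementation of NLTK's internals, and NLTK's probability estimates differ in detail. The function names, the add-one smoothing, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_featuresets):
    """Count how often each label and each (label, word) pair occurs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for word in features:
            word_counts[label][word] += 1
    return label_counts, word_counts


def classify(features, label_counts, word_counts):
    """Pick the label with the largest log P(label) + sum of log P(word | label)."""
    total = sum(label_counts.values())
    scores = {}
    for label, n_label in label_counts.items():
        log_score = math.log(n_label / total)  # log prior
        for word in features:
            # Add-one smoothing (assumption) so unseen words do not give log(0)
            p_word = (word_counts[label][word] + 1) / (n_label + 2)
            log_score += math.log(p_word)
        scores[label] = log_score
    return max(scores, key=scores.get)


# Toy training data in the same [features, label] shape as training_data
toy_training = [
    [{"great": True, "movie": True}, "pos"],
    [{"great": True, "fun": True}, "pos"],
    [{"boring": True, "movie": True}, "neg"],
    [{"boring": True, "slow": True}, "neg"],
]
label_counts, word_counts = train_naive_bayes(toy_training)
print(classify({"great": True, "story": True}, label_counts, word_counts))
print(classify({"boring": True}, label_counts, word_counts))
```

The "naive" part is the sum over words: each word is treated as independent of the others given the label, which is what lets the per-word probabilities simply be multiplied (added in log space).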

2. Try the model

es1 = "This is a hilarious movie and I would watch it again and again."
es2 = "This is a boring movie and once you see it, you'll have enough."

# Print the predicted label ('pos' or 'neg') for each sentence
print(es1, '--->', model.classify(format_sentense(es1)))
print(es2, '--->', model.classify(format_sentense(es2)))


⑹ Evaluation of model accuracy using test data

from nltk.classify.util import accuracy

# Fraction of test sentences whose predicted label matches the true label
print('Accuracy: ', accuracy(model, testing_data))
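Accuracy here is simply the share of test items that are labeled correctly. A minimal sketch of that calculation follows. The names accuracy_sketch and toy_classifier and the four toy test items are hypothetical; the toy classifier is a stand-in for model.classify.

```python
def accuracy_sketch(classify_fn, labeled_data):
    """Fraction of items whose predicted label equals the gold label."""
    correct = sum(
        1 for features, label in labeled_data if classify_fn(features) == label
    )
    return correct / len(labeled_data)


def toy_classifier(features):
    # Stand-in rule (assumption): call it positive if the word "good" is present
    return "pos" if "good" in features else "neg"


toy_test = [
    [{"good": True}, "pos"],
    [{"good": True, "plot": True}, "neg"],
    [{"dull": True}, "neg"],
    [{"fine": True}, "pos"],
]
print(accuracy_sketch(toy_classifier, toy_test))
```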

