[PYTHON] [Recommended tagging for machine learning # 4] Machine learning script ...?

<ENGLISH>

Hi, I hope you're doing well. I'm so sleepy... because of my gym session this morning. But I'd like to get back to this series... with a drink :stuck_out_tongue_closed_eyes: Yahoo!

So today's topic is, finally... machine learning! We already have the elements we need for training and testing, so all that's left is to train my machine. Let's start... but I have to say one thing before starting.

I can't code the machine learning part myself...!

Really sorry. Oh, stop!! Don't throw the stone in your hand... yep, right. I won't write it; actually, I can't. Instead, I'd like to use a script from another site, and I think you know it. Here: Let's get started with machine learning Part 3 Let's implement Bayesian filter --gihyo.jp. This is a very good site as an entry point to machine learning. I really recommend it.

So, shall we call it a day...? Hmm. Actually, I had to change a few points to fit my purpose, so I'd like to show what I changed. No machine learning itself today...

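def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        # count each word of the document under the category
        self.wordcountup(w, cat)
    # count the document itself once for the category
    self.catcountup(cat)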

This is the train function: it gets the words in doc, then counts them up under the category cat. However, it handles only one category per web page, while a page can be tagged with two or more categories. So I changed the script like this.

def train(self, doc, cats):
    word = getwords(doc)
    for cat in cats:
        for w in word:
            self.wordcountup(w, cat)
        # count the document once per tag, not once per word
        self.catcountup(cat)

cats is now a list, not a single string, and a for loop counts up every category in it.
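To see what multi-tag training does to the counts, here is a minimal sketch. It is not the gihyo.jp class: TagCounter is a hypothetical stripped-down stand-in, getwords is replaced by a plain whitespace split (the real one does proper word segmentation), and counting each tag once per document is my assumption about the intended behavior.

```python
from collections import defaultdict


def getwords(doc):
    # Stand-in tokenizer (assumption): lowercase whitespace split.
    return doc.lower().split()


class TagCounter:
    """Hypothetical stripped-down counter, not the gihyo.jp class."""

    def __init__(self):
        # category -> word -> number of occurrences
        self.wordcount = defaultdict(lambda: defaultdict(int))
        # category -> number of documents carrying that tag
        self.catcount = defaultdict(int)

    def wordcountup(self, word, cat):
        self.wordcount[cat][word] += 1

    def catcountup(self, cat):
        self.catcount[cat] += 1

    def train(self, doc, cats):
        words = getwords(doc)
        for cat in cats:
            for w in words:
                self.wordcountup(w, cat)
            # one document is counted once for each of its tags
            self.catcountup(cat)


counter = TagCounter()
counter.train("python makes machine learning fun", ["python", "machine-learning"])
counter.train("python scraping with requests", ["python"])

print(dict(counter.catcount))
print(counter.wordcount["python"]["python"])
```

The first document feeds its words into both tags, the second only into "python", so a word can end up with different counts under different tags.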

Next is modifying how the result is shown. The original script looks like this.

def classifier(self, doc):
    best = None
    max = -sys.maxint  # Python 2 only; Python 3 has sys.maxsize instead
    word = getwords(doc)

    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:  # keep the highest-scoring category so far
            max = prob
            best = cat
    return best

This function returns only the best category name. However, I'd like to see every category and its probability, so I modified it like this.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    for cat in self.catcount.keys():
        # score() returns a log probability; store it as-is
        pList.append([cat, self.score(word, cat)])

    # highest score first
    return sorted(pList, key=lambda pair: pair[1], reverse=True)

The previous code returns just the most probable tag. But I'd like to know the result for every tag, so this version returns the whole list.
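Here is a small self-contained sketch of that ranking idea. The names catcount and wordcount follow the script, but the toy counts, the vocabulary set, the rank helper, and the add-one smoothing inside score are all my own assumptions for illustration, not the gihyo.jp implementation.

```python
import math

# Hypothetical toy counts, as they might look after training.
catcount = {"python": 3, "cooking": 1}
wordcount = {
    "python": {"code": 4, "learning": 2},
    "cooking": {"recipe": 3, "code": 1},
}
vocabulary = {"code", "learning", "recipe"}


def score(words, cat):
    """Log prior plus summed log likelihoods, with add-one smoothing (assumption)."""
    total_docs = sum(catcount.values())
    total_words = sum(wordcount[cat].values())
    logp = math.log(catcount[cat] / total_docs)
    for w in words:
        count = wordcount[cat].get(w, 0)
        logp += math.log((count + 1) / (total_words + len(vocabulary)))
    return logp


def rank(words):
    """Return [category, log score] pairs, best first."""
    pairs = [[cat, score(words, cat)] for cat in catcount]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranking = rank(["code", "learning"])
for cat, logp in ranking:
    print(cat, round(logp, 3))
```

Every category shows up in the result, so you can check how far the second and third candidates are from the top one instead of seeing only the winner.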

So the machine learning engine just borrows another person's idea... Next time, I'd like to show you the results of the machine learning and discuss them.

Hi, this is Umemura.

I always write my posts after a can of beer. It's nice, as long as it's just the right amount.

So, today we finally get to the main part: machine learning. Sorry to have kept you waiting so long. It finally starts. Well, no... I have one thing to apologize for.

**We will not code machine learning this time!**

No, stop, don't throw stones! ... That's right, I won't write it. Or rather, I can't. Instead, we will use the Naive Bayes sample code from the following site for the machine learning.

Let's get started with machine learning Part 3 Let's implement Bayesian filter --gihyo.jp

This article and its series are very educational. In fact, I started machine learning with this article myself. It is very carefully structured, so that anyone can work through machine learning once they recall high school mathematics: probability, algebra, and differentiation.

Well, that's it for today's content... see you! But that would be a little lonely, so today I'd like to show how I modified this Naive Bayes code. First, the following part.

def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        self.wordcountup(w, cat)
    self.catcountup(cat)

Here, doc is the text to learn from and cat is the tag to apply, but the original only lets you attach one tag to one text. This time we want to attach multiple tags, so let's change cat to cats so that we can pass a list of tags.

def train(self, doc, cats):
    word = getwords(doc)
    for cat in cats:
        for w in word:
            self.wordcountup(w, cat)
        # count the document once per tag, not once per word
        self.catcountup(cat)

It's easy: just count up each tag in the list in turn.

Next is how the classification result is shown. The original script just returns the tag with the highest probability.

def classifier(self, doc):
    best = None
    max = -sys.maxint  # Python 2 only; Python 3 has sys.maxsize instead
    word = getwords(doc)

    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:  # keep the highest-scoring category so far
            max = prob
            best = cat
    return best

That's not enough for analysis, so I'll return all the tags and their probabilities (actually log probabilities), sorted in descending order of probability.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    for cat in self.catcount.keys():
        # score() returns a log probability; store it as-is
        pList.append([cat, self.score(word, cat)])

    # highest score first
    return sorted(pList, key=lambda pair: pair[1], reverse=True)
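Why keep the scores as logarithms instead of plain probabilities? A rough sketch of the reason, under my own assumptions: the 400 words with probability 0.01 each, the two example scores, and the helper name normalize are all made up for illustration and are not part of the gihyo.jp code.

```python
import math

# 400 hypothetical word probabilities of 0.01 each.
probs = [0.01] * 400

# Multiplying them directly underflows to 0.0 in double precision.
product = 1.0
for p in probs:
    product *= p

# Summing the logarithms stays a finite (large negative) number.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)


def normalize(ranked):
    """Turn [tag, log score] pairs into [tag, share] pairs that sum to 1."""
    top = max(score for _, score in ranked)
    # Subtract the largest score before exp() so the values do not underflow.
    weights = [math.exp(score - top) for _, score in ranked]
    total = sum(weights)
    return [[tag, w / total] for (tag, _), w in zip(ranked, weights)]


shares = normalize([["python", -1200.0], ["cooking", -1202.0]])
print(shares)
```

The shares are only relative weights among the candidate tags, which is enough for comparing them; for long documents, calling math.exp on a raw score directly would tend to give 0.0 for every tag.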

So, today I introduced the machine learning script. Next time, I'd like to train with this script, show the results, and discuss them.

See you again!
