[PYTHON] [Recommended Tagging for Machine Learning #4] Machine learning script...?

Hi, I hope you're doing well. I'm so sleepy... because of my gym session this morning. But I'd like to keep going ... with a drink :stuck_out_tongue_closed_eyes: Yahoo!

So today's topic is finally ... machine learning! We already have the elements needed for training and testing, so all that's left is to train my machine! Let's start ... but I have to say one thing before starting.

I can't write the machine learning code myself...!

Really sorry. Oh, stop!! Don't throw the stone in your hand ... yes, that's right. It's not that I won't write it; I actually can't. Instead, I'd like to use a script from another site, and I think you know it. Here it is: "Let's Start Machine Learning, Part 3: Let's Implement a Bayesian Filter" (gihyo.jp). This is a very good site as an entry point to machine learning. I really recommend it.

So, shall we call it a day...? Hmm. Actually, I had to change a few points to fit my purpose, so I'd like to show how I changed the script. No actual machine learning today...

def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        self.wordcountup(w, cat)
    self.catcountup(cat)

This is the train function: it extracts the words from doc, increments the count of each word for the category cat, and then increments the count of cat itself. However, it handles only one category per web page, and a page may be tagged with two or more categories. So I changed the script like this.

def train(self, doc, cats):
    word = getwords(doc)
    for w in word:
        for cat in cats:
            self.wordcountup(w, cat)
            self.catcountup(cat)

cats is now a list, not a single string, and an inner for loop counts up every category for each word.
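One thing to notice in the modified train: catcountup sits inside the word loop, so each category's count grows with the number of words in the page rather than by one per page. If the category count is meant to reflect how many pages carry the tag (as in the single-category version), a variant like the following might be closer. This is only a sketch: the TagCounter class and the whitespace-splitting getwords are stand-ins made up for illustration, not the code from the gihyo.jp article.

```python
from collections import defaultdict


def getwords(doc):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return doc.lower().split()


class TagCounter:
    """Minimal stand-in that only keeps the counts used by train()."""

    def __init__(self):
        self.wordcount = defaultdict(lambda: defaultdict(int))
        self.catcount = defaultdict(int)

    def wordcountup(self, word, cat):
        self.wordcount[cat][word] += 1

    def catcountup(self, cat):
        self.catcount[cat] += 1

    def train(self, doc, cats):
        words = getwords(doc)
        for cat in cats:
            for w in words:
                self.wordcountup(w, cat)
            # Count the category once per document, not once per word.
            self.catcountup(cat)


counter = TagCounter()
counter.train("python makes machine learning fun", ["python", "ml"])
counter.train("python is fun", ["python"])
print(dict(counter.catcount))  # {'python': 2, 'ml': 1}
```

Which of the two behaviours is better depends on what the category count feeds into; with the per-word version, long pages weigh more heavily in the category prior.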

Next is modifying how the result is shown. The original script looks like this.

def classifier(self, doc):
    best = None
    max = -sys.maxint  # Python 2 only; on Python 3 use -float("inf")
    word = getwords(doc)

    # Keep the category with the highest score
    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:
            max = prob
            best = cat
    return best

This function returns only the best category name. However, I'd like to show every category with its probability, so I modified it like this.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    # Score every category instead of keeping only the best one
    for cat in self.catcount.keys():
        pList.append([cat, self.score(word, cat)])

    # Highest score first
    return sorted(pList, key=lambda item: item[1], reverse=True)

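The previous code returns only the most probable tag, but I'd like to see the result for every tag, so this version returns the whole list of tags with their scores (actually the logarithm of the probability), sorted from highest to lowest.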
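The scores in this list are logarithms, so they come out as negative numbers that are hard to read at a glance. If values between 0 and 1 are easier to compare, the log scores can be rescaled so that they sum to 1. A minimal sketch, assuming the [cat, log_score] list format returned by classifier; to_probabilities is a hypothetical helper name, and subtracting the maximum before exp is a common trick to avoid underflow, not something from the original script.

```python
import math


def to_probabilities(scored):
    """Convert [[cat, log_score], ...] into [[cat, probability], ...]."""
    if not scored:
        return []
    # Shift by the largest log score so exp() never underflows to all zeros.
    top = max(log_score for _, log_score in scored)
    weights = [(cat, math.exp(log_score - top)) for cat, log_score in scored]
    total = sum(weight for _, weight in weights)
    # Keep the same descending order as the classifier output.
    return sorted(
        ([cat, weight / total] for cat, weight in weights),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up log scores, only to show the shape of the data.
example = [["python", -12.3], ["ml", -13.0], ["cooking", -20.5]]
for cat, p in to_probabilities(example):
    print(cat, round(p, 3))
```

Note that this treats the tags as competing for a single slot. When a page can carry several tags, the rescaled values are probably best read as a relative ranking rather than as independent per-tag probabilities.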

The machine learning engine here just uses another person's idea... Next time I'd like to show you the results of the machine learning and discuss them.

Hello, this is Umemura.

I always write these posts after drinking a can of beer. It feels good; it's just the right amount.

So, today we finally get to the main subject: machine learning. I'm sorry to have kept you waiting so long. It finally begins. Well, no... I have to apologize for one thing.

**We will not be coding any machine learning this time!**

No, stop, don't throw stones! ... That's right. It's not that I won't do it; I can't. Instead, we'll use the Naive Bayes sample code from the following machine learning article.

"Let's Start Machine Learning, Part 3: Let's Implement a Bayesian Filter" (gihyo.jp)

This article and the whole series are very instructive. In fact, this article was my own starting point for machine learning. It is structured so carefully that anyone can work on machine learning as long as they remember high-school-level probability, algebra, and differentiation.

Well, that's all for today... just kidding! That would be a bit lonely, so today I'd like to show how I changed this Naive Bayes code. First, the following part.

def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        self.wordcountup(w, cat)
    self.catcountup(cat)

Here doc is the text to learn from and cat is the tag to apply, but the original allows only one tag per text. This time we want to attach multiple tags, so let's change cat to cats so that we can pass a list of tags.

def train(self, doc, cats):
    word = getwords(doc)
    for w in word:
        for cat in cats:
            self.wordcountup(w, cat)
            self.catcountup(cat)

It simply increments the occurrence counts for each tag in the list.

And next: to show the result of the classification, the original script returns only the tag with the highest probability.

def classifier(self, doc):
    best = None
    max = -sys.maxint  # Python 2 only; on Python 3 use -float("inf")
    word = getwords(doc)

    # Keep the category with the highest score
    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:
            max = prob
            best = cat
    return best

That's not enough for me, so I'll return all the tags and their probabilities (actually log-probabilities), sorted in descending order of probability.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    # Score every category instead of keeping only the best one
    for cat in self.catcount.keys():
        pList.append([cat, self.score(word, cat)])

    # Highest score first
    return sorted(pList, key=lambda item: item[1], reverse=True)
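To see how the two modified methods fit together, here is a small self-contained sketch on toy data. Everything except the shape of train and classifier is an assumption: the class name MultiTagNaiveBayes is made up, getwords is a plain whitespace tokenizer, and score uses add-one smoothing on log-probabilities, which is a common way to write Naive Bayes but is not copied from the gihyo.jp article.

```python
import math
from collections import defaultdict


def getwords(doc):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return doc.lower().split()


class MultiTagNaiveBayes:
    def __init__(self):
        self.vocabulary = set()
        self.wordcount = defaultdict(lambda: defaultdict(int))
        self.catcount = defaultdict(int)

    def wordcountup(self, word, cat):
        self.wordcount[cat][word] += 1
        self.vocabulary.add(word)

    def catcountup(self, cat):
        self.catcount[cat] += 1

    def train(self, doc, cats):
        # Same loop structure as the modified train shown earlier.
        for w in getwords(doc):
            for cat in cats:
                self.wordcountup(w, cat)
                self.catcountup(cat)

    def score(self, words, cat):
        # Assumed scoring: log prior plus add-one smoothed log likelihoods.
        log_score = math.log(self.catcount[cat] / sum(self.catcount.values()))
        total = sum(self.wordcount[cat].values()) + len(self.vocabulary)
        for w in words:
            log_score += math.log((self.wordcount[cat].get(w, 0) + 1) / total)
        return log_score

    def classifier(self, doc):
        words = getwords(doc)
        scored = [[cat, self.score(words, cat)] for cat in self.catcount]
        return sorted(scored, key=lambda item: item[1], reverse=True)


nb = MultiTagNaiveBayes()
nb.train("python list comprehension tutorial", ["python"])
nb.train("naive bayes classifier in python", ["python", "ml"])
nb.train("how to bake sourdough bread", ["cooking"])

# Prints every tag with its log score, best match first.
for cat, log_score in nb.classifier("bayes classifier tutorial"):
    print(cat, round(log_score, 2))
```

With this toy data the tags come back in the order python, ml, cooking. Three short sentences are far too few to say anything about real accuracy; the point is only the shape of the input and output.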

So, today I introduced the machine learning script. Next time I'd like to use this script to train on the data and show the results, and then discuss them from a few angles.

See you next time!