To deepen my understanding of machine learning, I recently started working through Programming Collective Intelligence (the Japanese edition's title translates to "Collective Intelligence Programming"). In Chapter 3, on hierarchical clustering, I created clusters.py and wrote the readfile function. When I ran it, it did not work, and I got stuck on how the list elements were being indexed.
Programming Collective Intelligence has many typos and few official errata, so an unofficial errata list has been compiled. This problem was not listed there either, so I suspect my own code is wrong. If you spot a mistake, please let me know.
First, after preparing the dataset, I created clusters.py and wrote the following code.
clusters.py
def readfile(filename):
    lines=[line for line in file(filename)]
    # First line is the column titles
    colnames=lines[0].strip().split('\t')[1:]
    rownames=[]
    data=[]
    for line in lines[1:]:
        p=line.strip().split('\t')
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames,colnames,data
Then I imported this file and ran it in the interpreter as follows:
blognames, words, data=clusters.readfile('blogdata.txt')
four looking second here music until example want wrong easier series re wasn service project person episode best country asked much life things big couple had easy possible right old people support later time leave love working awesome such data so years didn internet million quite open future san say saw note take ways going where many wants photos single technology being around traffic world power favorite other image her am number tv th large small past hours via company learn states information its always found week really major also play plan set see movie last whole recent d continue anything into link line posted us ago having try video let great makes tools next process high move doing could start system fact should hope means stuff edition email less web government five become does chance told work interview after order office then them they network another do away com voice hand photo night security marketing post months way update together p guy change history live car write product remember still now january year space shows friend than online only between article comes these media real read early using business aren lot trying building since month very family put ve site help actually event reason ask american off clear pretty during x close won probably else look while user game some doesn youtube go facebook click products started control links software front times exactly need able based course she state key problem both well page twitter home he friends amp companies likely even ever never call tell give before better went side content isn features matter don m points stop bad said against three if make left human yes yet deal popular down digital me did run box making may man maybe talk nbsp interesting thing think first long little anyone were especially show black get nearly morning behind reading across among those different same running money either users enough videos film again important u public search two share coming through late someone 
everyone house hard idea done least part tool most find please point simple itself bit google often back others bunch ll day text including taking value almost thought latest add like works buy minutes special under every would phone must my keep end over writing each group got free days already top too took talking though watch amazon report full however news quickly several social everything why head check no when cool posts says goes sports today local name turn place given released any ideas sure written come case good without seems blog there program far list design version short might used friday feel story store king kind nothing windows his him art political questions fast called once issues apple app use few something united six instead looks our york their which who ones view available stories gets know press because lead getting own made book
Schneier on Security 1 0 1 2 0 2 1 2 2 1 0 5 0 1 1 0 0 2 2 0 4 0 2 1 2 2 0 1 2 1 4 1 2 6 0 0 0 0 3 2 3 1 0 6 0 0 0 3 0 1 4 0 1 1 5 4 3 0 0 0 2 3 3 0 2 1 0 6 0 0 0 2 0 0 0 1 0 0 0 1 1 2 1 9 0 0 0 0 2 3 0 1 1 3 1 1 0 1 0 0 1 2 0 0 0 15 1 1 1 0 2 0 1 1 0 3 1 1 1 9 0 1 1 9 0 1 0 0 0 0 0 12 0 2 2 0 0 5 0 0 1 1 0 5 20 2 1 5 3 1 0 3 0 1 7 0 2 2 1 0 0 0 0 1 1 1 0 0 0 0 1 2 0 4 0 0 0 4 0 7 4 2 0 6 0 1 0 0 4 0 0 2 1 1 2 0 5 0 0 0 0 1 1 0 1 0 1 3 0 1 1 0 0 0 0 2 0 1 1 0 1 2 0 0 0 0 1 1 1 0 0 0 2 0 4 1 2 0 0 2 0 4 0 5 0 0 0 5 0 0 0 1 6 0 2 2 3 1 2 2 0 0 0 1 0 2 5 0 1 0 0 3 7 1 5 1 0 2 0 0 1 0 4 0 0 9 1 0 3 3 0 1 1 0 1 3 1 3 2 0 0 8 0 1 1 4 2 0 1 0 1 1 3 4 9 0 0 5 0 1 1 0 0 1 0 2 0 4 0 2 1 2 0 1 0 2 0 0 1 1 0 5 0 0 0 0 2 0 0 2 1 1 0 0 0 1 2 1 0 0 0 0 0 3 0 0 0 0 2 1 3 1 0 0 0 0 3 0 1 2 1 0 1 2 0 0 0 0 2 0 0 0 7 1 5 1 4 0 1 5 0 0 2 14 0 0 1 0 0 0 0 0 0 0 0 0 2 0 2 2 1 1 0 2 1 1 4 2 0 0 0 0 0 5 4 1 0 0 2 0 1 0 1 1 0 1 0 0 0 2 1 0 0 0 2 1 1 1 0 0 0 3 0 11 5 13 1 1 3 2 0 7 1 7 0 0 2 0 0
PaulStamatiou.com - Technology, Design and Photography 2 21 13 69 15 38 53 120 5 23 6 115 19 21 5 15 2 47 2 12 141 26 60 29 0 100 34 11 74 29 71 21 34 159 11 31 50 2 36 52 210 28 39 7 3 26 31 17 10 22 2 18 69 12 54 91 66 11 131 13 4 50 76 9 17 18 6 95 105 3 20 13 12 …
What happened, as I understand it, is that while converting the numeric data in the file to float, the code tried to convert the string "looking" to float, and Python raised an error because that conversion is not possible.
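The failure can be reproduced in isolation. The following is a minimal sketch, assuming a line whose fields after the first one are words rather than counts. The `sample` string and the `parse_counts` helper are made up for illustration and are not taken from blogdata.txt or from the book.

```python
# Hypothetical tab-separated line whose fields are words, not counts.
sample = "four\tlooking\tsecond\n"


def parse_counts(line):
    """Convert every field after the first one to float, as readfile does."""
    fields = line.strip().split("\t")
    return [float(x) for x in fields[1:]]


try:
    parse_counts(sample)
    error_message = None
except ValueError as exc:
    # float() raises ValueError for any string that is not a number
    error_message = str(exc)

print(error_message)
```

The first field ("four") is never converted because it is treated as the row name, so the first string that reaches float() is the second field.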
So I decided that I had to skip the first element in the for loop before converting to float, and wrote the following (it is awful code, since I am new to Python...). It actually worked.
clusters.py
def readfile(filename):
    lines=[line for line in file(filename)]
    # First line is the column titles
    colnames=lines[0].strip().split('\t')[1:]
    rownames=[]
    data=[]
    first_line=lines[1]
    for line in lines[1:]:
        # Skip the non-numeric line before recording anything,
        # so that rownames and data stay the same length
        if line==first_line: continue
        p=line.strip().split('\t')
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames,colnames,data
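Instead of skipping one specific line, the parser could skip any row whose values fail to convert. The following is only a sketch under assumptions: it is written for Python 3 (so it uses open() rather than file()), the name `readfile_skipping_bad_rows` is hypothetical, and treating every unparsable row as "not data" is my assumption, not something the book prescribes. It also does not repair the column names if the header itself is damaged.

```python
def readfile_skipping_bad_rows(filename):
    """Read a tab-separated word-count table, skipping non-numeric rows."""
    with open(filename) as f:
        # Drop blank lines so they are not mistaken for rows
        lines = [line for line in f if line.strip()]

    # First line is the column titles
    colnames = lines[0].strip().split("\t")[1:]
    rownames = []
    data = []
    for line in lines[1:]:
        p = line.strip().split("\t")
        try:
            values = [float(x) for x in p[1:]]
        except ValueError:
            # Assumption: a row that does not parse is not a data row
            continue
        # Record the name only after the values parsed,
        # so rownames and data always have the same length
        rownames.append(p[0])
        data.append(values)
    return rownames, colnames, data
```

Silently skipping rows can hide real problems in the data file, so in practice it may be worth printing or counting the skipped lines.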
As far as I can tell from checking other sources, the original code seems to work fine, so it is very likely that my code is wrong somewhere. If you notice anything, please point it out. Otherwise, I hope this article helps someone.