Recently, to deepen my understanding of machine learning, I started working through Programming Collective Intelligence (the Japanese title translates as "Collective Intelligence Programming"). In Chapter 3, Hierarchical Clustering, I created clusters.py and wrote a function in it. When I ran it, it did not work, and I got stuck on how the list elements were being indexed.
Programming Collective Intelligence has many misprints and only a few official corrections, so an unofficial errata list has been compiled. This problem is not listed there either, so my own code is probably what is wrong. If you spot a mistake, please let me know.
First, once the dataset was prepared, I created clusters.py with the following code.
clusters.py
def readfile(filename):
    lines=[line for line in file(filename)]

    # First line is the column titles
    colnames=lines[0].strip().split('\t')[1:]
    rownames=[]
    data=[]
    for line in lines[1:]:
        p=line.strip().split('\t')
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames,colnames,data
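To make the expected input concrete, here is a minimal sketch of the same parsing steps applied to a tiny in-memory sample instead of a file. The helper name `parse_rows`, the `Blog` header label, and the sample rows are made up for illustration. Only the layout comes from the code above: tab-separated fields, a header line first, then one row per blog with its name in the first column.

```python
def parse_rows(lines):
    """Hypothetical helper: the same steps as readfile, but on a list of lines."""
    # First line holds the column titles; its first field labels the name column
    colnames = lines[0].strip().split('\t')[1:]
    rownames = []
    data = []
    for line in lines[1:]:
        p = line.strip().split('\t')
        rownames.append(p[0])                    # first field: row name
        data.append([float(x) for x in p[1:]])   # remaining fields: word counts
    return rownames, colnames, data


# Invented sample in the layout the function expects
sample = [
    "Blog\tfour\tlooking\tsecond\n",
    "Example Blog A\t1\t0\t2\n",
    "Example Blog B\t0\t3\t1\n",
]
rownames, colnames, data = parse_rows(sample)
print(rownames)
print(colnames)
print(data)
```

Every line after the first must contain only numbers after its first field. Any other content in those positions makes `float()` fail.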
Then I imported this file and ran it in the interpreter as follows:
blognames, words, data=clusters.readfile('blogdata.txt')
four looking second here music until example want wrong easier series re wasn service project person episode best country asked much life things big couple had easy possible right old people support later time leave love working awesome such data so years didn internet million quite open future san say saw note take ways going where many wants photos single technology being around traffic world power favorite other image her am number tv th large small past hours via company learn states information its always found week really major also play plan set see movie last whole recent d continue anything into link line posted us ago having try video let great makes tools next process high move doing could start system fact should hope means stuff edition email less web government five become does chance told work interview after order office then them they network another do away com voice hand photo night security marketing post months way update together p guy change history live car write product remember still now january year space shows friend than online only between article comes these media real read early using business aren lot trying building since month very family put ve site help actually event reason ask american off clear pretty during x close won probably else look while user game some doesn youtube go facebook click products started control links software front times exactly need able based course she state key problem both well page twitter home he friends amp companies likely even ever never call tell give before better went side content isn features matter don m points stop bad said against three if make left human yes yet deal popular down digital me did run box making may man maybe talk nbsp interesting thing think first long little anyone were especially show black get nearly morning behind reading across among those different same running money either users enough videos film again important u public search two share coming through late someone 
everyone house hard idea done least part tool most find please point simple itself bit google often back others bunch ll day text including taking value almost thought latest add like works buy minutes special under every would phone must my keep end over writing each group got free days already top too took talking though watch amazon report full however news quickly several social everything why head check no when cool posts says goes sports today local name turn place given released any ideas sure written come case good without seems blog there program far list design version short might used friday feel story store king kind nothing windows his him art political questions fast called once issues apple app use few something united six instead looks our york their which who ones view available stories gets know press because lead getting own made book
Schneier on Security 1 0 1 2 0 2 1 2 2 1 0 5 0 1 1 0 0 2 2 0 4 0 2 1 2 2 0 1 2 1 4 1 2 6 0 0 0 0 3 2 3 1 0 6 0 0 0 3 0 1 4 0 1 1 5 4 3 0 0 0 2 3 3 0 2 1 0 6 0 0 0 2 0 0 0 1 0 0 0 1 1 2 1 9 0 0 0 0 2 3 0 1 1 3 1 1 0 1 0 0 1 2 0 0 0 15 1 1 1 0 2 0 1 1 0 3 1 1 1 9 0 1 1 9 0 1 0 0 0 0 0 12 0 2 2 0 0 5 0 0 1 1 0 5 20 2 1 5 3 1 0 3 0 1 7 0 2 2 1 0 0 0 0 1 1 1 0 0 0 0 1 2 0 4 0 0 0 4 0 7 4 2 0 6 0 1 0 0 4 0 0 2 1 1 2 0 5 0 0 0 0 1 1 0 1 0 1 3 0 1 1 0 0 0 0 2 0 1 1 0 1 2 0 0 0 0 1 1 1 0 0 0 2 0 4 1 2 0 0 2 0 4 0 5 0 0 0 5 0 0 0 1 6 0 2 2 3 1 2 2 0 0 0 1 0 2 5 0 1 0 0 3 7 1 5 1 0 2 0 0 1 0 4 0 0 9 1 0 3 3 0 1 1 0 1 3 1 3 2 0 0 8 0 1 1 4 2 0 1 0 1 1 3 4 9 0 0 5 0 1 1 0 0 1 0 2 0 4 0 2 1 2 0 1 0 2 0 0 1 1 0 5 0 0 0 0 2 0 0 2 1 1 0 0 0 1 2 1 0 0 0 0 0 3 0 0 0 0 2 1 3 1 0 0 0 0 3 0 1 2 1 0 1 2 0 0 0 0 2 0 0 0 7 1 5 1 4 0 1 5 0 0 2 14 0 0 1 0 0 0 0 0 0 0 0 0 2 0 2 2 1 1 0 2 1 1 4 2 0 0 0 0 0 5 4 1 0 0 2 0 1 0 1 1 0 1 0 0 0 2 1 0 0 0 2 1 1 1 0 0 0 3 0 11 5 13 1 1 3 2 0 7 1 7 0 0 2 0 0
PaulStamatiou.com - Technology, Design and Photography 2 21 13 69 15 38 53 120 5 23 6 115 19 21 5 15 2 47 2 12 141 26 60 29 0 100 34 11 74 29 71 21 34 159 11 31 50 2 36 52 210 28 39 7 3 26 31 17 10 22 2 18 69 12 54 91 66 11 131 13 4 50 76 9 17 18 6 95 105 3 20 13 12 …
I worked out what happened: while converting the numeric data in the file to float, the code tried to convert the string "looking" to a float, and Python raised an error because it cannot do that.
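The failure can be reproduced in isolation, and a small scan can show which line and which field trigger it. This is only a diagnostic sketch: `find_non_numeric` is a hypothetical helper name, and the sample lines are invented. The sample deliberately puts a row of words at index 1 to provoke the error. Whether my actual file looks like this is not confirmed.

```python
def find_non_numeric(lines):
    """Hypothetical helper: list (line_index, field) for lines readfile would choke on."""
    problems = []
    # Scan the same lines readfile converts: everything after lines[0]
    for i, line in enumerate(lines[1:], 1):
        fields = line.strip().split('\t')
        for field in fields[1:]:        # the first field is the row name, so skip it
            try:
                float(field)
            except ValueError:
                problems.append((i, field))
                break                   # one report per line is enough
    return problems


# float() rejects any string that is not a number
try:
    float('looking')
except ValueError as err:
    print('ValueError:', err)

# Invented sample: a stray line ahead of the header pushes the words to index 1
sample = [
    "\n",
    "four\tlooking\tsecond\n",
    "Example Blog A\t1\t0\n",
]
print(find_non_numeric(sample))
```

Running the same scan on the lines of the real data file would show whether only one line is affected or several.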
So I concluded that the first element iterated by the for statement has to be skipped before converting to float, which meant writing the following (it is terrible code because I am new to Python...). In fact, this worked.
clusters.py
def readfile(filename):
    lines=[line for line in file(filename)]

    # First line is the column titles
    colnames=lines[0].strip().split('\t')[1:]
    rownames=[]
    data=[]
    first_line=lines[1]
    for line in lines[1:]:
        # Skip this line: converting its fields to float is what failed.
        # Skipping before the append keeps rownames aligned with data.
        if line==first_line: continue
        p=line.strip().split('\t')
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames,colnames,data
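An alternative to comparing every line against `first_line` is to decide up front which line is the header. The sketch below is not the book's code. It assumes, without anything in this article confirming it, that the trouble comes from blank lines or from the header not being the very first line of the file. It therefore drops blank lines, treats the first remaining line as the header, and names the offending row if a field still cannot be converted. `readfile_tolerant` is a hypothetical name, and it uses `open()`, which works on both Python 2 and Python 3, instead of Python 2's `file()`.

```python
def readfile_tolerant(filename):
    """Hypothetical variant of readfile that ignores blank lines."""
    with open(filename) as f:
        # Keep only lines that contain something other than whitespace
        lines = [line for line in f if line.strip()]

    # The first non-blank line is taken to be the column titles
    colnames = lines[0].strip().split('\t')[1:]
    rownames = []
    data = []
    # lineno counts non-blank lines, with the header as number 1
    for lineno, line in enumerate(lines[1:], 2):
        p = line.strip().split('\t')
        try:
            values = [float(x) for x in p[1:]]
        except ValueError as err:
            # Report which row failed instead of skipping it silently
            raise ValueError('non-blank line %d (%r): %s' % (lineno, p[0], err))
        rownames.append(p[0])
        data.append(values)
    return rownames, colnames, data
```

Raising an error for a non-numeric row, rather than skipping it, is a deliberate choice here: a silently dropped row would leave fewer blogs in the clustering without any hint as to why.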
As far as I can tell from looking around, the original code seems to work fine for other people, so it is quite possible that my code is what is wrong. If you notice anything, please point it out. Otherwise, I hope this article helps someone.