Recently, to deepen my understanding of machine learning, I started working through Programming Collective Intelligence (the Japanese title translates to "collective intelligence programming"). In the hierarchical clustering part of Chapter 3, I created clusters.py and wrote the readfile function. When I ran it, it didn't work, and I got stuck on how the elements were being indexed.
The book has many typographical errors and few official corrections, so an unofficial errata list has been created. This problem wasn't listed there either, so I think my code is probably what is wrong. If you find any mistakes, please let me know.
First, to load the dataset, I created clusters.py with the following code.
clusters.py
def readfile(filename):
    lines=[line for line in file(filename)]
    # First line is the column titles
    colnames=lines[0].strip().split('\t')[1:]
    rownames=[]
    data=[]
    for line in lines[1:]:
        p=line.strip().split('\t')
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        data.append([float(x) for x in p[1:]])
    return rownames,colnames,data
Then I imported the module and ran it in the interpreter as follows:
blognames, words, data=clusters.readfile('blogdata.txt')
four looking second here music until example want wrong easier series re wasn service project person episode best country asked much life things big couple had easy possible right old people support later time leave love working awesome such data so years didn internet million quite open future san say saw note take ways going where many wants photos single technology being around traffic world power favorite other image her am number tv th large small past hours via company learn states information its always found week really major also play plan set see movie last whole recent d continue anything into link line posted us ago having try video let great makes tools next process high move doing could start system fact should hope means stuff edition email less web government five become does chance told work interview after order office then them they network another do away com voice hand photo night security marketing post months way update together p guy change history live car write product remember still now january year space shows friend than online only between article comes these media real read early using business aren lot trying building since month very family put ve site help actually event reason ask american off clear pretty during x close won probably else look while user game some doesn youtube go facebook click products started control links software front times exactly need able based course she state key problem both well page twitter home he friends amp companies likely even ever never call tell give before better went side content isn features matter don m points stop bad said against three if make left human yes yet deal popular down digital me did run box making may man maybe talk nbsp interesting thing think first long little anyone were especially show black get nearly morning behind reading across among those different same running money either users enough videos film again important u public search two share coming through late someone 
everyone house hard idea done least part tool most find please point simple itself bit google often back others bunch ll day text including taking value almost thought latest add like works buy minutes special under every would phone must my keep end over writing each group got free days already top too took talking though watch amazon report full however news quickly several social everything why head check no when cool posts says goes sports today local name turn place given released any ideas sure written come case good without seems blog there program far list design version short might used friday feel story store king kind nothing windows his him art political questions fast called once issues apple app use few something united six instead looks our york their which who ones view available stories gets know press because lead getting own made book
Schneier on Security 1 0 1 2 0 2 1 2 2 1 0 5 0 1 1 0 0 2 2 0 4 0 2 1 2 2 0 1 2 1 4 1 2 6 0 0 0 0 3 2 3 1 0 6 0 0 0 3 0 1 4 0 1 1 5 4 3 0 0 0 2 3 3 0 2 1 0 6 0 0 0 2 0 0 0 1 0 0 0 1 1 2 1 9 0 0 0 0 2 3 0 1 1 3 1 1 0 1 0 0 1 2 0 0 0 15 1 1 1 0 2 0 1 1 0 3 1 1 1 9 0 1 1 9 0 1 0 0 0 0 0 12 0 2 2 0 0 5 0 0 1 1 0 5 20 2 1 5 3 1 0 3 0 1 7 0 2 2 1 0 0 0 0 1 1 1 0 0 0 0 1 2 0 4 0 0 0 4 0 7 4 2 0 6 0 1 0 0 4 0 0 2 1 1 2 0 5 0 0 0 0 1 1 0 1 0 1 3 0 1 1 0 0 0 0 2 0 1 1 0 1 2 0 0 0 0 1 1 1 0 0 0 2 0 4 1 2 0 0 2 0 4 0 5 0 0 0 5 0 0 0 1 6 0 2 2 3 1 2 2 0 0 0 1 0 2 5 0 1 0 0 3 7 1 5 1 0 2 0 0 1 0 4 0 0 9 1 0 3 3 0 1 1 0 1 3 1 3 2 0 0 8 0 1 1 4 2 0 1 0 1 1 3 4 9 0 0 5 0 1 1 0 0 1 0 2 0 4 0 2 1 2 0 1 0 2 0 0 1 1 0 5 0 0 0 0 2 0 0 2 1 1 0 0 0 1 2 1 0 0 0 0 0 3 0 0 0 0 2 1 3 1 0 0 0 0 3 0 1 2 1 0 1 2 0 0 0 0 2 0 0 0 7 1 5 1 4 0 1 5 0 0 2 14 0 0 1 0 0 0 0 0 0 0 0 0 2 0 2 2 1 1 0 2 1 1 4 2 0 0 0 0 0 5 4 1 0 0 2 0 1 0 1 1 0 1 0 0 0 2 1 0 0 0 2 1 1 1 0 0 0 3 0 11 5 13 1 1 3 2 0 7 1 7 0 0 2 0 0
PaulStamatiou.com - Technology, Design and Photography 2 21 13 69 15 38 53 120 5 23 6 115 19 21 5 15 2 47 2 12 141 26 60 29 0 100 34 11 74 29 71 21 34 159 11 31 50 2 36 52 210 28 39 7 3 26 31 17 10 22 2 18 69 12 54 91 66 11 131 13 4 50 76 9 17 18 6 95 105 3 20 13 12 …
As I understand it, what happened is this: while converting the numeric data in the file to float, the code tried to convert the string "looking" to float, and Python raised an error because it could not.
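The failure can be reproduced in isolation. The following is a minimal sketch, assuming Python 3; the line it parses is a hypothetical tab-separated row made of words, not the actual content of blogdata.txt.

```python
# Hypothetical row whose cells are words instead of counts,
# as would happen if a header line were treated as a data row.
line = "four\tlooking\tsecond\there\n"
p = line.strip().split("\t")

try:
    # Same conversion as in readfile: everything after the first column
    values = [float(x) for x in p[1:]]
except ValueError as exc:
    values = None
    message = str(exc)
    print(message)
```

The first column ("four") is treated as the row name, so the first cell passed to float is "looking", which matches the string named in my error.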
So I think the for loop has to skip that first element before converting to float, which means writing something like the following (it's terrible code because I'm new to Python ...). In fact, this worked.
clusters.py
def readfile(filename):
    lines=[line for line in file(filename)]
    # First line is the column titles
    colnames=lines[0].strip().split('\t')[1:]
    rownames=[]
    data=[]
    first_line=lines[1]
    for line in lines[1:]:
        p=line.strip().split('\t')
        # First column in each row is the rowname
        rownames.append(p[0])
        # The data for this row is the remainder of the row
        if line==first_line: continue
        else: data.append([float(x) for x in p[1:]])
    return rownames,colnames,data
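One thing to watch in the version above: rownames.append(p[0]) runs before the skip, so rownames ends up one entry longer than data. A possible alternative, sketched here under the assumption of Python 3 (where open() replaces the Python 2 built-in file()), is to skip any row that is not numeric and record the row name only when the row is kept. The name readfile_skipping_text_rows and the sample data are hypothetical and are not from the book. It still takes colnames from the first line, which may not be right for every file.

```python
import os
import tempfile

def readfile_skipping_text_rows(filename):
    # Hypothetical variant of readfile, not the book's code
    with open(filename) as f:
        lines = [line for line in f]
    # First line is still assumed to hold the column titles
    colnames = lines[0].strip().split("\t")[1:]
    rownames = []
    data = []
    for line in lines[1:]:
        if not line.strip():
            continue  # ignore blank lines
        p = line.strip().split("\t")
        try:
            row = [float(x) for x in p[1:]]
        except ValueError:
            # Non-numeric row (for example a stray header line): skip it
            continue
        # Append the name only for rows that were kept,
        # so rownames and data stay the same length
        rownames.append(p[0])
        data.append(row)
    return rownames, colnames, data

# Small made-up file: a header, one row of words, two numeric rows
sample = "Blog\tapple\tbook\nfour\tlooking\nBlog A\t1\t0\nBlog B\t2\t3\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
rownames, colnames, data = readfile_skipping_text_rows(tmp.name)
os.remove(tmp.name)
```

Skipping on ValueError hides the symptom rather than explaining it, so it is still worth checking why a row of words shows up among the data rows in the first place.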
From what I have been able to find, the original code seems to work fine for other people, so it is quite possible that my code is what is wrong. If you notice anything, please point it out. Otherwise, I hope this article helps someone.