[PYTHON] Find out which of your Hatena Bookmark favorite users are no longer active

Do you use Hatena Bookmark? I think most heavy Hatena Bookmark users keep up with the latest information through their favorites page, but once you follow close to 500 people, you can no longer keep track of which of those users have stopped being active.

The usual approach would be to scrape each user's page for the date of their last bookmark, but Hatena Bookmark pages are now built with JavaScript, so fetching the HTML as-is gets you nothing useful. While wondering what to do, I looked at the requests in the Network panel of Google Chrome's developer tools and found a mysterious URL. Here it is for my own bookmarks:

http://b.hatena.ne.jp/nisemono_san/fragment

I have no idea what this HTML is meant for, but since it is served as plain HTML without needing JavaScript, you can use it to find the date of a user's most recent bookmark. Hooray! The script I wrote is below. It requires Beautiful Soup.
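
As a small offline illustration of the idea before the full script, the sketch below pulls the first timestamp out of a piece of HTML. It is written for Python 3 and uses the standard library's html.parser instead of Beautiful Soup, so it runs without extra installs. The sample markup, the class FirstTimestampParser, and the function first_timestamp are made up for illustration. The only thing taken from the real page is the assumption that the date sits in a span with class timestamp, and the actual fragment HTML may look different.

```python
from html.parser import HTMLParser


class FirstTimestampParser(HTMLParser):
    """Collects the text of the first <span class="timestamp"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while inside the target span
        self._done = False     # True once the first timestamp span has closed
        self.timestamp = None  # Stays None if no timestamp span is found

    def handle_starttag(self, tag, attrs):
        if self._done or tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "timestamp" in classes:
            self._inside = True
            self.timestamp = ""

    def handle_data(self, data):
        if self._inside:
            self.timestamp += data

    def handle_endtag(self, tag):
        if self._inside and tag == "span":
            self._inside = False
            self._done = True


def first_timestamp(html):
    """Return the first timestamp text in html, or None if there is none."""
    parser = FirstTimestampParser()
    parser.feed(html)
    parser.close()
    if parser.timestamp is None:
        return None
    return parser.timestamp.strip()


# Hypothetical fragment; the real markup may differ.
SAMPLE = """
<ul>
  <li><a href="http://example.com/a">Entry A</a>
      <span class="timestamp">2013/05/01</span></li>
  <li><a href="http://example.com/b">Entry B</a>
      <span class="timestamp">2013/04/28</span></li>
</ul>
"""

print(first_timestamp(SAMPLE))
```

Returning None when no timestamp is found matters in practice: a followed user with no public bookmarks would otherwise crash a lookup that assumes the element always exists.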

python


# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import sys
import urllib


def _init():
    if len(sys.argv) == 1:
        print "usage: lastbookmark.py user_name"
        sys.exit(1)


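def get_userlist():
    follow_users = []
    # The follow list is paged by offset: of=0, 200, 400.
    # Three pages cover up to 600 followed users.
    for page in range(3):
        html = urllib.urlopen(
            "http://b.hatena.ne.jp/%s/follow?of=%d" % (
                sys.argv[1], page * 200)).read()
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(html, "html.parser")
        soup_userlist = soup.find_all('a', {'class': 'username'})
        follow_users += [userlist.text for userlist in soup_userlist]
    return follow_users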


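def get_last_bookmark(user):
    html = urllib.urlopen(
        "http://b.hatena.ne.jp/%s/fragment" % user).read()
    soup = BeautifulSoup(html, "html.parser")
    # The first timestamp on the page belongs to the newest bookmark.
    time = soup.find('span', {'class': 'timestamp'})
    # A user with no visible bookmarks has no timestamp element;
    # use an empty string so they are reported as inactive.
    date = time.text if time is not None else ""
    print user, date
    return (user, date)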


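def target_user(analyze_list):
    print "-----------------------------------"
    print "--- You should remove user list ---"
    print "-----------------------------------"
    for user, date in analyze_list:
        # Timestamps are expected to look like "YYYY/MM/DD",
        # so date[0] is the year.
        date = date.split('/')
        # Report anyone whose last bookmark is not from this year (2013).
        if date[0] != "2013":
            print user, "/".join(date)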

def command():
    _init()
    users = get_userlist()
    analyze_list = []
    # One request per followed user, so this can take a while.
    for user in users:
        analyze_list.append(get_last_bookmark(user))
    target_user(analyze_list)

if __name__ == "__main__":
    command()

Surely it is fine to remove users who have not bookmarked anything for a year, which is why the script simply checks whether the year of the last bookmark is 2013. If you have so many favorite users that you can't keep track of them, try this script and you may find out who is no longer around.
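
The year check is simple, but it also flags someone whose last bookmark was in late December when you run the script in January. One possible alternative is to compare each date against a cutoff. The sketch below is Python 3, assumes timestamps look like YYYY/MM/DD (inferred from the way the script splits on '/'), and uses hypothetical names (parse_bookmark_date, inactive_users, max_idle_days) and made-up sample data.

```python
from datetime import date, datetime, timedelta


def parse_bookmark_date(text):
    """Parse a 'YYYY/MM/DD' string; return None if it does not match."""
    try:
        return datetime.strptime(text.strip(), "%Y/%m/%d").date()
    except ValueError:
        return None


def inactive_users(last_bookmarks, today, max_idle_days=365):
    """Return (user, date_text) pairs whose last bookmark is before the cutoff.

    Entries whose date cannot be parsed are reported too, since the
    last activity of that user is unknown.
    """
    cutoff = today - timedelta(days=max_idle_days)
    result = []
    for user, date_text in last_bookmarks:
        last = parse_bookmark_date(date_text)
        if last is None or last < cutoff:
            result.append((user, date_text))
    return result


# Made-up users and dates, for illustration only.
sample = [
    ("alice", "2013/05/01"),
    ("bob", "2012/03/15"),
    ("carol", "2012/12/30"),
]
print(inactive_users(sample, today=date(2013, 6, 1)))
```

With today set to 2013/06/01, only bob is reported. carol's last bookmark is from 2012, so the year check would have flagged her, but it is less than a year old.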

Now I need to add more favorite users again, though...
