That's why I'll analyze the comments on Nico Nama's party leader debate for the House of Representatives election

Background

Now that the ban on online election campaigning has been lifted, the Internet is used more and more to provide election information, although lately some of those uses seem to head in rather odd directions. How is everyone doing these days?

On November 29, 2014, ahead of the House of Representatives election, a party leader debate was held on Nico Nama (niconico Live). I would like to pay tribute to the party leaders who appeared on what Mr. Azumi called a "biased video site" and took part in the discussion, and I also looked into whether the site's comments were actually biased.

Source code: https://github.com/mima3/analyze_election

Analysis results

[House of Representatives election 2014] Net party leader debate

http://needtec.sakura.ne.jp/analyze_election/page/nicolive/lv200730443

In short, the results were as follows: the "next generation" (Party for Future Generations), which barely registers in newspaper opinion polls, drew a lot of attention; some hardcore users posted hundreds of comments on their own; and Social Democratic Party leader Yoshida was extracted more often than Kaieda, the leader of the largest opposition party. To be honest, part of this may also be due to a bug.

Party leader debate at the Japan National Press Club (Jiji Press Channel)

http://needtec.sakura.ne.jp/analyze_election/page/nicolive/lv201303080

As expected, there are fewer comments than in the Saturday night debate. Even so, some people posted about 200 comments on their own. What about work or school... never mind.

Again, the words "Abe" and "next generation" are extracted frequently. However, "Kaieda" and "Democratic" have a stronger presence here than in the debate hosted by Nico Nama.

Also, I wonder why words like "Tsubaki" and "Asahi", as in the Tsubaki incident, come up so often in a party leader debate. (said with a perfectly straight face)

How to get niconico Live comments in Python

To get all the comments from a niconico Live broadcast, you first need a premium membership; regular members can retrieve only about 1,000 comments at most.

With that in place, you can retrieve the comments with the following steps.

  1. Log in on the login page with your email address and password to obtain the user_session cookie.
  2. Access http://watch.live.nicovideo.jp/api/getplayerstatus to get the address and port of the message server, the thread ID, and your user ID.
  3. Access http://watch.live.nicovideo.jp/api/getwaybackkey to get the waybackkey. Combining this key with your user ID lets you fetch past comments.
  4. Connect to the message server to fetch the comments. You can connect over either HTTP or a raw socket; for HTTP, use the port number from step 2 minus 2725.
  5. A single GET returns at most 1,000 comments, so repeat the request, moving the "when" parameter further back each time, until all comments have been retrieved.
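Steps 2 to 4 boil down to parsing two small responses and building one URL. The following Python 3 sketch does that offline so the pieces can be checked without logging in. The sample responses, host name, thread ID, user ID and key are hypothetical values made up for illustration, not real server output; the element names (ms, addr, port, thread, user_id), the request parameters and the 2725 port offset are taken from the code in this article.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode

# Hypothetical, trimmed getplayerstatus response (all values are made up)
SAMPLE_PLAYERSTATUS = """<getplayerstatus status="ok">
  <user><user_id>12345</user_id></user>
  <ms><addr>msg101.live.nicovideo.jp</addr><port>2805</port><thread>1400000000</thread></ms>
</getplayerstatus>"""

# Hypothetical getwaybackkey response body (a query string)
SAMPLE_WAYBACK = "waybackkey=1417000000.abcdef"


def parse_player_status(xml_text):
    """Return (addr, http_port, thread_id, user_id) from a getplayerstatus XML body."""
    root = ET.fromstring(xml_text)
    ms = root.find('.//ms')
    user_id = root.find('.//user_id')
    if ms is None or user_id is None:
        raise ValueError('unexpected getplayerstatus XML')
    # The HTTP interface of the message server listens on the socket port minus 2725
    http_port = int(ms.find('port').text) - 2725
    return ms.find('addr').text, http_port, ms.find('thread').text, user_id.text


def build_thread_url(addr, port, thread_id, user_id, waybackkey, when):
    """Build one api.json/thread request for up to 1000 comments before 'when'."""
    params = {
        'thread': thread_id,
        'version': '20061206',
        'res_from': '-1000',
        'waybackkey': waybackkey,
        'user_id': user_id,
        'when': when,
        'scores': '1',
    }
    return 'http://%s:%d/api.json/thread?%s' % (addr, port, urlencode(params))


addr, port, thread_id, user_id = parse_player_status(SAMPLE_PLAYERSTATUS)
waybackkey = parse_qs(SAMPLE_WAYBACK)['waybackkey'][0]
print(port)  # 80
print(build_thread_url(addr, port, thread_id, user_id, waybackkey, 4294967295))
```

In the real script, the two sample strings are replaced by the bodies returned from the getplayerstatus and getwaybackkey requests made with the logged-in session.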

The actual code looks like this:

niconico_ctrl.py


# coding: utf-8
import cookielib
import json
import urllib
import urllib2
import urlparse
from lxml import etree


class NicoCtrl(object):
    def __init__(self, nicovideo_id, nicovideo_pw):
        self.nicovideo_id = nicovideo_id
        self.nicovideo_pw = nicovideo_pw
        # Log in; the session cookie is kept in the jar shared by self.opener
        cj = cookielib.CookieJar()
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        req = urllib2.Request("https://secure.nicovideo.jp/secure/login")
        req.add_data(urllib.urlencode({"mail": self.nicovideo_id, "password": self.nicovideo_pw}))
        self.opener.open(req).read()
        # A successful login sets the user_session cookie
        if not any(c.name == 'user_session' for c in cj):
            raise Exception('PermissionError')

    def _getjson(self, url, errorcnt=0):
        # The JSON response is sometimes cut off mid-transfer, so retry up to 3 times
        try:
            res = self.opener.open(url, timeout=100).read()
            return json.loads(res)
        except ValueError:
            if errorcnt < 3:
                return self._getjson(url, errorcnt + 1)
            raise

    def get_live_comment(self, movie_id):
        self.movie_id = movie_id

        # Get the message server info and user ID (getplayerstatus)
        res = self.opener.open("http://watch.live.nicovideo.jp/api/getplayerstatus?v=" + self.movie_id).read()
        root = etree.fromstring(res)
        messageServers = root.xpath('//ms')
        if len(messageServers) == 0:
            raise Exception('UnexpectedXML')

        user_ids = root.xpath('//user_id')
        if len(user_ids) == 0:
            raise Exception('NotfoundUserId')
        user_id = user_ids[0].text

        thread_id = messageServers[0].find('thread').text
        addr = messageServers[0].find('addr').text
        # The HTTP interface listens on the socket port minus 2725
        port = int(messageServers[0].find('port').text) - 2725

        # Get the waybackkey, which is needed to read past comments
        waybackkeyUrl = 'http://watch.live.nicovideo.jp/api/getwaybackkey?thread=%s' % thread_id
        res = self.opener.open(waybackkeyUrl).read()
        waybackkey = urlparse.parse_qs(res)['waybackkey'][0]

        msUrl = 'http://%s:%d/api.json/thread?' % (addr, port)
        chats = []
        when = '4294967295'  # start from the newest comments (max 32-bit timestamp)
        while True:
            data = {
                'thread': thread_id,
                'version': '20061206',
                'res_from': '-1000',  # up to 1000 comments ending at "when"
                'waybackkey': waybackkey,
                'user_id': user_id,
                'when': when,
                'scores': '1'
            }
            items = self._getjson(msUrl + urllib.urlencode(data))
            chatcnt = 0
            insertdata = []
            for item in items:
                if 'chat' in item:
                    if chatcnt == 0:
                        # Assumes the batch is oldest first: fetch comments older than this one next
                        when = int(item['chat']['date']) - 1
                    if item['chat']['content'] != '/disconnect':
                        insertdata.append(item['chat'])
                    chatcnt += 1
            # Prepend the older batch so that chats stays in chronological order
            chats = insertdata + chats
            if chatcnt == 0:
                break
        return chats
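The paging in step 5 is the subtle part: each request returns at most 1,000 comments ending at "when", and "when" is then moved to one second before the oldest comment received. The Python 3 sketch below isolates that loop so it can be checked offline. The HTTP call is replaced by a hypothetical fetch_batch function backed by an in-memory list, and, like the code above, it assumes each batch comes back oldest first.

```python
def fetch_all_chats(fetch_batch, start_when=4294967295):
    """Collect every chat by paging backwards in time.

    fetch_batch(when) is a hypothetical stand-in for one api.json/thread request;
    it returns a list of items like {'chat': {...}}, oldest first.
    """
    chats = []
    when = start_when
    while True:
        batch = [item['chat'] for item in fetch_batch(when) if 'chat' in item]
        if not batch:
            break
        # Next request: everything strictly older than the oldest chat in this batch
        when = int(batch[0]['date']) - 1
        kept = [c for c in batch if c['content'] != '/disconnect']
        # Prepend, so the result stays in chronological order
        chats = kept + chats
    return chats


def make_fake_server(all_chats, limit=1000):
    """Hypothetical in-memory server: return the last 'limit' chats with date <= when."""
    def fetch_batch(when):
        older = [c for c in all_chats if c['date'] <= when]
        return [{'chat': c} for c in older[-limit:]]
    return fetch_batch


fake = [{'date': 1000 + i, 'content': 'comment %d' % i} for i in range(2500)]
result = fetch_all_chats(make_fake_server(fake))
print(len(result))  # 2500
```

One caveat of this scheme: if several comments share the oldest timestamp of a batch and only some of them fit into that batch, moving "when" to date - 1 skips the remaining ones, because timestamps only have one-second resolution.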


nicolive.py


# coding: utf-8
import sys
import json
from niconico_ctrl import NicoCtrl


def main(argvs):
    if len(argvs) != 4:
        print('Usage: python nicolive.py email password lv142315925')
        return 1
    nicovideo_id = argvs[1]
    nicovideo_pw = argvs[2]
    movie_id = argvs[3]

    t = NicoCtrl(nicovideo_id, nicovideo_pw)
    chats = t.get_live_comment(movie_id)
    # Save all comments as <live ID>.json
    with open(movie_id + '.json', 'w') as f:
        json.dump(chats, f)
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv))

Running the script as follows generates a JSON file containing the comments, named after the live ID (here, lv200730443.json).

 python nicolive.py EMAIL PASSWORD lv200730443

After that, you can analyze the Nico Nama comments by running morphological analysis on the comments in the generated JSON file to extract words, aggregating the results per user, and so on.
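The aggregation step can be sketched roughly as follows. This Python 3 example counts comments per user and the most frequent words in a list of chat dicts, such as the one loaded from the JSON file written by nicolive.py. It assumes each chat object has 'user_id' and 'content' keys, and its tokenize function is a hypothetical whitespace-splitting placeholder: Japanese comments have no spaces between words, so in practice you would plug in a real morphological analyzer such as MeCab there.

```python
import json
from collections import Counter


def tokenize(text):
    """Hypothetical placeholder tokenizer: split on whitespace.
    Replace with a morphological analyzer for Japanese comments."""
    return text.split()


def summarize(chats, top=10):
    """Return (comments per user, most common words) for a list of chat dicts."""
    per_user = Counter(c.get('user_id', '') for c in chats)
    words = Counter(w for c in chats for w in tokenize(c.get('content', '')))
    return per_user.most_common(top), words.most_common(top)


def load_chats(path):
    """Load the JSON file written by nicolive.py (e.g. lv200730443.json)."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


# Made-up sample data for illustration
sample = [
    {'user_id': 'a', 'content': 'Abe next'},
    {'user_id': 'a', 'content': 'Abe'},
    {'user_id': 'b', 'content': 'Kaieda'},
]
print(summarize(sample, top=2))
```

Counting per user first makes it easy to spot the heavy commenters mentioned above before looking at word frequencies.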

