[PYTHON] Read all including YouTube video comments and replies

I tried to get all the comments including the reply of YouTube video with python script

YouTube video comment unreadable problem

There are some YouTube videos that sometimes have strangely many comments, but if you try to read this from your browser or app,

Stop.

It just stops. Even if I turn on the fire after a long time with a PC, the browser stops perfectly.

Read with YouTube API / Script

In such a case, there seems to be a convenient thing called YouTube API, and there are many scripts that have already been published. However, I couldn't find anything that could get all the comments, so I tried to get all the comments including the reply.

script

getComments.py



import datetime
import json
import requests

URL = 'https://www.googleapis.com/youtube/v3/'
API_KEY = 'xxx'
VIDEO_ID = 'yyy'

def get_video_comment(video_id, n):
    try:
        next_page_token
    except NameError:
        params = {
            'key': API_KEY,
            'part': 'replies, snippet',
            'videoId': VIDEO_ID,
            'order': 'time',
            'textFormat': 'plaintext',
            'maxResults': n,
        }
    else:
        params = {
            'key': API_KEY,
            'part': 'replies, snippet',
            'videoId': VIDEO_ID,
            'order': 'time',
            'textFormat': 'plaintext',
            'pageToken': next_page_token,
            'maxResults': n,
        }
    response = requests.get(URL + 'commentThreads', params=params)
    resource = response.json()
    return resource

def print_video_comment_replies(match):
    for comment_info in match['items']:
        author = comment_info['snippet']['authorDisplayName']
        pubdate = comment_info['snippet']['publishedAt']
        text = comment_info['snippet']['textDisplay']
        pubdate = datetime.datetime.strptime(pubdate, '%Y-%m-%dT%H:%M:%SZ')
        pubdate = pubdate.strftime("%Y/%m/%d %H:%M:%S")
        print('\t___________________________________________\n')
        print("\n\tReply :\n\t{}\n\n\tby: {} date: {}".format(text, author, pubdate), "\n")

def print_video_comment(match):
    global parentId
    for comment_info in match['items']:
        parentId = comment_info['id']
        author = comment_info['snippet']['topLevelComment']['snippet']['authorDisplayName']
        pubdate = comment_info['snippet']['topLevelComment']['snippet']['publishedAt']
        text = comment_info['snippet']['topLevelComment']['snippet']['textDisplay']
        like_cnt = comment_info['snippet']['topLevelComment']['snippet']['likeCount']
        reply_cnt = comment_info['snippet']['totalReplyCount']
        pubdate = datetime.datetime.strptime(pubdate, '%Y-%m-%dT%H:%M:%SZ')
        pubdate = pubdate.strftime("%Y/%m/%d %H:%M:%S")
        print('---------------------------------------------------\n')
        print('{}\n\n by: {} date: {} good: {} reply: {}\n'.format(text, author, pubdate, like_cnt, reply_cnt))
        if reply_cnt > 0:
            replyMatch = treat_reply(match)
            print_video_comment_replies(replyMatch)
            global reply_next_page_token
            try:
                reply_next_page_token = replyMatch["nextPageToken"]
            except KeyError:
                pass
            else:
                while 'reply_next_page_token' in globals():
                    replyMatch = treat_reply(match)
                    print_video_comment_replies(replyMatch)
                    try:
                        reply_next_page_token = replyMatch["nextPageToken"]
                    except KeyError:
                        reply_next_page_token = None
                        del reply_next_page_token
            reply_next_page_token = None
            del reply_next_page_token

def treat_reply(match):
    for comment_info in match['items']:
        try:
            comment_info['replies']
        except KeyError:
            pass
        else:
            try:
                reply_next_page_token
            except NameError:
                params = {
                 'key': API_KEY,
                 'part': 'id, snippet',
                 'parentId': parentId,
                 'textFormat': 'plaintext',
                 'maxResults': 100,
                 'order': 'time',
                }
            else:
                params = {
                 'key': API_KEY,
                 'part': 'id, snippet',
                 'parentId': parentId,
                 'textFormat': 'plaintext',
                 'maxResults': 100,
                 'order': 'time',
                 'pageToken': reply_next_page_token,
                }
            response = requests.get(URL + 'comments', params=params)
            resource = response.json()
            return resource


# Main

key = None

while 'key' in locals():
    match = get_video_comment(VIDEO_ID, 100)
    print_video_comment(match)
    try:
        next_page_token = match["nextPageToken"]
    except KeyError:
        next_page_token = None
        del next_page_token
        del key

Set the variables API_KEY and VIDEO_ID in the script appropriately. VIDEO_ID is the video URL.

point

--Comments can be obtained with Comment Threads --Comment replies are obtained from Comments

I'm doing it. Well, as it is written in the manual ...

https://developers.google.com/youtube/v3/docs/commentThreads#resource

To retrieve all of the replies for the top-level comment, you need to call the comments.list method and use the parentId request parameter to identify the comment for which you want to retrieve replies.

I wanted to switch between comments-> comment replies in the main process, but I could only think of setting the CommentThreads parameter maxResults to 1, which would issue APIs for the number of comments. It would be ridiculous (or even angry enough in some cases) to do, so it has a slightly unsatisfactory structure of calling the comment reply function in print_video_comment (). The variable reply_next_page_token is reused between functions, so a global declaration is required. (I was addicted to it without noticing it at first.)

Other

The method of checking the existence of variables is not unified in the script, but don't worry ...

reference

API Reference - CommentThreads API Reference - Comments Get Comments on Youtube in Python

Recommended Posts

Read all including YouTube video comments and replies
Get comments on youtube Live with [python] and [pytchat]!