[PYTHON] [Chocolate Planet] Extract damage reports from the comment section of the Bad Face Championship

Introduction

First, please watch this video: [Plan] Bad Face Championship https://www.youtube.com/watch?v=IEkLSfs1F68

Operating environment

Implementation details

Source code

get_youtube_comments.py


import re

import pandas as pd
import requests

API_KEY = 'Enter your API key'
VIDEO_ID = 'IEkLSfs1F68'


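def get_comment_info(api_key, video_id, page_token):
    """Fetch one page (up to 100 threads) of comment threads for a video."""
    comment_url = 'https://www.googleapis.com/youtube/v3/commentThreads'
    param = {
        'key': api_key,
        'videoId': video_id,
        'part': 'snippet,replies',
        'maxResults': '100',
    }

    # An empty token requests the first page
    if page_token:
        param['pageToken'] = page_token

    response = requests.get(comment_url, params=param)
    # Fail early on HTTP errors such as an invalid key or an exceeded quota
    response.raise_for_status()
    return response.json()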


def get_video_comments(api_key, video_id):
    """Collect the text of every top-level comment and returned reply."""
    comments = []
    page_token = ''

    # Keep requesting pages until the API stops returning a nextPageToken
    while page_token is not None:
        resource = get_comment_info(api_key, video_id, page_token)

        for comment_thread in resource['items']:
            # Get the top-level comment
            comment = comment_thread['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

            # Get the replies included in the thread, if any
            if ('replies' in comment_thread) and ('comments' in comment_thread['replies']):
                for reply in comment_thread['replies']['comments']:
                    reply_comment = reply['snippet']['textDisplay']
                    comments.append(reply_comment)

        # None ends the loop once the last page has been read
        page_token = resource.get('nextPageToken')
    return comments


# Get a list of comments
comments = get_video_comments(API_KEY, VIDEO_ID)
# Remove line break tags
comments = [comment.replace('<br />', '') for comment in comments]


target_list = []
report_comment_list = []

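for comment in comments:
    # A target is a name followed by a number (half-width, full-width, or circled).
    # An alternation group is used because square brackets would match any run
    # of the individual letters rather than the two names.
    target = re.findall(r'(?:Nagata|Matsuo)\s*[1-3１-３①②③]', comment)
    # If a target appears more than once, keep only the unique values
    # (sorted so that the row order is the same on every run)
    target = sorted(set(target))

    # Words that mark the passive voice, i.e. something was done to the commenter
    passive_words = re.findall('(Was|Was|Was there|Was there)', comment)

    if len(target) > 0 and len(passive_words) > 0:
        # If one comment names multiple targets, add a row for each target
        for t in target:
            target_list.append(t)
            report_comment_list.append(comment)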

df = pd.DataFrame({'target': target_list, 'comment': report_comment_list})

# Display five rows at random (fixed seed for reproducibility)
print(df.sample(5, random_state=42))
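
As an aside, the target pattern is easy to get subtly wrong, so here is a minimal offline sketch of the extraction rule that runs without an API key. It assumes the intent is to match either name followed by a number, and writes that as an alternation group, since square brackets would instead match any run of the individual letters. The helper name extract_reports, the sample comments, and the word "was" as the passive marker are all assumptions made for illustration and are not part of the script above.

```python
import re

# Assumed intent: one of the two names, optional whitespace, then 1-3 in
# half-width, full-width, or circled form.
TARGET_PATTERN = re.compile(r'(?:Nagata|Matsuo)\s*[1-3１-３①②③]')
# Hypothetical passive marker, used only for this English sample data.
PASSIVE_PATTERN = re.compile(r'\bwas\b', re.IGNORECASE)


def extract_reports(comments):
    """Return (target, comment) pairs for comments that look like damage reports."""
    reports = []
    for comment in comments:
        # Unique targets, sorted so the output order is stable
        targets = sorted(set(TARGET_PATTERN.findall(comment)))
        if targets and PASSIVE_PATTERN.search(comment):
            reports.extend((target, comment) for target in targets)
    return reports


# Hypothetical sample comments
sample_comments = [
    'My lunch was eaten by Matsuo ②.',
    'Nagata ① and Nagata ① again: my bike was taken.',
    'Great video!',
    'Matsuo ③ looks scary.',
]

reports = extract_reports(sample_comments)
for target, comment in reports:
    print(target, '|', comment)
```

Only the first two sample comments qualify: the third names no target, and the fourth names a target but contains no passive marker.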

Execution result

print(df.shape)
->(178, 2)

A total of 178 damage reports were extracted. A comment that names several targets is counted once for each target.

	target	comment
19	Matsuo ②	I was really scared because I was chasing after him so relentlessly. I was worried because I had a young child in the back...
45	Nagata ①	Parker was stolen by Nagata ①.
24	Nagata ③	Nagata ③ In addition, there is a lot of information that we have been talking to neighboring residents that "God took a hand with me"...
30	Matsuo ②	Matsuo ② stole 96 crabs.
67	Matsuo ①	Matsuo ①, Nagata ② and Matsuo ③ were fraudulent.

I'm curious about No.24.
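
As a possible next step, the targets could be tallied to see who is reported most often. Because the pattern accepts several ways of writing the number, the same target may be spelled differently, so this sketch normalizes the spelling before counting. The function normalize_target and the sample list are hypothetical; in the script, target_list would take the place of the sample.

```python
from collections import Counter

# Map full-width and circled digits to half-width digits
DIGIT_TABLE = str.maketrans('１２３①②③', '123123')


def normalize_target(target):
    """Normalize a target such as 'Matsuo ②' or 'Matsuo２' to 'Matsuo 2'."""
    text = target.translate(DIGIT_TABLE)
    # The last character is the number; everything before it is the name
    name, number = text[:-1].strip(), text[-1]
    return f'{name} {number}'


# Hypothetical sample standing in for target_list from the script
sample_targets = ['Matsuo ②', 'Matsuo 2', 'Nagata１', 'Nagata ①', 'Matsuo ②']

counts = Counter(normalize_target(t) for t in sample_targets)
for target, count in counts.most_common():
    print(target, count)
```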

Summary

Chocolate Planet is great!

Reference link

I referred to the following. Thank you very much.
[Python] Get all comments using Youtube Data API
Get comments and subscribers with YouTube Data API
Chocolate Planet Channel
