[PYTHON] youtube-dl extractor writing course 3/4

Happy New Year

Happy New Year. I was so busy that the new year arrived before I even noticed the old one was ending.

I look forward to working with you again this year.

Continuing from last time

If you haven't read it yet, please start with the previous part. Last time I wrote the error handling. This time, we will extract more than the bare minimum of information.

Final goal

We will build this.

What you can see from the website

Once again I have picked a suitable live stream. Let's open it.


As in part 1, open the developer tools and look for the relevant network requests. In Chrome, you can open the developer tools with Ctrl + Shift + I.

Search for the poster's name first

Scrolling down the page a little reveals the poster's name and the description. Searching for that text in DevTools turns up the request whose response contains it.

The response contains the poster's name and other details. The URL of this API endpoint was https://api.whowatch.tv/lives/19199115.
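As a rough sketch of how this endpoint relates to the page URL: the numeric ID in the viewer URL appears to be the same one used in the API URL. The snippet below uses only the standard library. `VIEWER_URL_RE` mirrors the `_VALID_URL` pattern from the appendix, while `api_url_for` is a hypothetical helper name introduced here for illustration, and the viewer URL is assumed from the ID seen in DevTools.

```python
import re

# Same pattern as _VALID_URL in the extractor (see the appendix).
VIEWER_URL_RE = re.compile(r'https?://whowatch\.tv/viewer/(?P<id>\d+)/?')


def api_url_for(viewer_url):
    """Hypothetical helper: map a viewer page URL to the metadata API URL."""
    match = VIEWER_URL_RE.match(viewer_url)
    if match is None:
        raise ValueError('not a whowatch viewer URL: %s' % viewer_url)
    return 'https://api.whowatch.tv/lives/' + match.group('id')


# Assumed viewer URL for the stream whose ID showed up in DevTools.
print(api_url_for('https://whowatch.tv/viewer/19199115'))
```

Inside the extractor itself, `self._match_id(url)` does the ID extraction, so this is only meant to show what is going on.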

Extract this information

First, download 'https://api.whowatch.tv/lives/' + video_id and parse it as JSON.

metadata = self._download_json('https://api.whowatch.tv/lives/' + video_id, video_id)
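To make the shape of the data concrete, here is a small standalone sketch of what the parsed result looks like. `_download_json` fetches the URL and returns the decoded JSON. The sample body below is made up and heavily trimmed, keeping only the keys used in this article, so treat its values as placeholders.

```python
import json

# Made-up, trimmed-down response body; only the keys used in this article.
SAMPLE_BODY = '''
{
  "live": {
    "latest_thumbnail_url": "https://example.com/thumbnail.jpg",
    "user": {"name": "Example Streamer", "user_path": "example_user"}
  }
}
'''

metadata = json.loads(SAMPLE_BODY)

# Nested dict access, in the same style the extractor uses.
print(metadata['live']['user']['name'])
print(metadata['live']['user']['user_path'])
```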

By the way, do you remember that _real_extract returns a dict? The keys of this dict follow a convention, which is documented in the OUTPUT TEMPLATE section of README.md (https://github.com/ytdl-org/youtube-dl#output-template).

For example, the poster's name and the poster's nickname handled this time correspond to uploader and uploader_id, respectively.

Follow this convention when building the dict.

# Retrieve the additional information
uploader = metadata['live']['user']['name']
uploader_id = metadata['live']['user']['user_path']

return {
    # (omitted)
    # Uploader information
    'uploader': uploader,
    'uploader_id': uploader_id,
}
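Why the key names matter: the README describes output templates in terms of Python string formatting over these fields, so the keys you return become usable with the -o option. Here is a minimal sketch with a hand-written dict. All values are placeholders, and `ext` normally comes from the selected format rather than from the part of the return value shown above.

```python
# Placeholder info dict; a real one comes from _real_extract.
info = {
    'id': '19199115',
    'title': 'Sample stream',
    'uploader': 'Example Streamer',
    'uploader_id': 'example_user',
    'ext': 'mp4',
}

# Output templates use %(name)s placeholders, as in the -o option.
template = '%(uploader)s (%(uploader_id)s) - %(title)s [%(id)s].%(ext)s'
filename = template % info
print(filename)
```

If the extractor returned the poster's name under some other key, `%(uploader)s` in a user's template would have nothing to refer to.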

Similarly, the thumbnail key sets the thumbnail, which can then be saved with the --write-thumbnail option.

thumbnail = metadata['live']['latest_thumbnail_url']

return {
    # (omitted)
    # Thumbnail
    'thumbnail': thumbnail,
}
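Putting the pieces together, the metadata-related part of the return value could be factored out as shown below. This is only a sketch: `build_metadata_fields` is a hypothetical helper, not something youtube-dl provides, and the sample dict stands in for the parsed API response.

```python
def build_metadata_fields(metadata):
    """Hypothetical helper: pick the extra fields out of the API response."""
    live = metadata['live']
    return {
        'uploader': live['user']['name'],
        'uploader_id': live['user']['user_path'],
        'thumbnail': live['latest_thumbnail_url'],
    }


# Stand-in for the parsed response; values are placeholders.
sample = {
    'live': {
        'latest_thumbnail_url': 'https://example.com/thumbnail.jpg',
        'user': {'name': 'Example Streamer', 'user_path': 'example_user'},
    },
}

# Merge the extra fields into a dict that already holds the basics.
info = {'id': '19199115', 'is_live': True}
info.update(build_metadata_fields(sample))
print(sorted(info))
```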

In conclusion

This time we wrote code to extract more than the minimum information. Now that the extractor is close to the final goal, the next step is a quick look at youtube-dl's coding conventions. youtube-dl offers an easy way to deal with errors such as KeyError, so I will leave that as an exercise for you.
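As a hint for that exercise, here is one possible shape of the fix in plain Python. `dig` is a hypothetical helper written for this sketch. youtube-dl's own utils module ships a helper called `try_get` that serves a similar purpose, and looking at how existing extractors use it is a good starting point.

```python
def dig(data, *keys):
    """Hypothetical helper: follow keys through nested dicts, or return None."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


# Placeholder data: one complete response, one with the user info missing.
complete = {'live': {'user': {'name': 'Example Streamer'}}}
incomplete = {'live': {}}

print(dig(complete, 'live', 'user', 'name'))    # the value is found
print(dig(incomplete, 'live', 'user', 'name'))  # no KeyError, just None
```

Optional fields such as the uploader and the thumbnail are natural candidates for this treatment, since a missing value there should probably not abort the whole extraction.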

Thank you for reading.

Appendix: The code from this article

# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import ExtractorError


class WhoWatchIE(InfoExtractor):
    _VALID_URL = r'https?://whowatch\.tv/viewer/(?P<id>\d+)/?'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        # Arguments are given in the order URL, then video ID
        live_data = self._download_json('https://api.whowatch.tv/lives/' + video_id + '/play', video_id)
        metadata = self._download_json('https://api.whowatch.tv/lives/' + video_id, video_id)

        # For debugging: display live_data
        # self.to_screen(live_data)

        # HLS URL
        hls_url = live_data.get('hls_url')

        # Raise an error if there is no hls_url
        if not hls_url:
            raise ExtractorError(live_data.get('error_message'), expected=True)

        # Extract the HLS formats for now
        formats = self._extract_m3u8_formats(
            hls_url, video_id, ext='mp4', entry_protocol='m3u8_native',
            m3u8_id='hls')
        # Sort the formats so that the best quality is downloaded by default
        self._sort_formats(formats)

        # Retrieve the additional information
        uploader = metadata['live']['user']['name']
        uploader_id = metadata['live']['user']['user_path']
        thumbnail = metadata['live']['latest_thumbnail_url']

        return {
            'id': video_id,
            # The title is wrapped in corner brackets, so strip them here
            'title': live_data['share_info']['live_title'][1:-1],
            # Format list
            'formats': formats,
            # This is a live broadcast
            'is_live': True,
            # Uploader information
            'uploader': uploader,
            'uploader_id': uploader_id,
            # Thumbnail
            'thumbnail': thumbnail,
        }
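One detail in the code above: `[1:-1]` removes the first and last characters unconditionally. If the API ever returned a title without the surrounding brackets, real characters would be lost. A slightly more defensive variant might look like the sketch below. It assumes the brackets are the Japanese corner brackets 「 and 」, which the article does not state explicitly, and `strip_corner_brackets` is a hypothetical name.

```python
def strip_corner_brackets(title):
    """Hypothetical helper: drop a surrounding 「...」 pair only if present."""
    if len(title) >= 2 and title[0] == '「' and title[-1] == '」':
        return title[1:-1]
    return title


# Placeholder titles: one wrapped in brackets, one without them.
print(strip_corner_brackets('「テスト配信」'))
print(strip_corner_brackets('plain title'))
```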

I have also put the code from this article here.
