[PYTHON] Scraping Holodule and displaying it in the CLI

I made a program that scrapes Holodule, the Hololive streaming schedule, and displays its contents in a simple CLI view.

Source code

GitHub

Precautions for use

This tool uses the external library requests. If you have pip, you can install it with pip install requests.

Also, this tool is not affiliated with Hololive's official operators. Do not overload the server by running it more often than necessary.

How to use

You can display the schedule by running main.py in the repository.

You can also pass options at runtime; --help lists them all. For example:

--all       Show all schedules except Bilibili streams, including Holostars
--eng       Display member names in English
--tomorrow  Show tomorrow's schedule

These options can be combined in a single run.
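As an illustration, flags like the ones above could be wired up with the standard library's argparse. This is only a sketch: the article does not say how main.py parses its arguments, so build_parser and the help strings are assumptions, and only the three flag names come from the text.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag names are taken from the article.
    parser = argparse.ArgumentParser(
        description="Show the Hololive schedule in the CLI"
    )
    parser.add_argument("--all", action="store_true",
                        help="show all schedules except Bilibili streams")
    parser.add_argument("--eng", action="store_true",
                        help="display member names in English")
    parser.add_argument("--tomorrow", action="store_true",
                        help="show tomorrow's schedule")
    return parser


# Boolean flags are independent, so they can be combined freely.
args = build_parser().parse_args(["--eng", "--tomorrow"])
print(args.all, args.eng, args.tomorrow)
```

Because each flag is a plain store_true switch, any combination is valid, and argparse generates the --help text automatically.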

Notes

These are notes on the points where I got stuck while building it.

First, I was unsure how to separate Hololive members from everyone else in the scraped data. I handled it as follows:

scraping.py


def delete_exception(time_list, stream_members_list, stream_url_list, is_all):

    EXCEPTION_LIST = {'Yogiri', 'Civia', 'SpadeEcho', 'Doris', 'Artia', 'Rosalyn'}

    if not is_all:
        # Slice to keep only the non-Hololive members (index 29 onward), e.g. Holostars and Hololive-ID
        EXCEPTION_LIST = EXCEPTION_LIST | set(get_member_list()[29:])

    for i in range(len(time_list)):

        if stream_members_list[i] in EXCEPTION_LIST:
            time_list[i] = None
            stream_members_list[i] = None
            stream_url_list[i] = None

    time_list = [i for i in time_list if i is not None]
    stream_members_list = [i for i in stream_members_list if i is not None]
    stream_url_list = [i for i in stream_url_list if i is not None]

    return time_list, stream_members_list, stream_url_list

I prepare the list of Hololive members who stream on YouTube and a set of members who stream on Bilibili in advance. Depending on the options, I build a set of members to exclude, replace the scraped elements that belong to it with None, and finally remove them all at once with list comprehensions.

By the way, you could also implement this with a list, but when you don't need element indices, membership tests on a set are much faster than on a list: a set lookup takes roughly constant time on average, while a list has to be scanned element by element.
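For comparison, the same filtering can be done in a single pass by zipping the three lists together and testing each member against the exclusion set. This is a sketch of an alternative, not the code in the repository; filter_streams and the sample data (including the example.com URLs) are made up for illustration.

```python
def filter_streams(times, members, urls, excluded):
    """Drop every entry whose member is in `excluded`, keeping the lists aligned."""
    kept = [
        (t, m, u)
        for t, m, u in zip(times, members, urls)
        if m not in excluded  # set membership test
    ]
    if not kept:
        return [], [], []
    # Transpose the surviving rows back into three parallel lists.
    new_times, new_members, new_urls = (list(col) for col in zip(*kept))
    return new_times, new_members, new_urls


# Hypothetical sample data; 'Civia' is one of the names in EXCEPTION_LIST.
times = ["18:00", "19:00", "20:00"]
members = ["Sora", "Civia", "Miko"]
urls = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]

result = filter_streams(times, members, urls, {"Civia"})
print(result)
```

Filtering rows rather than overwriting entries with None keeps the three lists in step without a second cleanup pass.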


Another problem was that member names are a mix of English and Japanese, and the difference between half-width and full-width characters kept the columns from lining up. To solve this, I used the standard library module unicodedata.

unicode


if unicodedata.east_asian_width(stream_members_list[i][0]) == 'W':
    # Full-width name: each character takes two columns
    m_space = ' ' * (18 - 2 * len(stream_members_list[i]))
else:
    # Half-width name: each character takes one column
    m_space = ' ' * (18 - len(stream_members_list[i]))

unicodedata.east_asian_width returns 'W' when its argument (a single character, not a string) is a wide character such as kana or kanji. By checking the first character of each name and padding with spaces according to the name's length, I was able to line the columns up evenly.
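The snippet above only looks at the first character, so a name that mixes half-width and full-width characters would be padded incorrectly. A possible variation is to add up the width of every character. This is a sketch: display_width and pad_to_width are hypothetical helpers, and the column width of 18 is taken from the snippet above.

```python
import unicodedata


def display_width(text):
    """Return the number of terminal columns `text` is expected to occupy."""
    # 'W' (wide) and 'F' (fullwidth) characters take two columns; the rest one.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_to_width(text, width=18):
    """Right-pad `text` with spaces so that it fills `width` columns."""
    return text + " " * max(0, width - display_width(text))


for name in ["Sora", "ときのそら", "AZKi"]:
    print(pad_to_width(name) + "|")
```

This ignores combining characters and characters with ambiguous width, which is usually acceptable for a list of names.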

Finally

I'm glad that some people have already cloned it. I will continue to improve this repository.
