[PYTHON] Try to download Youtube videos using Pytube

The method of downloading songs (or videos) from NCS (NoCopyrightSounds) and AudioLibrary was troublesome (unless it was Python), so I will write it as a memorandum.

Not for people who can find it by searching for "Pytube" It's an article for people who searched for "youtube video download Python" ...

The person who writes the article is also a beginner in programming, let alone Python: weary:

Traditional method

There are many ways to download Youtube videos for a long time, but roughly

Most people think that if they want to save Youtube videos, they often use a suspicious service on the web, but recently, the service itself is also "played by youtube": thinking ::

From youtube's point of view, it's natural because it's an access that doesn't generate advertising revenue. Of course there are long-lived services, but UIUX is low and full of ads.

Reasons to choose Pytube

Because it was so easy

For me, a beginner in Python (or programming itself), I thought it was the best because there were few libraries to remember and few methods. Intuition.

Light operation on the client side

It's not a heavy library, so it works very lightly. I don't know how to handle it internally, but it's amazing.

Download is explosive for some reason

I think that many people have used WEB services, but the biggest difficulty is their "slowness". Even a 20-minute video often has to wait 5 to 10 minutes. From the point of view of the service, it is natural because it is temporarily saved.

However, Pytube is explosive. Even if you download a 2-hour video in Full HD (about 1.5GB), it will end in a few minutes. (Of course, it depends on the line environment)


important point

YouTube Terms of Service

Since the internal operation of Pytube is a mystery, I will refrain from mentioning it Please be aware that ** YouTube prohibits scraping. Of course, operating YouTube by programming is not prohibited.

YouTube is officially provided with a "YouTube Data API", "Get the information on YouTube via here" is prepared. Of course, there is also an API-key, so it records who accessed it.

About YouTube Data API

Let me mention a little about the YouTube Data API. With this API, you can get video information and channel information, and perform various operations related to videos.

  • Check the number of subscribers
  • Check the number of video posts
  • Check the number of views
  • Check the video information (title/summary/URL)
  • Find out about upcoming video posts
  • Check the rating for the video

You can do various things. Although omitted, uploading operations are also possible, and if you want to make something like a YouTube browser, you can get essential information.

YouTube is not OK

Pytube itself is not an official library, so it does not mean that "YouTube is OK". (Details are omitted) The copyright of the video data is owned by YouTube and the creator. Therefore, YouTube is "different for each video" as to what to do with the secondary use of downloadable videos.

*** Technically (easily) feasible, but should not be overused ***

I think that is the correct interpretation.

Secondary use and storage of downloaded videos

Copyright is very complicated, but if you are referring to this article, please refer to the following three.

  1. Save downloaded videos only locally (private storage)
  2. Secondary use of downloaded videos is only possible if the copyright holder is OK
  3. If you want to monetize the second-used video, use YouTube (monetization examination)

in short

In terms of YouTube terms of use, Please be aware that copyright law is also a ** very gray zone act **.


Whole code

Jupiter Notebook people also need to install

pip install pytube

Required library

from pytube import YouTube
import os
import pandas as pd
import datetime
today = datetime.datetime.now()
todayis = str(today.year) + "Year" + format(today.month, '0>2') + "Moon" + format(today.day, '0>2') + "Day"
videopath = 'hogehoge' ###Specify the save destination of the video
savepath = videopath +'\\'+ todayis

if not os.path.exists(savepath):
    os.mkdir(savepath)

def EasyVideoDL():
    try:
        videolist = '\\douga-list.csv'
        video_df = pd.read_csv(videolist, header=None)

        for url in video_df[0]:
            yt = YouTube(url)
            f = savepath + '\\' + yt.title
            yt.streams.get_highest_resolution().download(f)
            print(yt.title)
    except:
        pass

EasyVideoDL()

"Do you need a try?" Please forgive me for "meaning to make it a function": weary:

Explain each? To do


Preparing the save destination folder and reading the CSV file

Here, prepare the save destination and read the CSV file containing the video URL to make it available. If you are interested, try running it line by line (block) in Jupiter Notebook.

today = datetime.datetime.now()

Get today's date as datetime type

todayis = str(today.year) + "Year" + format(today.month, '0>2') + "Moon" + format(today.day, '0>2') + "Day"

To the format of "YYYY year MM month DD day" with the fetched date as a character string You can format it when you get it with datetime type, but it's important to make it easy to understand: weary:

videopath = 'hogehoge'
savepath = videopath +'\\'+ todayis

Specify the (parent) directory where you want to save the video (Windows people are happier to use "r") Define the directory name for each date actually saved

if not os.path.exists(savepath):
    os.mkdir(savepath)

Create a date directory. If you already have it, do nothing.

videolist = '\\douga-list.csv'
video_df = pd.read_csv(videolist, header=None)

Store CSV in pandas DataFrame. Delete the header when storing (not needed this time)

The contents of CSV look like this

https://www.youtube.com/watch?v=Kp-eibuQpWg
https://www.youtube.com/watch?v=JgT5vhDB7W8
https://www.youtube.com/watch?v=kY-6R5fa0gw

There is no intention in the URL destination. I just refer to a long video (see below)

Turn Pytube

for url in video_df[0]:
    yt = YouTube(url)
    f = savepath + '\\' + yt.title
    yt.streams.get_highest_resolution().download(f)
    print(yt.title)
  1. Extract the value from video_df. Since it is only the first column, specify video_df [0]. Store the retrieved value in url.
  2. Pass the url to the YouTube method of Pytube and store it.
  3. Define the save folder in f.
  4. (described later)
  5. Show progress on console (.title gets the title of the video)

Windows or Mac? Since "/" and "." Cannot be used for folders, I love that area.

yt.streams.get_highest_resolution().download(f)

yt is a translation ofYouTube (url), You can get the video data by continuing with .streams from there.

This time I used .get_highest_resolution (). This refers to "the best quality video with audio".

However, this is the songwriter, and the composition of the Youtube video is as follows. (Actually it depends on the video being uploaded)

file Video voice format
1 720p AAC MP4
2 1080p None MP4
3 360p AAC MP4
4 360p AAC webm
5 None AAC MP4
6 None AAC webm

According to the current specifications of Youtube, it seems that 1080p or higher resolution videos are "separated from audio" operation, and 1080p videos cannot be directly downloaded as MP4 with audio. If you want to download high quality 1080p or higher videos, It is necessary to download the video without audio and the video with audio only once, synthesize them, and regenerate them. (This may also be the reason why the suspicious service on the WEB is struggling.)

So, you can get the movie of 720p + AAC with.get_highest_resolution (). You can also specify this combination using something called itag, I couldn't grasp the regularity of itag, so I used the one that gives the same result in a stable manner.

(If you want to collect only songs, use .get_audio_only () to target AAC-MP4)

And

Save the video with .download (). Specify the save destination folder in the argument. In this case, it's f.

If you turn this to the end of video_df with a for statement, the videos will be saved in order. The last print may or may not be present. Since you can see which video the DL has finished, it is just displayed.


Finally

As you can see when you run it, it's faster than watching a video on youtube. I think VideoBrowser will do its best only a few seconds before the playhead, so ** You can see that it puts a load on youtube rather than the act of viewing it with an application or browser. ** **

I don't think Google will get angry at this level, I don't want to be glared at by Google, so I will use it moderately.

If you've ever stared at Google Chrome developer tools, you know. Youtube videos are sometimes divided into multiple parts in terms of data. I don't know if it's the length or capacity of the video ...: thinking:

Considering that, pytube that exports as one file is excellent.

I'm a member of Youtube Premium, so I don't need to say "to see it on a mobile app". It was around this time that I wish I could download and watch it on my PC.

It was a good subject to think about during the New Year holidays: relieved:

Pytube official

https://python-pytube.readthedocs.io/en/latest/

Recommended Posts

Try to download Youtube videos using Pytube
How to download youtube videos using pytube3
How to download youtube videos with youtube-dl
Upload videos using YouTube API
I tried to search videos using Youtube Data API (beginner)
Try using pynag to configure Nagios
Try to get statistics using e-Stat
Try to detect fusion movement using AnyMotion
Try to operate Excel using Python (Xlwings)
Try using Tkinter
Automatically search and download YouTube videos with Python
Try using django-import-export to add csv data to django
youtube download memo
Try to separate Controllers using Blueprint in Flask
Try using cookiecutter
Try using PDFMiner
Try using geopandas
Transcription of YouTube videos using GCP's Cloud Speech-to-Text
Try using Selenium
Try using scipy
Get information about videos uploaded to YouTube [Python]
Try using pandas.DataFrame
Download files directly to Google Drive (using Google Colaboratory)
Try to create an HTTP server using Node.js
Try using django-swiftbrowser
Try using matplotlib
Try using tf.metrics
Try using PyODE
Try casting videos and websites from Raspberry Pi to Chromecast and Nest Hub using CATT
(Python) Try to develop a web application using Django
Try to make RESTful API with MVC using Flask 1.0.2
Try to extract high frequency words using NLTK (python)
[Machine learning] Try to detect objects using Selective Search
Try to solve Sudoku at explosive speed using numpy
Download images using requests
Try using virtualenv (virtualenvwrapper)
[Azure] Try using Azure Functions
Try to implement yolact
Try using virtualenv now
Try using W & B
Try using Django templates.html
[Kaggle] Try using LGBM
Try using Python's feedparser.
Try using Python's Tkinter
Try using Tweepy [Python2.7]
Try using Pytorch's collate_fn
Try to make it using GUI and PyQt in Python
Try to make PC setting change software using TKinter (beginner)
Try to operate an Excel file using Python (Pandas / XlsxWriter) ①
Try to operate an Excel file using Python (Pandas / XlsxWriter) ②
How to convert Youtube to mp3 and download it super-safely [Python]
Try to determine food photos using Google Cloud Vision API
Convert pixiv to mp4 and download from pixiv using python's pixivpy
Try to implement linear regression using Pytorch with Google Colaboratory