[Python] Extract the video ID from the YouTube video URL [Note]

Preface

There are many articles on how to extract the video ID from a YouTube video URL, but I felt that none of them covered everything: the shortened URLs starting with https://youtu.be/ that are generated when you press the "Share" button, and URLs that include query parameters (for example, t=15, which specifies the start time, or feature=youtu.be, which indicates a redirect from a shortened URL). So I am writing it down here as a memo.

By the way, the YouTube query parameter t, which indicates the playback start position, can of course be given entirely in seconds, as in https://youtu.be/r4Mkv-q4NmQ?t=5437 or https://youtu.be/r4Mkv-q4NmQ?t=5437s. But if you write it as ◯h△m□s, as in https://youtu.be/r4Mkv-q4NmQ?t=1h30m37s, the URL starts playback from "◯ hours △ minutes □ seconds"!

The YouTube URLs in this article are basically those of my own videos or my own channel!
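
As a rough illustration of the three forms of t described above, here is a minimal sketch that converts the value into seconds. The helper name start_seconds and the regular expression are hypothetical and written only for this note; the sketch handles only the forms mentioned here (plain seconds, seconds with a trailing s, and ◯h△m□s).

```python
import re
import urllib.parse

# Hypothetical pattern: optional hours, optional minutes, optional seconds
# (the trailing "s" may be omitted, as in t=5437).
_T_PATTERN = re.compile(r'(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?')

def start_seconds(url):
    """Return the start position given by the t parameter in seconds, or None."""
    query = urllib.parse.urlparse(url).query
    values = urllib.parse.parse_qs(query).get('t')
    if not values:
        return None  # no t parameter in the URL
    m = _T_PATTERN.fullmatch(values[0])
    if m is None or not any(m.groups()):
        return None  # a form this sketch does not understand
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds

print(start_seconds('https://youtu.be/r4Mkv-q4NmQ?t=5437'))
print(start_seconds('https://youtu.be/r4Mkv-q4NmQ?t=1h30m37s'))
```

Both calls should give the same number, since 1 hour 30 minutes 37 seconds is 5437 seconds.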

Source code

Works with Python 3. Python 2 does not have the `urllib.parse` module (there, the same functions live in the `urlparse` module).

import urllib.parse
import re

##############################################################
## Extract YouTube video IDs from a list of URLs
## Supports normal URLs and shortened URLs. An error message is printed for unsupported URLs
## Argument: list of URLs
## Return value: list of extracted video IDs
##############################################################
def pick_up_vid_list(url_list):
  vid_list = []
  # Escape the prefixes so that "." and "?" are matched literally
  pattern_watch = re.escape('https://www.youtube.com/watch?')
  pattern_short = re.escape('https://youtu.be/')

  for i, url in enumerate(url_list):
    # Normal URL: the video ID is the value of the v query parameter
    if re.match(pattern_watch, url):
      yturl_qs = urllib.parse.urlparse(url).query
      vid = urllib.parse.parse_qs(yturl_qs)['v'][0]
      vid_list.append(vid)

    # Shortened URL
    elif re.match(pattern_short, url):
      # The 11 characters following "https://youtu.be/" are the video ID
      vid = url[17:28]
      vid_list.append(vid)

    else:
      print('error:\n  The URL must start with "https://www.youtube.com/watch?" or')
      print('  "https://youtu.be/".')
      print('  - Item ' + str(i+1) + ': ' + url)
  return vid_list

Brief commentary

For normal URLs that start with https://www.youtube.com/watch?, the video ID is the value of the v parameter in the URL query, so I extract that! For shortened URLs that start with https://youtu.be/, the 11 characters following https://youtu.be/ are always the video ID, so I slice those out!
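
To make the two cases above concrete, here is a small walk-through of the intermediate values, using two URLs from the usage example in this article. It only uses urllib.parse.urlparse, urllib.parse.parse_qs and string slicing; the variable names are chosen just for this illustration.

```python
import urllib.parse

# Normal URL: take the query string, then look up the v parameter.
url = 'https://www.youtube.com/watch?v=2k-uF-QPcEM&t=5'
query = urllib.parse.urlparse(url).query
params = urllib.parse.parse_qs(query)
print(query)           # v=2k-uF-QPcEM&t=5
print(params)          # {'v': ['2k-uF-QPcEM'], 't': ['5']}
print(params['v'][0])  # 2k-uF-QPcEM

# Shortened URL: the prefix is 17 characters long, so the ID is url[17:28].
short_url = 'https://youtu.be/biaC_2Mx7Mw?t=283'
prefix = 'https://youtu.be/'
print(len(prefix))                               # 17
print(short_url[len(prefix):len(prefix) + 11])   # biaC_2Mx7Mw
```

Note that parse_qs returns a list for every parameter, which is why the function takes element [0].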

I was worried that the ID might one day roll over to 12 characters and thought I would have to search for it with a regular expression, but apparently it is fine. → [About the risk of the v value of YouTube being carried - Nipotan Research Institute](http://blog.livedoor.jp/nipotan/archives/50588074.html "About the risk of the v value of YouTube being carried - Nipotan Research Institute") Also, according to that article, the video ID seems to be made up of [0-9], [a-z], [A-Z], - and _. According to "[Characters that can be used in URLs, characters that cannot be used](https://www.ipentec.com/document/web-url-invalid-char "Characters that can be used in URLs, characters that cannot be used")", it seems that hardly any other characters are usable in a URL, so I suppose the set of characters will not grow, and if IDs ever run short, the number of characters will be increased instead.
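
If you would rather check the extracted value than rely on the fixed position, something like the following sketch is one possibility. It assumes, based only on the article cited above, that an ID is exactly 11 characters from [0-9], [a-z], [A-Z], - and _; the names VID_RE and extract_vid are hypothetical and are not part of the function in this article.

```python
import re
import urllib.parse

# Assumption: a video ID is 11 characters from 0-9, a-z, A-Z, "-" and "_".
VID_RE = re.compile(r'[0-9A-Za-z_-]{11}')

def extract_vid(url):
    """Return the video ID of a watch URL or a youtu.be URL, or None."""
    parts = urllib.parse.urlparse(url)
    if parts.netloc == 'www.youtube.com' and parts.path == '/watch':
        # Normal URL: use the first v parameter, if there is one
        candidates = urllib.parse.parse_qs(parts.query).get('v', [])
        vid = candidates[0] if candidates else ''
    elif parts.netloc == 'youtu.be':
        # Shortened URL: the path is "/" followed by the ID
        vid = parts.path.lstrip('/')
    else:
        return None
    # Reject anything that does not look like an ID under the assumption above
    return vid if VID_RE.fullmatch(vid) else None

for url in ['https://www.youtube.com/watch?v=2k-uF-QPcEM&t=5',
            'https://youtu.be/biaC_2Mx7Mw?t=283',
            'https://www.youtube.com/']:
    print(extract_vid(url))
```

Unlike the function in this article, this variant returns None for a watch URL without a v parameter instead of raising KeyError. If the ID length ever changed, the {11} in the pattern would need to be adjusted.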

Example of use

url_list = [
'https://www.youtube.com/watch?v=k3nPaVj8-3w',
'https://www.youtube.com/watch?v=2k-uF-QPcEM&t=5',
'https://www.youtube.com/watch?v=5_Vy0ZtPo_w',
'https://youtu.be/_t-i0KLiJBk',
'https://youtu.be/tfIvsrRxaXg',
'https://youtu.be/biaC_2Mx7Mw?t=283',
'https://www.youtube.com/',
'https://www.youtube.com/channel/UCDWM7dKT5vLXqSi_YljdlBw']
vid_list = pick_up_vid_list(url_list)

for vid in vid_list:
  print(vid)

Execution result:

error:
  The URL must start with "https://www.youtube.com/watch?" or
  "https://youtu.be/".
  - Item 7: https://www.youtube.com/
error:
  The URL must start with "https://www.youtube.com/watch?" or
  "https://youtu.be/".
  - Item 8: https://www.youtube.com/channel/UCDWM7dKT5vLXqSi_YljdlBw
k3nPaVj8-3w
2k-uF-QPcEM
5_Vy0ZtPo_w
_t-i0KLiJBk
tfIvsrRxaXg
biaC_2Mx7Mw

Afterword

Python's standard library can parse query parameters by itself! Really convenient! In JavaScript I could not do this without using purl.js! Well, of course you could implement it yourself, but ... that is a hassle.

References

How to use regular expressions in Python - Qiita
How to use Python's regular expression module re (match, search, sub, etc.) | note.nkmk.me
Get / create / change URL query string (parameter) in Python | note.nkmk.me
