[Python] I tried scraping YouTube, but the API does the job, so don't do this.

I wrote this a long time ago to keep track of trends on YouTube, but the API covers that now.

I'm posting it here as a memorial. Also, don't scrape YouTube.


from selenium import webdriver
import time
from selenium.webdriver.common.action_chains import ActionChains
import urllib.parse


def main():
    # Search words
    search_words = ['Alpha wave', 'sleep']
    # Open Chrome; the argument is the path to the chromedriver binary
    driver = webdriver.Chrome('../chromedriver')
    s = '+'.join(map(urllib.parse.quote, search_words))
    driver.get("https://www.youtube.com/results?search_query=" + s + '&sp=CAM%253D')
    info_list = []
    # Give the page a moment to load, then scroll so more results are rendered
    time.sleep(1)
    for _ in range(10):
        driver.execute_script("scrollBy(0, 1000);")
    for i in range(35, 45):
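        info = {'title': '', 'url': '', 'channel': '', 'registrant': 0, 'release': ''}
        # Retry counter: each result gets up to three attempts
        loop_flag = 0
        # Results are grouped in sections of 20, so map the 1-based index i
        # to (section, position within section); i = 40 is item 20 of section 2
        selector = f'#contents > ytd-item-section-renderer:nth-child({(i - 1) // 20 + 1}) > #contents > ytd-video-renderer:nth-child({20 if i % 20 == 0 else i % 20}) > #dismissable > #video-title > yt-formatted-string'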
        while loop_flag <= 2:
            try:
                element = driver.find_element_by_css_selector(selector)
                actions = ActionChains(driver)
                actions.move_to_element(element)
                actions.perform()
                info['url'] = element.get_attribute('href')
                break
            except Exception as e:
                print(i, e)
                print(selector)
                loop_flag += 1
                time.sleep(1)
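        # Keep only results whose URL was actually found; append the dict itself
        # (+= on a list would add the dict's keys instead)
        if info['url']:
            info_list.append(info)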
    print(info_list)
    print(len(info_list))
    driver.quit()


if __name__ == "__main__":
    main()
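
For comparison, here is a minimal sketch of the same search done through the YouTube Data API v3 instead of Selenium. It is not the code I originally ran. It assumes you have your own API key, and `order=viewCount` is my guess at what the `sp=CAM%253D` parameter in the scraping URL was doing. The function names (`build_search_url`, `extract_videos`, `search_videos`) are made up for this example. Note that the search response does not include the subscriber count ('registrant' above), so that field is left out.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the YouTube Data API v3 search.list method.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(search_words, api_key, max_results=10):
    """Build a search.list request URL for videos, sorted by view count."""
    params = {
        "part": "snippet",
        "q": " ".join(search_words),
        "type": "video",
        "order": "viewCount",  # assumption: matches the old sp=CAM%253D sort
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_videos(response):
    """Pick title, URL, channel and publish date out of a search.list response."""
    videos = []
    for item in response.get("items", []):
        snippet = item["snippet"]
        videos.append({
            "title": snippet["title"],
            "url": "https://www.youtube.com/watch?v=" + item["id"]["videoId"],
            "channel": snippet["channelTitle"],
            "release": snippet["publishedAt"],
        })
    return videos


def search_videos(search_words, api_key, max_results=10):
    """Call the API over HTTPS and return the simplified video list."""
    url = build_search_url(search_words, api_key, max_results)
    with urllib.request.urlopen(url) as resp:
        return extract_videos(json.load(resp))
```

Calling `search_videos(['Alpha wave', 'sleep'], api_key='YOUR_API_KEY')` should return a list of dicts shaped like `info` above, with no browser, no scrolling and no fragile CSS selectors.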

