[PYTHON] Download data directly from Drive URL (Google Colaboratory)

Introduction

This article covers the following:

1. Making a Google Drive shared link directly downloadable
2. Downloading data from that URL with Python, or with wget or curl on the command line
3. Points to note when running this on Google Colaboratory

Download directly with Google Drive shared link

When you create a sharing link for a file on Google Drive, opening the link takes you to a preview page, from which you have to download the file manually.

To download the file directly, convert the shared URL. URL conversion tools exist, but simply rewriting the URL as follows is enough.

Replace file/d/ with uc?id= or uc?export=download&id=, and remove /view?usp=sharing.

https://drive.google.com/file/d/<file_id>/view?usp=sharing
↓
https://drive.google.com/uc?id=<file_id>
or
https://drive.google.com/uc?export=download&id=<file_id>
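
The same rewrite can be done programmatically. The following is a minimal sketch, assuming the shared link has the /file/d/<file_id>/ form shown above. The function name to_direct_url is hypothetical and not part of any library.

```python
import re


def to_direct_url(share_url):
    """Convert a Drive share link into a direct-download URL (illustrative helper)."""
    # The file ID is the path segment that follows "/file/d/"
    match = re.search(r"/file/d/([^/?#]+)", share_url)
    if match is None:
        raise ValueError("not a /file/d/<file_id>/ share link: " + share_url)
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)


# FILE_ID_HERE is a placeholder, not a real file ID
print(to_direct_url("https://drive.google.com/file/d/FILE_ID_HERE/view?usp=sharing"))
```

Links in any other form raise ValueError here rather than being guessed at.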

Download with Python or Shell

The following code downloads the file with urlretrieve, wget, or curl, using the URL converted above.

Python

import urllib.request

# Direct-download URL converted from the shared link
url = "https://drive.google.com/uc?export=download&id=<file_id>"
# Local path to save the file to
file_name = "file_name"
urllib.request.urlretrieve(url, file_name)

Shell

wget "https://drive.google.com/uc?export=download&id=<FILE_ID>" -O <FILE_NAME>
or
curl -L "https://drive.google.com/uc?export=download&id=<FILE_ID>" -o <FILE_NAME>

For large files

If the file is too large, Google Drive does not run a virus scan on it and asks for confirmation before the download. In that case, running the code above downloads the HTML of the confirmation page itself instead of the file.
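
One way to notice this problem is to check whether the saved file starts like an HTML document. The following is only a heuristic sketch: the helper name looks_like_html is hypothetical, and a file that really is HTML would match as well.

```python
def looks_like_html(path):
    """Return True if the file at `path` starts like an HTML document (heuristic)."""
    with open(path, "rb") as f:
        # Only the beginning of the file is needed for this check
        head = f.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Usage (file_name is whatever path the download was saved to):
# if looks_like_html("file_name"):
#     print("Got an HTML page, probably the confirmation page, not the data")
```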

To avoid this, you need to obtain the confirm code and pass it in the URL. The following commands do that.

curl -sc /tmp/cookie "https://drive.google.com/uc?export=download&id=<FILE_ID>" > /dev/null
CODE="$(awk '/_warning_/ {print $NF}' /tmp/cookie)"  
curl -Lb /tmp/cookie "https://drive.google.com/uc?export=download&confirm=${CODE}&id=<FILE_ID>" -o <FILE_NAME>
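
The same two-step flow can be sketched in Python with the standard library. This assumes, as the curl commands above do, that Drive sets a cookie whose name contains _warning_ and whose value is the confirm code; that behavior is not guaranteed and may change. The function names find_confirm_code and download_large_file are hypothetical.

```python
import http.cookiejar
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://drive.google.com/uc"


def find_confirm_code(cookies):
    """Return the value of the first cookie whose name contains "_warning_", else None."""
    for cookie in cookies:
        if "_warning_" in cookie.name:
            return cookie.value
    return None


def download_large_file(file_id, file_name):
    """Request the file once to collect cookies, then again with the confirm code."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    params = {"export": "download", "id": file_id}

    # First request: only the cookies matter, so the body is discarded
    with opener.open(BASE_URL + "?" + urllib.parse.urlencode(params)) as response:
        response.read()

    # Assumption: the confirm code is stored in a "_warning_" cookie
    code = find_confirm_code(jar)
    if code is not None:
        params["confirm"] = code

    # Second request: reuse the cookies and stream the response to disk
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with opener.open(url) as response, open(file_name, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage (needs network access and a real file ID):
# download_large_file("<FILE_ID>", "<FILE_NAME>")
```

If no such cookie is found, the second request is sent without a confirm parameter, which matches the small-file case.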

Run on Google Colaboratory

You could simply run the code above in a cell, but if you only prefix each line with !, the variable set on one line is not kept for the next. Instead, write %%shell at the top of the cell so that the whole cell runs like a shell script.

Add %%shell to the beginning of the code above.
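
The difference can be illustrated outside a notebook with plain Python. This sketch assumes that each ! line runs in its own shell process, which is consistent with the behavior described above, and uses subprocess to imitate both cases. The variable name DEMO_CODE is made up for the demonstration and is assumed not to be set in the environment already.

```python
import subprocess

# Like two separate "!" lines: each command gets its own shell,
# so the variable set by the first is gone by the second.
subprocess.run("DEMO_CODE=abc", shell=True)
separate = subprocess.run(
    'echo "[$DEMO_CODE]"', shell=True, capture_output=True, text=True
).stdout.strip()

# Like a "%%shell" cell: both lines run in one shell, so the variable survives.
together = subprocess.run(
    'DEMO_CODE=abc\necho "[$DEMO_CODE]"', shell=True, capture_output=True, text=True
).stdout.strip()

print("separate shells:", separate)
print("single shell:   ", together)
```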


Unzip, etc.

[Screenshot: unzipping the downloaded file, etc.]
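
If the downloaded file is a zip archive, it can also be unpacked from Python with the standard zipfile module. The following is a minimal sketch. The helper name extract_archive and the paths in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return the names of its members."""
    dest = Path(dest_dir)
    # Create the destination directory if it does not exist yet
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Usage (assumes <FILE_NAME> is the zip file downloaded earlier):
# extract_archive("<FILE_NAME>", "data")
```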

Finally

To be honest, I don't feel much need to do this, since I could just add the file to my own Drive from the share link (lol). However, a large amount of data cannot be put on GitHub, so when sharing a Google Colaboratory notebook, writing the download step into the notebook in advance has one advantage: readers only have to run the cells right after cloning.

Reference & citation

Download published Google Drive data with curl or wget
Download files on the Web with Python
