Download Google Drive files in Python

I tried downloading files from Google Drive with Python. If this works well, I am thinking of building a system that processes files simply by dropping them into a Drive folder.

Google provides an official quick start that is simpler than this article. Official Quick Start (Java, Node, Python): https://developers.google.com/drive/api/v3/quickstart/python

Source

Run the Python script below. The first run requires client_secret.json. If authentication succeeds, token.pickle is created. When executed, the script downloads the jpg and png files located directly under any folder named AAA on Google Drive.

main.py


# -*- coding: utf-8 -*-
from __future__ import print_function
import pickle
import os.path
import io
import sys

# pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.http import MediaIoBaseDownload

SCOPES = ['https://www.googleapis.com/auth/drive']
FOLDER_NAME = 'AAA'

def main():
    # OAuth
    drive = None
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        elif os.path.exists('client_secret.json'):
            flow = InstalledAppFlow.from_client_secrets_file(
                'client_secret.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token only when credentials were actually obtained
        if creds:
            with open('token.pickle', 'wb') as token:
                pickle.dump(creds, token)
    
    if creds and creds.valid:
        drive = build('drive', 'v3', credentials=creds)
    if not drive: print('Drive auth failed.')

    # Folder list
    folders = None
    if drive: 
        results = drive.files().list(
            pageSize=100, 
            fields='nextPageToken, files(id, name)',
            q='name="' + FOLDER_NAME + '" and mimeType="application/vnd.google-apps.folder"'
            ).execute()
        folders = results.get('files', [])
        if not folders: print('No folders found.')

    # File list
    files = None
    if folders:
        query = ''
        for folder in folders:
            if query != '' : query += ' or '
            query += '"' + folder['id'] + '" in parents'
        query = '(' + query + ')'
        query += ' and (name contains ".jpg" or name contains ".png")'

        results = drive.files().list(
            pageSize=100, 
            fields='nextPageToken, files(id, name)',
            q=query
            ).execute()
        files = results.get('files', [])
        if not files: print('No files found.')

    # Download
    if files:
        for file in files:
            request = drive.files().get_media(fileId=file['id'])
            # Close the local file even if a chunk download fails
            with io.FileIO(file['name'], mode='wb') as fh:
                downloader = MediaIoBaseDownload(fh, request)
                done = False
                while not done:
                    _, done = downloader.next_chunk()

if __name__ == '__main__':
    main()

From preparation to execution

1. Access Google APIs

https://console.developers.google.com/apis/credentials Log in with your Google account. On first access you are asked to create a project, so add one with a name such as My Project.

2. Enable Google Drive API

Select Google Drive API from the library and enable it.

3. Create OAuth consent screen

Create a consent screen with User Type = External and Application name = any suitable name (it can be changed later). The other fields can be left blank. The name you give here is displayed on the authentication screen.

4. Download client_secret.json

Create an OAuth client ID: choose OAuth client ID from Create Credentials and set Application type = Desktop app. Once it is created, press the Download Client ID button to download client_secret-xxx.json. The script opens a file named client_secret.json, so save the download under that name in the directory you run main.py from.

5. Run the app

Run the Python code above.

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
python main.py

A browser will open, so log in with your Google account. On the "This app has not been confirmed" screen, select "Show details" and then "Go to unsafe page". If authentication succeeds, token.pickle is created. The run is successful when the jpg and png files directly under the AAA folder on Google Drive are downloaded.

Execution result

I placed files on Google Drive as follows. Only files directly under a folder named AAA are downloaded. Note that Google Drive allows several folders with the same name and several files with the same name.

Folder    File      Result
AAA       img1.jpg  OK
AAA       img1.jpg  OK
AAA/AAA   img2.jpg  OK
AAA/BBB   img3.jpg  NG
AAA       img4.jpg  OK
BBB       img5.jpg  NG
BBB/AAA   img6.jpg  OK
/         img7.jpg  NG
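
Because Drive allows duplicate names, two files called img1.jpg are written to the same local path, and the second download overwrites the first. A minimal sketch of one way around this, assuming it is acceptable to embed the Drive file ID in the local name. The helper local_name is hypothetical and not part of main.py:

```python
import os.path


def local_name(name, file_id):
    """Build a local file name that stays unique even when Drive names collide.

    Hypothetical helper: inserts the Drive file ID before the extension.
    """
    stem, ext = os.path.splitext(name)
    return '{}_{}{}'.format(stem, file_id, ext)


# Two Drive files with the same name map to two different local paths.
print(local_name('img1.jpg', 'id-of-first-file'))
print(local_name('img1.jpg', 'id-of-second-file'))
```

In main.py this would replace file['name'] in the io.FileIO call, for example local_name(file['name'], file['id']).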

Commentary

OAuth authentication

SCOPES = ['https://www.googleapis.com/auth/drive']
# Load the saved token if one already exists
creds = None
if os.path.exists('token.pickle'):
    with open('token.pickle', 'rb') as token:
        creds = pickle.load(token)

if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    elif os.path.exists('client_secret.json'):
        
        # An authentication URL is opened in the browser, so log in and allow access.
        flow = InstalledAppFlow.from_client_secrets_file(
            'client_secret.json', SCOPES)
        creds = flow.run_local_server(port=0)
    
    # Save with pickle (only when credentials were actually obtained)
    if creds:
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

SCOPES = ['https://www.googleapis.com/auth/drive'] grants full access to Drive, so in practice the scope should be narrowed.
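
One possible narrowing, assuming the script only ever lists and downloads files: the drive.readonly scope allows reading without permitting changes. This is a sketch of the constant only, and the rest of main.py would stay the same:

```python
# Read-only access is enough for listing files and downloading their content.
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

print(SCOPES[0])
```

If token.pickle already exists from a run with the broader scope, delete it so that the consent screen is shown again with the new scope.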

I originally built this with Node. There the flow was: open the URL → log in and allow access → a code is displayed, which you copy into the app → the token is obtained. Python is easier. This was my first time using pickle. It can save an entire object (such as a class instance) in binary form, which other languages call serialization. It seems convenient.
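
To illustrate what pickle does, here is a small round trip that stays in memory. The dictionary token_like is a made-up stand-in for the credentials object, not the real structure of token.pickle:

```python
import datetime
import pickle

# Made-up stand-in for the credentials object; any picklable object works.
token_like = {
    'token': 'dummy-access-token',
    'expiry': datetime.datetime(2020, 5, 23, 19, 13, 29),
    'scopes': ['https://www.googleapis.com/auth/drive'],
}

data = pickle.dumps(token_like)   # object -> bytes
restored = pickle.loads(data)     # bytes -> equal copy of the object

print(restored == token_like)
print(restored['expiry'].isoformat())
```

Note that unpickling can execute arbitrary code, so only load pickle files that you created yourself, such as your own token.pickle.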

Get list from Google Drive

results = drive.files().list(
    # Maximum number of results per page
    pageSize=100,

    # Fields to include in the response
    fields='nextPageToken, files(id, name, parents)',

    # Search query (everything is returned if omitted)
    q='name contains ".jpg" or name contains ".png"'
    ).execute()

files = results.get('files', [])
for file in files:
    print(file['name'] +' '+ file['parents'][0])

parents contains the ID of the parent folder. The folder name is not included, so it has to be looked up separately by ID. The main.py code above works the other way round: it gets the folder ID from the folder name and then retrieves the files.
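
Looking up a folder name from its ID could be sketched as follows. The helper folder_name and its cache argument are hypothetical. It assumes drive is the service object returned by build('drive', 'v3', ...) and uses files().get with fields='name':

```python
def folder_name(drive, folder_id, cache):
    """Return the name for a folder ID, asking the API only once per ID."""
    if folder_id not in cache:
        meta = drive.files().get(fileId=folder_id, fields='name').execute()
        cache[folder_id] = meta['name']
    return cache[folder_id]
```

With cache = {} created once before the loop, folder_name(drive, file['parents'][0], cache) returns the parent folder's name for each listed file. The cache avoids repeated requests when many files share one parent.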

The example above searches for .jpg and .png. Without a query, the JSON becomes large because unrelated files are included. You can also search only for folders by writing mimeType = "application/vnd.google-apps.folder".
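
Filtering on mimeType instead of the file name is another option, since the metadata shown later in this article reports image/jpeg for a JPG file. The sketch below only assembles the query string. The helper build_query is hypothetical, and image/png is assumed to be the type reported for PNG files:

```python
def build_query(folder_ids, mime_types):
    """Build a query: (parent is any of the folders) and (type is any of the MIME types)."""
    parents = ' or '.join('"{}" in parents'.format(fid) for fid in folder_ids)
    types = ' or '.join('mimeType="{}"'.format(m) for m in mime_types)
    return '({}) and ({})'.format(parents, types)


print(build_query(['id1', 'id2'], ['image/jpeg', 'image/png']))
```

The resulting string can be passed as q to drive.files().list() in the same way as the name-based query.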

I have not done it here, but when there are more results than pageSize=100, the remaining ones have to be fetched with nextPageToken.
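
A minimal sketch of that paging loop, assuming drive is the service object from main.py. The function name list_all_files is hypothetical. Each response's nextPageToken is passed back as pageToken until no token is returned:

```python
def list_all_files(drive, query):
    """Collect every match for `query`, following nextPageToken until it runs out."""
    files = []
    page_token = None
    while True:
        results = drive.files().list(
            pageSize=100,
            fields='nextPageToken, files(id, name)',
            q=query,
            pageToken=page_token,  # None on the first request
        ).execute()
        files.extend(results.get('files', []))
        page_token = results.get('nextPageToken')
        if not page_token:
            return files
```

In main.py, files = list_all_files(drive, query) could stand in for the single list() call in the file-list step.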

Parameters available in fields files()

You can get parents by writing fields='files(id, name, parents)' in the code. At first I was stuck because I did not know what could be specified. It turns out that fields='files' returns everything. Fetching everything makes the JSON long, so it is better to specify only what you need. Here is the result I obtained.

{"kind":"drive#file",
"id":"1PTrhGA14N-xxxx",
"name":"img1.jpg",
"mimeType":"image/jpeg",
"starred":false,
"trashed":false,
"explicitlyTrashed":false,
"parents":["1Jigt87nbz-xxxx"],
"spaces":["drive"],
"version":"1",
"webContentLink":"https://drive.google.com/xxxx",
"webViewLink":"https://drive.google.com/file/xxxx",
"iconLink":"https://drive-thirdparty.xxxx",
"hasThumbnail":true,
"thumbnailVersion":"1",
"viewedByMe":true,
"viewedByMeTime":"2020-05-23T19:13:29.882Z",
"createdTime":"2020-05-23T19:13:29.882Z",
"modifiedTime":"2013-08-13T23:05:18.000Z",
"modifiedByMeTime":"2013-08-13T23:05:18.000Z",
"modifiedByMe":true,
"owners":[{xxxx}],
"lastModifyingUser":{xxxx},
"shared":false,
"ownedByMe":true,
"capabilities":{xx,xx,xx},
"viewersCanCopyContent":true,
"copyRequiresWriterPermission":false,
"writersCanShare":true,
"permissions":[{xxxx}],
"permissionIds":["1485xxxx"],
"originalFilename":"img1.jpg",
"fullFileExtension":"jpg",
"fileExtension":"jpg",
"md5Checksum":"95c10exxxx",
"size":"492642",
"quotaBytesUsed":"492642",
"headRevisionId":"0BzjG8APx-xxxx",
"imageMediaMetadata":{"width":1920, "height":1200, xx},
"isAppAuthorized":false}

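When only a few of these values are needed, a narrower selection such as fields='files(id, name, mimeType, size)' keeps the response small. The sketch below parses a trimmed copy of the metadata above to show one detail that is easy to miss: size is returned as a string, not a number.

```python
import json

# Trimmed copy of the metadata shown above.
raw = ('{"id": "1PTrhGA14N-xxxx", "name": "img1.jpg", '
       '"mimeType": "image/jpeg", "size": "492642"}')
meta = json.loads(raw)

# size arrives as a string, so convert it before doing arithmetic.
size_kib = int(meta['size']) / 1024.0
print('{} ({}, {:.1f} KiB)'.format(meta['name'], meta['mimeType'], size_kib))
```
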
Summary

I tried downloading files from Google Drive. Actually trying it taught me several things. Like AWS S3, Google Drive lives in the cloud. It does not behave like searching local files and has its quirks. With AWS, we provide customers with resources that we prepare ourselves, whereas with Google Drive we work with whatever already exists on the customer's side. So a little more work seems to be required. For personal projects, you could probably do something with AWS Lambda.
