[Python] Sort spreadsheet worksheets by sheet name with gspread

background

I write scraped data to a spreadsheet every day via gspread. The sheet name is the date of the data to be written (yyyy/mm/dd) If it continues to operate normally, the sheets should be arranged in chronological order, but in rare cases the dates may be out of alignment due to the effects of writing failures.

スプレッドシートソート説明1.png Such a shape.

I had to sort them manually every time I found them, but it's a hassle ... ↓ Yes, let's automate.

code

import gspread
from oauth2client.service_account import ServiceAccountCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import os
import sys

# GoogleAPI
scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
#Authentication key
json_keyfile_path = f'credentials.json path'
#Read the service account key
credentials = ServiceAccountCredentials.from_json_keyfile_name(json_keyfile_path, scope)
#Perform OAuth authentication for pydrive
gauth = GoogleAuth()
gauth.credentials = credentials
drive = GoogleDrive(gauth)
#Spreadsheet storage folder
folder_id = 'Folder ID that stores the workbook'
#Get a list of files in the spreadsheet storage folder
file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % folder_id}).GetList()
#Extract only the file name from the file list
title_list = [file['title'] for file in file_list]
#Authenticate for gspread
gc = gspread.authorize(credentials)
#Workbook name
book_title = 'Test book'
#Get the spreadsheet ID of the workbook you want to open
sheet_id = [file['id'] for file in file_list if file['title'] == book_title]
sheet_id = sheet_id[0]
#Open workbook
workbook = gc.open_by_key(sheet_id)
#Get all the information on existing worksheets
worksheets = workbook.worksheets()
#Store the title of the current worksheet in the list
tmp_worksheets_title_list = [worksheet.title for worksheet in worksheets]
#Sort titles in descending order
worksheets_title_list = sorted(tmp_worksheets_title_list, reverse=True)
#Store Worksheet type objects in a list in sorted order
worksheets_obj_list = [worksheet for title in worksheets_title_list for worksheet in worksheets if worksheet.title == title]
#Sort worksheets
workbook.reorder_worksheets(worksheets_obj_list)

Execution result

スプレッドシートソート説明2.png

Explanation

First of all, I asked Google Sensei a lot, but I can't find anyone doing similar work. Apparently I have to do something myself.

Next, I searched for various official API references. Then, I found those things.

reorder_worksheets

I will read it for the time being.

reorder_worksheets(worksheets_in_desired_order)
Updates the index property of each Worksheets to reflect its index in the provided sequence of Worksheets.

Parameters:	worksheets_in_desired_order – Iterable of Worksheet objects in desired order.
Note: If you omit some of the Spreadsheet’s existing Worksheet objects from the provided sequence, those Worksheets will be appended to the end of the sequence in the order that they appear in the list returned by Spreadsheet.worksheets().

New in version 3.4.

Japanese translation

Update the index properties of each worksheet to reflect that index in the set of worksheets provided.

Parameters: worksheets_in_desired_order – You can iterate worksheet objects in any order.
Note: If you omit some of the existing worksheet objects in your spreadsheet from the sequence provided, those worksheets will be Spreadsheet..They are added to the end of the sequence in the order they appear in the list returned by worksheets ().

Version 3.4 new features.

I see, I don't know.

I decided to read the source directly.

    def reorder_worksheets(self, worksheets_in_desired_order):
        """Updates the ``index`` property of each Worksheets to reflect
        its index in the provided sequence of Worksheets.

        :param worksheets_in_desired_order: Iterable of Worksheet objects in desired order.

        Note: If you omit some of the Spreadsheet's existing Worksheet objects from
        the provided sequence, those Worksheets will be appended to the end of the sequence
        in the order that they appear in the list returned by ``Spreadsheet.worksheets()``.

        .. versionadded:: 3.4
        """
        idx_map = {}
        for idx, w in enumerate(worksheets_in_desired_order):
            idx_map[w.id] = idx
        for w in self.worksheets():
            if w.id in idx_map:
                continue
            idx += 1
            idx_map[w.id] = idx

        body = {
            'requests': [
                {
                    'updateSheetProperties': {
                        'properties': {'sheetId': key, 'index': val},
                        'fields': 'index',
                    }
                }
                for key, val in idx_map.items()
            ]
        }

        return self.batch_update(body)

I see, it seems that w.id is used to get the worksheet ID and sort based on it.

#Open workbook
workbook = gc.open_by_key(sheet_id)
#Get all the information on existing worksheets
worksheets = workbook.worksheets()
print(worksheets)
print(type(worksheets))
print(type(worksheets[0]))

Execution result


[<Worksheet '2020/12/10' id:0>, <Worksheet '2020/12/11' id:750827161>, <Worksheet '2020/12/12' id:868021966>, <Worksheet '2020/12/13' id:1358131870>, <Worksheet '2020/12/14' id:224556508>, <Worksheet '2020/12/16' id:2001082452>, <Worksheet '2020/12/17' id:207127532>, <Worksheet '2020/12/15' id:653171131>, <Worksheet '2020/12/18' id:1969672638>, <Worksheet '2020/12/19' id:428756371>]
<class 'list'>
<class 'gspread.models.Worksheet'>

In this way, worksheet information is stored in the list as Worksheet type. The sheet name and ID are stored, and this ID was acquired by the library earlier.

Pass the list that stores this Worksheet information to the library as an argument. Then, ** change the order in this list to the order you want to sort, and then ** pass it, and it will be sorted in that order.

#Store the title of the current worksheet in the list
tmp_worksheets_title_list = [worksheet.title for worksheet in worksheets]
print(tmp_worksheets_title_list)
#Sort titles in descending order
worksheets_title_list = sorted(tmp_worksheets_title_list, reverse=True)
print(worksheets_title_list)
#Store Worksheet type objects in a list in sorted order
worksheets_obj_list = [worksheet for title in worksheets_title_list for worksheet in worksheets if worksheet.title == title]
print(worksheets_obj_list)

Execution result


['2020/12/10', '2020/12/11', '2020/12/12', '2020/12/13', '2020/12/14', '2020/12/16', '2020/12/17', '2020/12/15', '2020/12/18', '2020/12/19']
['2020/12/19', '2020/12/18', '2020/12/17', '2020/12/16', '2020/12/15', '2020/12/14', '2020/12/13', '2020/12/12', '2020/12/11', '2020/12/10']
[<Worksheet '2020/12/19' id:428756371>, <Worksheet '2020/12/18' id:1969672638>, <Worksheet '2020/12/17' id:207127532>, <Worksheet '2020/12/16' id:2001082452>, <Worksheet '2020/12/15' id:653171131>, <Worksheet '2020/12/14' id:224556508>, <Worksheet '2020/12/13' id:1358131870>, <Worksheet '2020/12/12' id:868021966>, <Worksheet '2020/12/11' id:750827161>, <Worksheet '2020/12/10' id:0>]

Since it is not possible to sort the list that stores Worksheet directly, create a new list using the sheet name as a key.

    1. Store only sheet name in list ↓ Sort the list created in 2.1 in any order (descending sort this time) ↓
    1. Extract Worksheet information from the original Worksheet information storage list using the list that stores only the sorted sheet names as a key, and store it in a new list.

Now you have a Worksheet information storage list sorted in any order.

#Sort worksheets
workbook.reorder_worksheets(worksheets_obj_list)

Sorting is completed by passing the created list to the library.

Finally

Official reference It's hard to understand ... Pien.

Recommended Posts

[Python] Sort spreadsheet worksheets by sheet name with gspread
Python sort cheat sheet
Try to make BOT by linking spreadsheet and Slack with python 2/2 (python + gspread + slackbot)
Try to make BOT by linking spreadsheet and Slack with python 1/2 (python + gspread + slackbot)
[Python] Filter spreadsheets with gspread
Sort huge files with python
Sort by date in python
[Python] One-liner Stalin sort with 50 characters
(Small story) Sort columns by column name with one liner in pandas.DataFrame
Get property information by scraping with python
Save video frame by frame with Python OpenCV
[Python] Get the variable name with str
Organize data divided by folder with Python
[Python] Sort
Python # sort
I tried searching for files under the folder with Python by file name
Stock number ranking by Qiita tag with python
Algorithm learned with Python 16th: Sorting (insertion sort)
Get git branch name and tag name with python
Read Excel name / cell range with Python VBA
Algorithm learned with Python 15th: Sorting (selection sort)
[Python] Sort the table by sort_values (pandas DataFrame)
Algorithm learned with Python 17th: Sorting (bubble sort)
Extract zip with Python (Japanese file name support)
[Python] How to sort instances by instance variables
Read line by line from a file with Python
Python> Sort by number and sort by alphabet> Use sorted ()