[PYTHON] How to run the Export function of GCP Datastore automatically

This time, I will introduce how to set up automatic backups of the Google Cloud Platform Datastore.

Prerequisites

- Some data already exists in Datastore

Procedure

1. Create a bucket for backups in Cloud Storage

Go to [Google Cloud Platform Storage](https://console.cloud.google.com/projectselector2/storage/browser?supportedpurview=project) and select your project.

Click [Create Bucket] to create a bucket.

- Name the bucket
  - Use lowercase alphanumeric characters (this time we will use backup_qiita)
- Select the data storage location
  - Region: a specific geographic location
  - Multi-region: a large geographic area containing two or more regions, such as the United States
  - Dual-region: a specific pair of regions, such as Finland and the Netherlands

[Click here for details](https://cloud.google.com/storage/docs/locations?_ga=2.5981299.-176719181.1568084299)

- Choose a default storage class for your data
  - Standard: ideal for short-term storage and frequently accessed data
  - Nearline: ideal for data and backups accessed less than once a month
  - Coldline: ideal for data and backups accessed less than once a year
- Choose how to control access to objects
  - Fine-grained
  - Uniform
- Advanced settings (optional)

Press [Create] when the settings are complete. A bucket for backups has now been created. Next, we will set things up so that backups are written into this bucket.
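As a side note, the bucket-name rule above ("lowercase alphanumeric characters") can be checked before clicking through the console. A minimal sketch in Python (the regex below is a simplified version of the real Cloud Storage naming rules, not the full specification, and the function name is my own):

```python
import re

# Simplified check: lowercase letters, digits, underscores and hyphens,
# 3-63 characters, starting and ending with a letter or digit.
# (The real GCS rules also cover dots, domain-style names, etc.)
BUCKET_NAME_RE = re.compile(r'^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$')

def is_valid_bucket_name(name):
    """Return True if `name` passes the simplified GCS naming check."""
    return bool(BUCKET_NAME_RE.match(name))

print(is_valid_bucket_name('backup_qiita'))   # the bucket used in this article
print(is_valid_bucket_name('Backup_Qiita'))   # uppercase is rejected
```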

2. Create the source code that receives the Datastore entity and backs it up.

Here we will add the Yaml and Python files while referring to https://cloud.google.com/datastore/docs/schedule-export?hl=ja. First, create **[app.yaml]** in Yaml. (If one already exists, create a folder and put the new file in it.) I put mine in a backup folder.

The app.yaml here holds the App Engine settings for the Python service.

app.yaml

```yaml
runtime: python27
api_version: 1
threadsafe: true
service: cloud-datastore-admin

libraries:
- name: webapp2
  version: "latest"

handlers:
- url: /cloud-datastore-export
  script: cloud_datastore_admin.app
  login: admin
```

Next, let's prepare ** [cloud_datastore_admin.py] ** in Python.

cloud_datastore_admin.py

```python
import datetime
import httplib
import json
import logging
import webapp2

from google.appengine.api import app_identity
from google.appengine.api import urlfetch

class Export(webapp2.RequestHandler):
  """Kicks off a managed export of Datastore entities to Cloud Storage."""

  def get(self):
    access_token, _ = app_identity.get_access_token(
        'https://www.googleapis.com/auth/datastore')
    app_id = app_identity.get_application_id()
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

    output_url_prefix = self.request.get('output_url_prefix')
    assert output_url_prefix and output_url_prefix.startswith('gs://')
    if '/' not in output_url_prefix[5:]:
      # Only a bucket name has been provided - no prefix or trailing slash
      output_url_prefix += '/' + timestamp
    else:
      output_url_prefix += timestamp

    entity_filter = {
        'kinds': self.request.get_all('kind'),
        'namespace_ids': self.request.get_all('namespace_id')
    }
    request = {
        'project_id': app_id,
        'output_url_prefix': output_url_prefix,
        'entity_filter': entity_filter
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + access_token
    }
    url = 'https://datastore.googleapis.com/v1/projects/%s:export' % app_id
    try:
      result = urlfetch.fetch(
          url=url,
          payload=json.dumps(request),
          method=urlfetch.POST,
          deadline=60,
          headers=headers)
      if result.status_code == httplib.OK:
        logging.info(result.content)
      elif result.status_code >= 500:
        logging.error(result.content)
      else:
        logging.warning(result.content)
      self.response.status_int = result.status_code
    except urlfetch.Error:
      logging.exception('Failed to initiate export.')
      self.response.status_int = httplib.INTERNAL_SERVER_ERROR

app = webapp2.WSGIApplication(
    [
        ('/cloud-datastore-export', Export),
    ], debug=True)
```
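The handler above mixes request construction with the App Engine urlfetch call, but the prefix and payload logic can be checked on its own. A minimal sketch in current Python 3 (the handler itself is Python 2, so this is only a local check; the function name is my own, not part of the article's code):

```python
import datetime

def build_export_request(app_id, output_url_prefix,
                         kinds=None, namespace_ids=None, now=None):
    """Reproduce the handler's output-prefix and payload logic for local testing."""
    now = now or datetime.datetime.now()
    timestamp = now.strftime('%Y%m%d-%H%M%S')

    assert output_url_prefix.startswith('gs://')
    if '/' not in output_url_prefix[5:]:
        # Only a bucket name was given - add a slash before the timestamp.
        output_url_prefix += '/' + timestamp
    else:
        # A prefix with a trailing slash was given - append the timestamp directly.
        output_url_prefix += timestamp

    body = {
        'project_id': app_id,
        'output_url_prefix': output_url_prefix,
        'entity_filter': {
            'kinds': kinds or [],
            'namespace_ids': namespace_ids or [],
        },
    }
    url = 'https://datastore.googleapis.com/v1/projects/%s:export' % app_id
    return url, body

url, body = build_export_request(
    'my-project', 'gs://backup_qiita',
    now=datetime.datetime(2019, 11, 27, 10, 0, 0))
print(body['output_url_prefix'])  # gs://backup_qiita/20191127-100000
```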

Once both files are in place, let's deploy.

1. Use `config set` to point gcloud at the right project:

```shell
gcloud config set project [Project name]
```

2. Deploy app.yaml to the project (make sure you run this from the directory that contains app.yaml!):

```shell
gcloud app deploy
```

If the deploy succeeds, you are ready for automatic backups. Next, we will automate it.

3. Set it to run automatically with a cron job

We will again proceed while referring to https://cloud.google.com/datastore/docs/schedule-export?hl=ja.

First, create **[cron.yaml]** to set up a cron job that calls the export handler we just deployed.

cron.yaml

```yaml
cron:
- description: "Explanatory text"
  url: /cloud-datastore-export?output_url_prefix=gs://[Bucket name]
  target: cloud-datastore-admin
  timezone: Asia/Tokyo
  schedule: every 12 hours
```

If this looks okay, let's deploy it:

```shell
gcloud app deploy cron.yaml
```
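Before deploying, it is easy to sanity-check cron.yaml locally. A sketch assuming PyYAML is installed (`pip install pyyaml`); this check is my own addition, not part of the official procedure:

```python
import yaml

# The same cron.yaml as above, with the placeholders filled in for this article.
cron_yaml = """
cron:
- description: "Datastore export backup"
  url: /cloud-datastore-export?output_url_prefix=gs://backup_qiita
  target: cloud-datastore-admin
  timezone: Asia/Tokyo
  schedule: every 12 hours
"""

config = yaml.safe_load(cron_yaml)
for job in config['cron']:
    # Every job needs at least a url pointing at our handler and a schedule.
    assert job['url'].startswith('/cloud-datastore-export')
    assert 'schedule' in job
print('cron.yaml looks OK')
```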

Detailed settings ~ selecting entities ~

Use `kind` if you want to export only certain kinds of entities when backing up:

```yaml
url: /cloud-datastore-export?output_url_prefix=gs://[Bucket name]&kind=[Entity kind name]
```

You can also specify multiple kinds:

```yaml
url: /cloud-datastore-export?output_url_prefix=gs://[Bucket name]&kind=[Entity kind name]&kind=[Entity kind name]
```

If you want to export only entities in a particular namespace, use `namespace_id`:

```yaml
url: /cloud-datastore-export?output_url_prefix=gs://[Bucket name]&namespace_id=[Namespace ID]
```

You can combine these parameters for finer control.
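The `kind` and `namespace_id` entries are ordinary repeated query parameters, so the cron URL can also be assembled programmatically. A small Python 3 sketch (the helper name is my own):

```python
from urllib.parse import urlencode

def build_cron_url(bucket, kinds=(), namespace_ids=()):
    """Build the /cloud-datastore-export URL with repeated kind/namespace_id params."""
    params = [('output_url_prefix', 'gs://%s' % bucket)]
    params += [('kind', k) for k in kinds]
    params += [('namespace_id', ns) for ns in namespace_ids]
    return '/cloud-datastore-export?' + urlencode(params)

print(build_cron_url('backup_qiita', kinds=['User', 'Post']))
# /cloud-datastore-export?output_url_prefix=gs%3A%2F%2Fbackup_qiita&kind=User&kind=Post
```

Note that `urlencode` percent-encodes the `gs://` prefix; App Engine decodes it back before it reaches the handler.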

Detailed settings ~ scheduling ~

Specify when the backup runs with **[schedule]**. Currently it is set to back up once every 12 hours; for once a day:

```yaml
schedule: every 24 hours
```

If you want to run the backup every day at 00:00:

```yaml
schedule: every day 00:00
```

Many other schedules can be expressed; see https://cloud.google.com/appengine/docs/flexible/java/scheduling-jobs-with-cron-yaml?hl=ja#defining_the_cron_job_schedule for details. By the way, if you want the backup schedule to be interpreted in Japan time, set:

```yaml
timezone: Asia/Tokyo
```

and it will run on Japan time.
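Putting the scheduling options together, a daily midnight backup in Japan time would look like this (same fields as the cron.yaml above, with only the schedule changed):

```yaml
cron:
- description: "Daily Datastore export at midnight JST"
  url: /cloud-datastore-export?output_url_prefix=gs://backup_qiita
  target: cloud-datastore-admin
  timezone: Asia/Tokyo
  schedule: every day 00:00
```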

4. Run the cron job

Let's test whether it actually works.

If the App Engine cron job is displayed like this, cron is set up correctly.

Now let's trigger it with [Run Now]... and oh no, an error! (´ ゚ д ゚ `) In that case, open [View] on the log entry and look at the error.

Check the error in the log

When I displayed the log, it showed a 403.

Opening the entry, it says "The caller does not have permission"... in other words, the service account does not have permission to run imports and exports.

Let's go grant that permission.

Grant permission

Open [IAM] under [IAM & admin], and edit [Project ID]@appspot.gserviceaccount.com. Press [Add another role], select [Cloud Datastore Import Export Admin] under Datastore, and save!

Now let's go back and run it again. Press [Run Now] once more... and it works! !!

Now it will export the data every 12 hours!

Finally

How was it, everyone, did it work? I hope this walkthrough went well (´ ゚ д ゚ `)

But if you back up your Datastore data like this, you can recover from accidental deletion or other trouble. I don't think there is anything to lose by doing it.

It's been a long time, but thank you! !!

Reference URL

https://cloud.google.com/storage/docs/locations?_ga=2.5981299.-176719181.1568084299

https://cloud.google.com/datastore/docs/schedule-export?hl=ja

https://cloud.google.com/appengine/docs/flexible/java/scheduling-jobs-with-cron-yaml?hl=ja#defining_the_cron_job_schedule
