[PYTHON] Use Cloud Datastore from Compute Engine

What I wanted to do

I looked into and discussed several approaches when I wanted to access the Datastore from Compute Engine, so here is a record of what I found.

Choices

For the time being, there are roughly two options:

  1. Put App Engine in between
     - Implement an API on the App Engine side that accesses the Datastore when a request arrives, does the necessary processing, and returns the data.
     - Call that API from Compute Engine.
  2. Call the Cloud Datastore API directly
     - This time I used a client library called gcloud-py.
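
As a rough illustration of option 2, a key lookup with gcloud-py might look like the sketch below. It assumes the `datastore.Client` class from the `gcloud` package and its `key()` and `get()` methods; the kind name `Task` and the ID `1234` are made up for illustration. The client is passed in as an argument so the function can be tried offline with a stand-in, which is what the hypothetical `FakeClient` is for.

```python
def fetch_entity(client, kind, entity_id):
    """Fetch a single entity by key; returns None if it does not exist."""
    key = client.key(kind, entity_id)  # build the key from kind + ID (or name)
    return client.get(key)


# With real credentials (not run here), the client would be created like this:
#   from gcloud import datastore
#   client = datastore.Client()                  # project inferred from the environment
#   entity = fetch_entity(client, "Task", 1234)  # "Task" / 1234 are made-up values


class FakeClient:
    """Hypothetical in-memory stand-in exposing the same two methods."""

    def __init__(self, rows):
        self._rows = rows

    def key(self, kind, entity_id):
        return (kind, entity_id)

    def get(self, key):
        return self._rows.get(key)
```

Calling `fetch_entity(FakeClient({("Task", 1234): {"done": False}}), "Task", 1234)` exercises the same code path without touching the network.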

You might wonder why anyone would put App Engine in between when the Cloud Datastore API exists. In the past, however, the Cloud Datastore API performed poorly and access from App Engine to the Datastore was fast, so putting App Engine in between was considered the better choice.

But that is an old story, and I did not know how things stand now, so I tried both.

Let's try!

I compared the options with a simple operation: fetching one entity by key. The measurements are not rigorous, so assume each value may be off by about 20 to 30 ms.
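
A measurement like this can be sketched as a small timing loop. This is not the script used for the numbers in this post, just a minimal example of the idea; `fake_fetch` is a hypothetical stand-in for the real lookup (a Datastore API call or an HTTP request to App Engine).

```python
import statistics
import time


def measure_ms(fetch, runs=20):
    """Call fetch() `runs` times and return each elapsed time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Report min / median / max, since a single average hides slow outliers."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_fetch():
    # Hypothetical stand-in for the real lookup; replace with the actual call.
    time.sleep(0.001)
```

Printing `summarize(measure_ms(fake_fetch))` gives the spread rather than a single number, which matters when the latency is unstable.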

As a starting point I ran it from my MacBook Pro, with the following results.

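|       | Cloud Datastore API | Via App Engine | Via App Engine (with memcache) |
|-------|---------------------|----------------|--------------------------------|
| Local | About 1000 ms       | About 200 ms   | About 170 ms                   |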

At this point I was in complete despair, convinced the Cloud Datastore API was still no good. But a senior colleague sitting next to me objected, so I created a Compute Engine instance in the US region [^1] and tried again. Here are the results.

[^1]: The Datastore lives in the US and EU, so create the instance nearby.

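|          | Cloud Datastore API | Via App Engine    | Via App Engine (with memcache) |
|----------|---------------------|-------------------|--------------------------------|
| GCE (US) | About 50 to 200 ms  | About 45 to 50 ms | About 15 to 20 ms              |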

The times are now short enough that they are hard to measure, so the numbers are rough, but the speed of the Cloud Datastore API seems to have become acceptable.

However, the speed of the Cloud Datastore API is not stable: at its fastest it matches going through App Engine, but at its slowest it can take about 200 ms, and I could not work out why. As for App Engine, memcache is fast because it saves almost all of the time (about 30 ms) that App Engine spends accessing the Datastore.

Advantages and disadvantages of each

With App Engine in between

Advantages

If many of your requests are cacheable, memcache does a lot of the work. Getting the data from the Datastore took App Engine about 30 ms, but getting it from the cache took only 2 ms. Another advantage is that cache hits incur no read operation fee.
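
The caching pattern behind these numbers can be sketched as a read-through cache. This is a simplified illustration, not the App Engine implementation used here: a plain dict stands in for memcache, and `load_from_datastore` is a hypothetical callable representing the Datastore lookup.

```python
class CachedReader:
    """Read-through cache: check the cache first, fall back to the datastore on a miss."""

    def __init__(self, cache, load_from_datastore):
        self.cache = cache               # a dict here; memcache in the real setup
        self.load = load_from_datastore  # hypothetical loader callable
        self.datastore_reads = 0         # counts how often the slow path was taken

    def get(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit: no datastore read
        self.datastore_reads += 1
        value = self.load(key)
        if value is not None:
            self.cache[key] = value      # populate for later requests
        return value
```

Only the first request for a key reaches the datastore; repeated requests are served from the cache, which is where both the latency and the read-fee savings come from.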

Disadvantages

If you want to batch process a large amount of data in parallel, it costs a little more because App Engine will spin up additional instances.

Hit the Cloud Datastore API directly

Advantages

It is easy to implement, and of course there is no App Engine instance cost. When accessed from Compute Engine, it can be as fast as going through App Engine.

Disadvantages

There is no memcache. The gcloud-py documentation is so sparse that it feels like the beginning of the universe [^2].

[^2]: If you look closely, most of the entries are just links to the source code.

Personal conclusion

Judging by speed alone, going through App Engine is still a little faster for now, but in some situations calling the Cloud Datastore API directly from Compute Engine also looks like a viable option.

When memcache cannot help, for example when loading data in a batch job, calling Cloud Datastore directly seems easier to implement and cheaper.
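
For the batch-loading case, a direct write could be sketched as below. It assumes the client exposes a `put_multi()` method, as gcloud-py's `datastore.Client` does, and uses a batch size of 500 on the assumption that this is the per-commit entity limit. `RecordingClient` is a hypothetical stand-in so the chunking logic can be tried without credentials.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_in_batches(client, entities, batch_size=500):
    """Write entities with one put_multi() call per batch; returns the number of calls."""
    calls = 0
    for batch in chunked(entities, batch_size):
        client.put_multi(batch)
        calls += 1
    return calls


class RecordingClient:
    """Hypothetical offline stand-in that only records the batch sizes it receives."""

    def __init__(self):
        self.batch_sizes = []

    def put_multi(self, entities):
        self.batch_sizes.append(len(entities))
```

With 1201 entities this issues three calls of 500, 500, and 201 entities.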

Bonus

Installing gcloud-python is basically just

pip install gcloud

That works, but the master branch on GitHub has a newer version.

Looking at the code on the master branch, I found newly added logic along the lines of "if gRPC is available, use it" [^3].

[^3]: As of 2016-08-28, this feature was not in v0.18.1, the version that pip installs.
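
The general "use gRPC if it is installed" pattern can be sketched as follows. This is not the code from gcloud-python; it is a generic illustration of optional-dependency detection, and the function names `grpc_available` and `choose_transport` are made up for this example.

```python
import importlib.util


def grpc_available():
    """Return True if the optional `grpc` package can be imported."""
    return importlib.util.find_spec("grpc") is not None


def choose_transport(have_grpc=None):
    """Pick "grpc" when available, otherwise fall back to "http"."""
    if have_grpc is None:
        have_grpc = grpc_available()
    return "grpc" if have_grpc else "http"
```

The optional `have_grpc` argument exists only so both branches can be exercised regardless of what is installed.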


So I installed it with:

pip install git+https://github.com/GoogleCloudPlatform/gcloud-python

I tried the gcloud-python installed directly from the master branch on GitHub and found it was up to 20 ms faster, which puts it roughly on par with going through App Engine.

The speed still did not seem very stable, but the developers appear to be working hard on performance, so I have high hopes.
