How to save information crawled with Scrapy to Google Cloud Datastore. There were a few pitfalls along the way, so I've summarized them here.
gcloud provides an auth command for authentication (https://cloud.google.com/sdk/gcloud/reference/auth/), but you cannot run that command on Scrapy Cloud.
Instead, authenticate with a service account key in JSON format. You can create the key on the service account settings screen in the Google Cloud Console and download it as a JSON file.
With a pipeline like the following, the crawler works when run locally.
pipeline.py
from google.cloud import datastore
import os
import time
from threading import Lock


class HogePipeline(object):
    def __init__(self):
        # Point the client library at the bundled service account key.
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join(
            os.path.dirname(__file__), "hogehogehoge.json")
        self.g_client = datastore.Client('hoge-project')

    def process_item(self, item, spider):
        # put
        return item
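The # put above is left as a placeholder in the original. Below is a minimal sketch of what that step might look like with the Datastore client created in __init__; the entity kind "Hoge" and the assumption that the item's fields map directly onto entity properties are mine, not from the original project.

    def process_item(self, item, spider):
        # Build a partial key; Datastore assigns a numeric ID on put.
        # The kind name "Hoge" is an assumption for illustration.
        key = self.g_client.key('Hoge')
        entity = datastore.Entity(key=key)
        # Copy the scraped fields onto the entity (assumed to be flat values).
        entity.update(dict(item))
        self.g_client.put(entity)
        return item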
To make the JSON key available on Scrapy Cloud as well, bundle it into the egg you deploy.
MANIFEST.in
include path/to/hogehogehoge.json
setup.py
from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = hoge.settings']},
    install_requires=[],
    include_package_data=True
)
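The pipeline also has to be enabled in the project's settings (the hoge.settings module referenced in entry_points). A minimal sketch, assuming the project package is hoge and the pipeline module is the pipeline.py shown above:

settings.py

ITEM_PIPELINES = {
    # The module path is an assumption based on the file names above.
    'hoge.pipeline.HogePipeline': 300,
}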
Deployment commands
$ python setup.py bdist_egg
$ shub deploy --egg dist/project-1.0-py2.7.egg