[PYTHON] Save Scrapy crawl results to Google Cloud Datastore

This post shows how to save data crawled with Scrapy to Google Cloud Datastore. I ran into a few pitfalls along the way, so I have summarized them here.

What I want to do

[Troublesome point 1] Google Cloud Platform authentication

gcloud provides the auth command for authentication (https://cloud.google.com/sdk/gcloud/reference/auth/). However, you can't run this command on Scrapy Cloud.

Therefore, authenticate with a service account key JSON file instead. You can create the key and download the JSON file from the screen below.

(Image: Screenshot from 2017-03-14 00-50-21.png)
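
Before bundling the key with the crawler, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library. The function name check_service_account_key and the choice of fields to check are my own assumptions for illustration, not part of any Google library.

```python
import json

# Fields found in a service account key JSON downloaded from the console.
# Checking only these four is an assumption made for this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return the project_id of the key file at `path`, or raise ValueError."""
    with open(path) as f:
        key = json.load(f)
    # Report every missing field at once so the file can be fixed in one go.
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("not a service account key: type=%r" % key["type"])
    return key["project_id"]
```

Calling check_service_account_key("hogehogehoge.json") should return the project the key belongs to, which is a quick way to catch a key downloaded for the wrong project.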

[Troublesome point 2] Specify the JSON path in an environment variable

Set GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file before creating the client. Written as follows, the crawler also runs locally.

pipeline.py


import os

from google.cloud import datastore


class HogePipeline(object):
    def __init__(self):
        # Point the Google client library at the service account key that
        # sits next to this file, so no `gcloud auth` login is needed.
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join(
            os.path.dirname(__file__), "hogehogehoge.json")
        self.g_client = datastore.Client('hoge-project')

    def process_item(self, item, spider):
        # put the item into Datastore here
        return item
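
The put step in process_item is left as a comment above. As a rough sketch of what it could look like, the pipeline below stores each item as one entity using Client.key, datastore.Entity and Client.put from google-cloud-datastore. The class name DatastorePutPipeline, the entity kind "CrawledItem", and the client and entity_cls arguments are all hypothetical. The import is deferred so that a stand-in client can be injected for testing, and the sketch assumes GOOGLE_APPLICATION_CREDENTIALS has already been set as in HogePipeline.

```python
class DatastorePutPipeline(object):
    """Hypothetical pipeline that stores every item as one Datastore entity."""

    KIND = "CrawledItem"  # assumed entity kind; choose your own

    def __init__(self, client=None, entity_cls=None):
        # Both arguments exist only so that stand-ins can be injected in tests.
        self.client = client
        self.entity_cls = entity_cls

    def open_spider(self, spider):
        # Scrapy calls this hook once when the spider starts.
        if self.client is None:
            # Imported lazily so this module loads without the library.
            from google.cloud import datastore
            self.client = datastore.Client("hoge-project")
            self.entity_cls = datastore.Entity

    def process_item(self, item, spider):
        # A key with only a kind is incomplete; Datastore assigns the ID on put.
        key = self.client.key(self.KIND)
        entity = self.entity_cls(key=key)
        entity.update(dict(item))
        self.client.put(entity)
        return item
```

One put per item is the simplest choice. For large crawls, batching several entities per request may be worth considering, but that is outside this sketch.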

[Troublesome point 3] Deploy with the JSON file included

MANIFEST.in


include path/to/hogehogehoge.json

setup.py



from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = hoge.settings']},
    install_requires=[],
    # Include the data files listed in the manifest (the JSON key) in the egg.
    include_package_data=True,
)

Deployment commands

$ python setup.py bdist_egg
$ shub deploy --egg dist/project-1.0-py2.7.egg
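
Since an egg built by bdist_egg is a zip archive, you can check that the JSON file was actually packaged before deploying. This is a minimal sketch with the standard library only. The helper name egg_contains is hypothetical.

```python
import zipfile


def egg_contains(egg_path, filename):
    """Return True if any entry in the egg (a zip archive) is named `filename`."""
    with zipfile.ZipFile(egg_path) as egg:
        # Entries use "/" as the separator, so compare the last path component.
        return any(name.split("/")[-1] == filename for name in egg.namelist())
```

For example, egg_contains("dist/project-1.0-py2.7.egg", "hogehogehoge.json") should be True after the build. If it is False, check the path in the manifest file and that include_package_data is enabled.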
