[PYTHON] Until you start crawling in Scrapy

What is Scrapy

Scrapy is a Python framework for crawling and scraping. With Scrapy, instead of importing a library into your own code, you write code that follows the framework's conventions.

Install Scrapy

$pip install scrapy

Create a project

To create a project, run the following command.

$scrapy startproject (Project name)

The project name can be anything you like. Running the command generates a directory tree containing several files and subdirectories.
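For reference, the generated layout typically looks something like this (assuming the project name myproject; exact contents vary by Scrapy version):

```
myproject/
    scrapy.cfg            # deployment configuration
    myproject/            # the project's Python module
        __init__.py
        items.py          # item definitions go here
        middlewares.py
        pipelines.py
        settings.py       # project settings (download delay, etc.)
        spiders/          # your spiders live here
            __init__.py
```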

Set the download interval

If you download without an interval between requests, you put load on the site you are crawling, so be careful to set one.

Add the following line to settings.py inside the project-name folder.

DOWNLOAD_DELAY = 1
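A slightly fuller settings.py sketch for politeness (ROBOTSTXT_OBEY is a standard Scrapy setting; enabling it here is my suggestion, not from the original article):

```python
# settings.py (inside the project-name folder)

# Wait this many seconds between requests to the same site.
DOWNLOAD_DELAY = 1

# Respect robots.txt rules (on by default in recent Scrapy versions).
ROBOTSTXT_OBEY = True
```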

Create item

An item is a container for what you fetch by crawling. Define a class in items.py.

class [ClassName](scrapy.Item):
    [field_name] = scrapy.Field()

item = [ClassName]()
item['field_name'] = 'Examples'

Creating a Spider

The details of crawling and scraping are written mainly in the spider. Run the following command to create one.

$scrapy genspider [spider name] [Domain of the site to fetch]

This creates a [spider name].py file in the spiders folder.

After this, you write the spider to match the site you are crawling.

I would appreciate it if you could point out any mistakes.
