[Python] [Scrapy] Only the endpoints get crawled and the spider doesn't follow links

I started implementing this with Scrapy because I wanted to start from an endpoint and crawl outward, rather than scraping just the endpoint itself. I got stuck on one point, so I'm making a note of it here.

I implemented the scraping by referring to these articles:

- http://qiita.com/meltest/items/b445510f09d81276a420
- http://qiita.com/checkpoint/items/0c8ad814c25e85bbcfa2#_reference-2f452f48c4e974829586
- http://qiita.com/tamonoki/items/ce58ff209f8eae808162
- http://web-tsukuru.com/570

Status

I implemented the scraping rules by imitating the sites above, but for some reason only the endpoints were crawled, and the spider never followed any links.

    # Scraping rule settings
    rules = (
             # Rules for which URLs to scrape
             Rule(LinkExtractor(deny=deny_list, unique=True), callback='parse'),
             # URLs the spider should follow
             Rule(LinkExtractor(), follow=True)
            )

    def parse(self, response):

Cause

The problem was the name of the callback function (`parse`). The Scrapy documentation warns about exactly this: when writing crawl spider rules, avoid using `parse` as the callback, because `CrawlSpider` uses the `parse` method itself to implement its link-following logic, so overriding it breaks the spider. https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.parse

Fix

Simply renaming the callback function makes the spider follow links from the endpoint and scrape each page in turn.

    # Scraping rule settings
    rules = (
             # Rules for which URLs to scrape
             Rule(LinkExtractor(deny=deny_list, unique=True), callback='downloadPic'),
             # URLs the spider should follow
             Rule(LinkExtractor(), follow=True)
            )

    def downloadPic(self, response):
