I started implementing this with Scrapy because I wanted to crawl outward from an endpoint rather than just crawl the endpoint itself, but I got stuck on one point, so I'm leaving a note about it.
I implemented the scraping by referring to the following articles:
http://qiita.com/meltest/items/b445510f09d81276a420
http://qiita.com/checkpoint/items/0c8ad814c25e85bbcfa2#_reference-2f452f48c4e974829586
http://qiita.com/tamonoki/items/ce58ff209f8eae808162
http://web-tsukuru.com/570
I wrote the scraping rules by imitating the sites above, but for some reason only the endpoints were crawled.
# Scraping rule settings (the version that did not work)
rules = (
    # Rule for the URLs to scrape
    Rule(LinkExtractor(deny=deny_list, unique=True), callback='parse'),
    # Rule for the URLs the spider should follow
    Rule(LinkExtractor(), follow=True),
)

def parse(self, response):
It turns out the problem was the function name (parse) I used for the callback. CrawlSpider uses the parse method internally to implement its own crawling logic, so a rule callback must not be named parse. It's probably explained here (I can't really read English): https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.parse
Just changing the callback's function name made it crawl from the endpoint and scrape each page in order.
# Scraping rule settings (working version)
rules = (
    # Rule for the URLs to scrape; callback renamed from parse to downloadPic
    Rule(LinkExtractor(deny=deny_list, unique=True), callback='downloadPic'),
    # Rule for the URLs the spider should follow
    Rule(LinkExtractor(), follow=True),
)

def downloadPic(self, response):
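
For reference, here is a minimal self-contained sketch of what the whole spider could look like with the renamed callback. The class name, spider name, allowed domain, start URL, deny_list contents, and the body of downloadPic are assumptions for illustration; only the rules and the callback name come from the snippet above.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

# Assumed example values; the original article does not show these.
deny_list = [r'/login', r'/logout']

class PicSpider(CrawlSpider):
    name = 'pic_spider'                    # assumed spider name
    allowed_domains = ['example.com']      # assumed domain
    start_urls = ['http://example.com/']   # assumed endpoint to start from

    rules = (
        # Pages matched here are handed to downloadPic for scraping
        Rule(LinkExtractor(deny=deny_list, unique=True), callback='downloadPic'),
        # Every other link is followed so the crawl spreads out from the endpoint
        Rule(LinkExtractor(), follow=True),
    )

    def downloadPic(self, response):
        # Must not be named parse: CrawlSpider uses parse internally.
        # Assumed body: yield the image URLs found on the page.
        for src in response.css('img::attr(src)').getall():
            yield {'image_url': response.urljoin(src)}

If this were saved inside a Scrapy project, it would be run with something like scrapy crawl pic_spider (pic_spider being the assumed name above).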