I tried scraping the ranking of Qiita Advent Calendar with Python

Qiita's Advent Calendar has a public ranking: the overall calendar ranking.

At first I was browsing it to find interesting calendars, but on a closer look, our calendar was ranked too! :clap::clap:


That said, a calendar is ranked as soon as it has even one like, so this is hardly surprising.

After learning we were ranked, I went to check **"What is today's ranking?"** by hand for only the first three days. This is a hassle...

I decided to automate fetching the ranking, thinking, "Qiita has an API, so I should be able to get the ranking from it."

If there is no API, scrape!

Qiita does have an API (see the Qiita API v2 specifications). However, the documentation lists no Advent Calendar API :sob:

With no API available, I decided to scrape the page with Python. Searching for Python scraping turned up Beautiful Soup again and again, so I went with **Beautiful Soup**.

Scraping is easy

The goal is clear: get our company's ranking. I inspected the page's HTML tags.


To get the ranking number, start from the calendar link <a class="adventCalendarRankingListItem_calendarName">, go up two levels to its enclosing element, and take the text of that element's first child.

I wrote the code using Beautiful Soup.

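```python
from urllib import request

from bs4 import BeautifulSoup

# Relative link of the calendar whose rank we want (our company's calendar).
targethref = '/advent-calendar/2019/fork'


def main():
    url = 'https://qiita.com/advent-calendar/2019/ranking/feedbacks/all'
    targetclass = 'adventCalendarRankingListItem_calendarName'

    # Fetch the ranking page; the context manager closes the response.
    with request.urlopen(url) as response:
        soup = BeautifulSoup(response, features="html.parser")

    # Find our calendar's link, go up two levels,
    # and read the rank from the first child of that element.
    link = soup.find('a', class_=targetclass, href=targethref)
    ranking = link.parent.parent.contents[0].text

    print(ranking)


if __name__ == "__main__":
    main()
```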

That's all it takes! **BeautifulSoup** is great! (That's just my impression, since I don't know any other scraping libraries.)
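One thing the short script does not cover is the day our calendar drops off the ranking page, in which case `soup.find` returns `None` and the script fails. Below is a minimal sketch of a more defensive variant. The function name `find_ranking` and the `SAMPLE_HTML` fragment are made up for illustration; the real page's markup may differ, and only the class name and the "go up two levels" structure come from the code above.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mimics the structure described in this article:
# the rank sits in the first child of the element two levels above the link.
SAMPLE_HTML = """
<ul>
  <li>
    <div>1</div>
    <div><a class="adventCalendarRankingListItem_calendarName" href="/advent-calendar/2019/other">Other</a></div>
  </li>
  <li>
    <div>42</div>
    <div><a class="adventCalendarRankingListItem_calendarName" href="/advent-calendar/2019/fork">FORK</a></div>
  </li>
</ul>
"""


def find_ranking(html, href, link_class="adventCalendarRankingListItem_calendarName"):
    """Return the rank text for the calendar linked by href, or None if absent."""
    soup = BeautifulSoup(html, features="html.parser")
    link = soup.find("a", class_=link_class, href=href)
    if link is None:
        # The calendar is not on this ranking page.
        return None
    item = link.parent.parent
    # find(True) returns the first tag inside the item,
    # skipping any whitespace-only text nodes before it.
    first_tag = item.find(True)
    return first_tag.get_text(strip=True) if first_tag else None


if __name__ == "__main__":
    print(find_ranking(SAMPLE_HTML, "/advent-calendar/2019/fork"))
    print(find_ranking(SAMPLE_HTML, "/advent-calendar/2019/missing"))
```

Returning `None` instead of raising lets the caller decide what to report, for example "not ranked today".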

In conclusion

I run this logic on AWS Lambda once a day and post the result to our in-house chat tool to share it.
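For reference, the Lambda side could look roughly like the sketch below. This is not the code I actually deployed: the chat tool is not named here, so the `{"text": ...}` payload assumes an incoming-webhook style endpoint, and `make_handler`, `build_message`, and `build_request` are hypothetical names. Only the standard library is used for the posting part.

```python
import json
from urllib import request


def build_message(ranking):
    """Format the text that gets posted to chat."""
    return f"Today's Qiita Advent Calendar ranking: {ranking}"


def build_request(webhook_url, text):
    """Build a JSON POST request (assumed incoming-webhook payload format)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def make_handler(get_ranking, webhook_url, send=request.urlopen):
    """Create a Lambda-style handler(event, context).

    get_ranking: callable returning the rank text (e.g. the scraping logic).
    send: callable that performs the HTTP request; replaceable for testing.
    """
    def handler(event, context):
        req = build_request(webhook_url, build_message(get_ranking()))
        with send(req) as response:
            return {"statusCode": response.status}

    return handler
```

In a real deployment, the module would expose something like `lambda_handler = make_handler(get_ranking, webhook_url)`, with the webhook URL read from an environment variable and `get_ranking` wrapping the scraping code above. Passing `send` as a parameter keeps the handler testable without touching the network.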

That was the day I realized I could still scrape the data even when no API is published :sunny:


:fork_and_knife: FORK Advent Calendar 2019: the Advent Calendar article I wrote is here
