[PYTHON] Get a list of Qiita likes by scraping

Introduction

Apologies in advance: I'm not a good writer, so I'm sorry if this is hard to read. I was dismayed to find that, because I had been using likes instead of stocks, I had no way to see a list of the articles I had liked. Judging from Twitter and Qiita, many people seem to have the same problem.[^1][^2] So I scraped Qiita with Python to get my list of likes.

Setup

After installing Python 3, install requests, BeautifulSoup4, and progressbar2 with pip (pip install requests beautifulsoup4 progressbar2).

Code

Get the article list and save it in the results.json file.

# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
import requests
from time import sleep
import json
from progressbar import ProgressBar

# Login form data for Qiita (replace the placeholders with your own credentials)
payload = {
    'utf8': '✓',
    'identity': 'your_username',  # username
    'password': 'your_password'  # password
}

# Get the authenticity_token from the top page
s = requests.Session()
r = s.get('https://qiita.com')
soup = BeautifulSoup(r.text, "html.parser")
auth_token = soup.find(attrs={'name': 'authenticity_token'}).get('value')
payload['authenticity_token'] = auth_token

#Login
s.post('https://qiita.com/login', data=payload)

#Dictionary for saving results
results = dict()

searchId = 1000000
param = {"before": searchId, "type": "id"}
# Use the id of the newest article as the maximum for the progress bar
maxId = s.get("http://qiita.com/api/public", params=param).json()[0]["id"]
bar = ProgressBar(min_value=0, max_value=maxId)  #Set maximum value

try:  # An error is raised once there are no more articles to load; finally still saves the results
    while True:
        param = {"before": searchId, "type": "id"}
        #Get article list
        j = s.get("http://qiita.com/api/public", params=param).json()

        # Add articles that are liked but not stocked to results.
        # Besides "url", "uuid" can also be specified as the value.
        results.update({i["title"]: i["url"] for i in j if i["liked"] and not i["stocked"]})
        
        searchId = j[-1]["id"]
        bar.update(maxId - searchId)  #Progress bar update
        sleep(1)  # Wait 1 second between requests; fractional values also work
finally:
    print(results)

    # Save to results.json (overwrite, so the file stays valid JSON when the script is run again)
    with open("results.json", "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=4, separators=(',', ': '))
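
The paging and filtering logic of the script can also be sketched without touching the network. The example below is a minimal sketch, not the author's code: `collect_liked_not_stocked`, `fake_fetch`, and `FAKE_ITEMS` are hypothetical names, and it assumes that a page with no items signals the end of the list, so the loop stops on an empty page instead of relying on an exception.

```python
from typing import Callable, Dict, List


def collect_liked_not_stocked(
    fetch_page: Callable[[int], List[dict]],
    start_id: int,
    value_key: str = "url",
) -> Dict[str, str]:
    """Walk pages from newest to oldest and keep liked-but-not-stocked items."""
    results: Dict[str, str] = {}
    search_id = start_id
    while True:
        page = fetch_page(search_id)
        if not page:  # assumption: an empty page means nothing older is left
            break
        for item in page:
            if item["liked"] and not item["stocked"]:
                results[item["title"]] = item[value_key]
        search_id = page[-1]["id"]  # continue from the oldest id on this page
    return results


# Hypothetical stand-in data so the sketch runs offline.
FAKE_ITEMS = [
    {"id": 30, "title": "A", "url": "/a", "liked": True, "stocked": False},
    {"id": 20, "title": "B", "url": "/b", "liked": True, "stocked": True},
    {"id": 10, "title": "C", "url": "/c", "liked": False, "stocked": False},
    {"id": 5, "title": "D", "url": "/d", "liked": True, "stocked": False},
]


def fake_fetch(before_id: int, page_size: int = 2) -> List[dict]:
    """Return up to page_size fake items whose id is lower than before_id."""
    older = [item for item in FAKE_ITEMS if item["id"] < before_id]
    return older[:page_size]


if __name__ == "__main__":
    print(collect_liked_not_stocked(fake_fetch, start_id=1000000))
```

To use it with the real session, `fetch_page` could wrap the same request the script makes, for example a function that returns `s.get("http://qiita.com/api/public", params={"before": before_id, "type": "id"}).json()`. Whether that endpoint really returns an empty list at the end has not been verified here.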

That's all. Running the script saves the results to results.json in this form:

{
    "[Narou4j] I made a novel acquisition library to become a novelist in Java": "863aa22a29db16463e52",
    "Accelerate ListView without using ViewHolder": "28f8be64d39b20e69552",
    "[Narou4j] Created Java wrapper library for Narurou API": "6c050593f45174056005",
    "I made my own hydroponics set.": "5d60c14d560ecf518a4e",
    "RxJava + Flux (+ Kotlin)Android app design by": "cbf304891daec87ba5b7",
    "I made EventBus with RxJava": "a4ece37834446c9a39c8"
}

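The values in this sample appear to be article uuids rather than URLs, which is what the script stores if "uuid" is specified in place of "url". For logging in to Qiita, I referred to the article "Login to website in Python".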
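
The login depends on reading a hidden authenticity_token field from the page before posting the form. The script does this with BeautifulSoup; the following is a dependency-free sketch of the same lookup using only the standard library's `html.parser`, which is a swapped-in technique and not what the script uses. `TokenFinder`, `find_authenticity_token`, and the sample HTML are made up for illustration, and Qiita's real markup may differ.

```python
from html.parser import HTMLParser
from typing import Optional


class TokenFinder(HTMLParser):
    """Remember the value of the first element named authenticity_token."""

    def __init__(self) -> None:
        super().__init__()
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.token is None and attributes.get("name") == "authenticity_token":
            self.token = attributes.get("value")


def find_authenticity_token(html: str) -> Optional[str]:
    """Return the token value, or None when the page has no such field."""
    finder = TokenFinder()
    finder.feed(html)
    return finder.token


# Made-up login form; the real page is assumed to contain a similar hidden field.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="authenticity_token" value="abc123==">
  <input type="text" name="identity">
  <input type="password" name="password">
</form>
"""

if __name__ == "__main__":
    print(find_authenticity_token(SAMPLE_HTML))
```

Returning None for a missing field makes it easy to stop with a clear message before posting the login form, whereas calling `.get('value')` on a failed `soup.find(...)` raises an AttributeError.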

Footnotes

[^1]: [[Caution!] Articles are not stocked if you just press the "Like" button on Qiita! I wrote an extension!](http://qiita.com/gimupop/items/be53044143a9a3e90a4b)
[^2]: [Twitter search results for "qiita likes list -! -breakthrough"](https://twitter.com/search?f=tweets&vertical=default&q=qiita%20%E3%81%84%E3%81%84%E3%81%AD%E3%80%80%E4%B8%80%E8%A6%A7%E3%80%80%20-%EF%BC%81%20-%E7%AA%81%E7%A0%B4&src=typd)
