Stock count ranking by Qiita tag with Python

【Overview】

Extract the 10 articles with the most stocks under a given tag.

【Environment】

Windows 8.1, Python 3.5

【Program】

This ranks articles with the Python tag. To run it, redirect the output to an HTML file: python stock_rank.py > output.html
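The script hardcodes the Python tag in its listing URL and keeps requesting page 1, 2, 3, ... until the server answers with a 404. A minimal sketch of that paging pattern for an arbitrary tag could look like the following. The URL layout is copied from stock_rank.py and may no longer match Qiita's current site; `tag_page_url` and `iter_tag_pages` are hypothetical helper names, and `fetch` is any function that returns a page's bytes (for example `lambda url: urllib.request.urlopen(url).read()`).

```python
import urllib.error
import urllib.parse
from typing import Callable, Iterator


def tag_page_url(tag: str, page: int) -> str:
    """Build the listing URL for one page of a tag (pattern assumed from stock_rank.py)."""
    # quote() percent-encodes characters such as '+' in tag names like "C++"
    return "https://qiita.com/tags/" + urllib.parse.quote(tag, safe="") + "/items?page=" + str(page)


def iter_tag_pages(tag: str, fetch: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield the raw HTML of page 1, 2, 3, ... and stop when the server answers 404."""
    page = 1
    while True:
        try:
            html = fetch(tag_page_url(tag, page))
        except urllib.error.HTTPError as e:
            if e.code == 404:
                return  # gone past the last page
            raise  # any other HTTP error is a real failure
        yield html
        page += 1
```

Catching only `HTTPError` with code 404, instead of a bare `except:`, means network failures and other errors surface instead of silently ending the ranking early.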

stock_rank.py


# -*- coding: utf-8 -*-

import urllib.error
import urllib.request
from bs4 import BeautifulSoup

RANK_SIZE = 10  # number of articles kept in the ranking

# Stock counts of the current top 10, in descending order
cont = [0] * RANK_SIZE

# Article elements matching cont (None until a slot is filled)
title = [None] * RANK_SIZE

page_num = 1

while True:
    try:
        html = urllib.request.urlopen("https://qiita.com/tags/Python/items?page=" + str(page_num)).read()
    except urllib.error.HTTPError as e:
        # HTTP Error 404 means we have gone past the last page
        if e.code == 404:
            break
        raise

    soup = BeautifulSoup(html, "html.parser")

    # HTML extraction by specifying the class (once per page, not once per article)
    title_all = soup.find_all(class_="publicItem_body")
    cont_all = soup.find_all(class_="publicItem_stockCount")

    # Skip pages without publicItem_body elements
    # (advance page_num first, otherwise the same page is fetched forever)
    if len(title_all) == 0:
        page_num += 1
        continue

    for i in range(len(title_all)):
        try:
            # get_text() drops the <i class="fa fa-stock "></i> icon and keeps only the number
            cont_kazu = int(cont_all[i].get_text(strip=True))
        # Skip articles that are not stocked by anyone
        except (IndexError, ValueError):
            continue

        for j in range(RANK_SIZE):
            if cont_kazu >= cont[j]:
                # Insert the stock count at rank j and drop the old last place
                cont.insert(j, cont_kazu)
                cont.pop()
                # Insert the article element at the same rank
                title.insert(j, title_all[i])
                title.pop()
                break

    page_num += 1

for i in range(RANK_SIZE):
    # Skip empty slots when fewer than 10 articles were found
    if title[i] is None:
        continue
    print(str(cont[i]) + " " + str(title[i].a).replace('href="', 'href="http://qiita.com') + "<br>")
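The inner `for j` loop keeps a sorted top-10 list by hand with `insert` and `pop`. If the (stock count, article) pairs are collected first, the standard library's `heapq.nlargest` does the same selection in one call. This is an alternative sketch, not the method used above; `top_n_by_stock` is a hypothetical name and the article is represented by a plain string for illustration.

```python
import heapq
from typing import List, Tuple


def top_n_by_stock(articles: List[Tuple[int, str]], n: int = 10) -> List[Tuple[int, str]]:
    """Return the n (stock_count, article) pairs with the most stocks, highest first."""
    # key= compares only the stock count, so the article objects never need to be comparable
    return heapq.nlargest(n, articles, key=lambda item: item[0])


ranking = top_n_by_stock([(5, "a"), (12, "b"), (3, "c"), (12, "d"), (7, "e")], n=3)
for count, article in ranking:
    print(count, article)
```

Passing `key=` matters when the second element is a BeautifulSoup tag: without it, two articles with the same stock count would be compared directly.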

【Result】

Displaying the output as UTF-8 produced garbled characters, so I switched the display encoding to Shift_JIS.

[Image: rank.png]
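A likely cause: when stdout is redirected to a file on Windows, Python 3.5 encodes it with the system's ANSI code page (cp932, a Shift_JIS variant, on Japanese Windows), so output.html ends up in Shift_JIS. One way to get a file that displays correctly as UTF-8 is to write it directly with an explicit encoding and declare that encoding in the page. A minimal sketch, assuming rows of (stock count, link HTML); `write_ranking_html` is a hypothetical helper. Setting the environment variable `PYTHONIOENCODING=utf-8` before running `python stock_rank.py > output.html` is another option.

```python
from typing import List, Tuple


def write_ranking_html(path: str, rows: List[Tuple[int, str]]) -> None:
    """Write (stock_count, link_html) rows to an HTML file encoded as UTF-8."""
    # An explicit encoding avoids depending on the console / ANSI code page
    with open(path, "w", encoding="utf-8") as f:
        # The meta tag tells the browser which encoding to use for display
        f.write('<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n')
        for count, link_html in rows:
            f.write(str(count) + " " + link_html + "<br>\n")
        f.write("</body></html>\n")
```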

【Problem】

The program takes a long time to run (> _ <)
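The script downloads the listing pages strictly one after another, so much of the run time is probably spent waiting on the network. A minimal sketch of fetching a batch of pages in parallel with `concurrent.futures.ThreadPoolExecutor` follows; `fetch_pages` is a hypothetical helper, and `fetch` is assumed to take a page number and return that page's HTML bytes. Because the last page number is not known in advance, a caller would fetch in batches (for example pages 1-8, then 9-16) and stop once a batch hits a 404. Keeping `max_workers` small avoids putting heavy load on the server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_pages(fetch: Callable[[int], bytes], pages: Iterable[int], max_workers: int = 4) -> Dict[int, bytes]:
    """Download several listing pages in parallel and return {page_number: html}."""
    page_list = list(pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs fetch() in worker threads but yields results in input order;
        # an exception raised by fetch() is re-raised here when its result is read
        results = list(pool.map(fetch, page_list))
    return dict(zip(page_list, results))
```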

[Reference sites]

Get information on the net with Python3 + urllib + BeautifulSoup
Scraping with Python and Beautiful Soup
Scraping with Beautiful Soup
