[PYTHON] Use tse to expose unposted Qiita Advent Calendars

Introduction

This is the day 8 article of the Crawler / Web Scraping Advent Calendar 2016. Sorry for posting late orz

What this article does

Setting aside the fact that my own post is late, I'll check whether other Advent Calendars have unposted entries.

The data is extracted from Qiita with tse, a Python text formatting tool, and beautifulsoup4.

What is tse?

For details, see the slides Mr. Ishimoto presented at PyCon JP 2015: "tse - Text Formatting Utility with Python".

tse features used here

-ms: import a module in the "from <module name> import *" form
-b: action to run immediately after startup
{{}}: marks a block to be indented

Environment

Python 2.7.10

Trying it out

Category list

$ curl http://qiita.com/advent-calendar/2016 | tse -ms 'bs4' -b 'for a in BeautifulSoup(sys.stdin.read(), "lxml").find_all("a", class_="adventCalendarCard_block_showAll"):{{print(a.get("href"))}}' > categories.txt

categories.txt


/advent-calendar/2016/categories/to_be_decided
/advent-calendar/2016/categories/programming_languages
/advent-calendar/2016/categories/libraries
/advent-calendar/2016/categories/databases
/advent-calendar/2016/categories/web_technologies
/advent-calendar/2016/categories/mobile
/advent-calendar/2016/categories/devops
/advent-calendar/2016/categories/iot
/advent-calendar/2016/categories/os
/advent-calendar/2016/categories/editors
/advent-calendar/2016/categories/academic
/advent-calendar/2016/categories/services
/advent-calendar/2016/categories/company
/advent-calendar/2016/categories/miscellaneous
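
For readers without tse, the category step can be approximated in plain Python. This is a minimal sketch and not the author's method: it swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages, and it targets Python 3 rather than the 2.7.10 used in this article. The ShowAllLinkParser class and the SAMPLE_HTML markup are hypothetical stand-ins for the real Qiita page.

```python
from html.parser import HTMLParser


class ShowAllLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # An attribute without a value arrives as None, hence the "or".
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


# Hypothetical markup imitating the category cards on the index page.
SAMPLE_HTML = """
<a class="adventCalendarCard_block_showAll" href="/advent-calendar/2016/categories/libraries">Show all</a>
<a class="other" href="/ignored">Ignored</a>
<a class="adventCalendarCard_block_showAll" href="/advent-calendar/2016/categories/os">Show all</a>
"""

parser = ShowAllLinkParser("adventCalendarCard_block_showAll")
parser.feed(SAMPLE_HTML)
category_paths = parser.hrefs
for path in category_paths:
    print(path)
```

In real use the sample string would be replaced by the page body read from standard input, which is what sys.stdin.read() supplies in the one-liner.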

Calendar list

$ cat categories.txt | while read line; do curl http://qiita.com$line | tse -ms 'bs4' -b 'for td in BeautifulSoup(sys.stdin.read(), "lxml").find_all("td", class_="adventCalendarList_calendarTitle"):{{print(td.find_all("a")[-1].get("href"))}}' >> calendars.txt; done

calendars.txt


$ cat calendars.txt
/advent-calendar/2016/chat-bot
/advent-calendar/2016/idd
/advent-calendar/2016/enough
/advent-calendar/2016/rite
/advent-calendar/2016/befool
~
/advent-calendar/2016/bookthankyou
/advent-calendar/2016/muscle
/advent-calendar/2016/ue4_inside
/advent-calendar/2016/job
/advent-calendar/2016/job2
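
The calendar step differs from the category step in one detail: inside each title cell it takes the last link, which is what find_all("a")[-1] does in the one-liner. A rough standard-library sketch of that selection follows, written for Python 3. The CalendarTitleParser class, the sample markup, the example paths, and the adventCalendarList_other class are all hypothetical, and the sketch assumes title cells are not nested.

```python
from html.parser import HTMLParser


class CalendarTitleParser(HTMLParser):
    """For each <td> with the target class, keep the href of its last <a>."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.in_cell = False
        self.last_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "td":
            classes = (attr_map.get("class") or "").split()
            self.in_cell = self.css_class in classes
            self.last_href = None
        elif tag == "a" and self.in_cell:
            # Later links overwrite earlier ones, so the last one wins.
            self.last_href = attr_map.get("href")

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            if self.last_href:
                self.hrefs.append(self.last_href)
            self.in_cell = False


# Hypothetical markup: a title cell holding a category link and a calendar link.
SAMPLE_HTML = """
<table><tr>
<td class="adventCalendarList_calendarTitle">
  <a href="/advent-calendar/2016/categories/os">OS</a>
  <a href="/advent-calendar/2016/example-a">Example A</a>
</td>
<td class="adventCalendarList_other"><a href="/ignored">x</a></td>
</tr><tr>
<td class="adventCalendarList_calendarTitle">
  <a href="/advent-calendar/2016/example-b">Example B</a>
</td>
</tr></table>
"""

parser = CalendarTitleParser("adventCalendarList_calendarTitle")
parser.feed(SAMPLE_HTML)
calendar_paths = parser.hrefs
for path in calendar_paths:
    print(path)
```

Unlike the one-liner, this sketch silently skips a title cell that contains no link instead of raising an error.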

List of unposted calendars

$ cat calendars.txt | while read line; do curl http://qiita.com$line | tse -ms 'bs4' -b 'bs=BeautifulSoup(sys.stdin.read(), "lxml"){{}}if bs.find("div", class_="adventCalendarItem_cancelButton"):print("[" + bs.title.string.split("- Qiita")[0].strip().encode("utf-8") + "](http://qiita.com"+bs.find(id="redirect_path").get("value")+")")' >> not_post_calendars.md; done

not_post_calendars.md

Ritual Advent Calendar 2016 Dart Advent Calendar 2016 F# Advent Calendar 2016 Java Puzzlers Advent Calendar 2016 JavaScript Advent Calendar 2016 Lisp Advent Calendar 2016 Perl 5 Advent Calendar 2016 PHP Advent Calendar 2016 Advent Calendar 2016 to make something with PHP Advent Calendar 2016 to defeat Micra-style FPS AOS with Python Smalltalk Advent Calendar 2016 Clojure Advent Calendar 2016 D language Advent Calendar 2016 Go Advent Calendar 2016 Go (Part 3) Advent Calendar 2016 Haskell Advent Calendar 2016 mruby Advent Calendar 2016 R Advent Calendar 2016 Rust Part 2 Advent Calendar 2016 Scala Advent Calendar 2016 Swift Part 2 Advent Calendar 2016 One person PHP review Advent Calendar 2016 A-Frame Advent Calendar 2016 Django Advent Calendar 2016 Docker2 Advent Calendar 2016 Grimoire.js Advent Calendar 2016 GStreamer Advent Calendar 2016 phina.js Advent Calendar 2016 React Native Advent Calendar 2016 ROS Advent Calendar 2016 RxJava Advent Calendar 2016 RxSwift Advent Calendar 2016 Angular Advent Calendar 2016 Laravel Advent Calendar 2016 Ruby on Rails Advent Calendar 2016 Unity Assets Advent Calendar 2016 Vue.js Advent Calendar 2016 Firebird Advent Calendar 2016 MySQL Casual Advent Calendar 2016 eZ Publish / eZ Platform Advent Calendar 2016 jupyter notebook Advent Calendar 2016 Markdown Advent Calendar 2016 Microservices Advent Calendar 2016 Website Performance Advent Calendar 2016 Windows Device Portal Advent Calendar 2016 Advent Calendar 2016 Frontend Engineer Advent Calendar 2016 CSS Advent Calendar 2016 TECHSCORE Advent Calendar 2016 Advent Calendar 2016 Crawler / Web Scraping Advent Calendar 2016 One person PostCSS Advent Calendar 2016 Appcelerator Titanium Advent Calendar 2016 IBM MobileFirst Advent Calendar 2016 Android 2 Advent Calendar 2016 iOS Advent Calendar 2016 iOS 2 Advent Calendar 2016 iOS Part 3 Advent Calendar 2016 Xamarin Advent Calendar 2016 PowerShell Advent Calendar 2016 SORACOM Advent Calendar 2016 JavaScript Robotics Advent Calendar 2016 
myThings Advent Calendar 2016 Development of baby monitoring system by Raspberry Pi Advent Calendar 2016 Windows 10 IoT Core Advent Calendar 2016 NetBSD Advent Calendar 2016 Solaris Advent Calendar 2016 Linux Advent Calendar 2016 Emacs Advent Calendar 2016 Visual Studio Code Advent Calendar 2016 Overlay Network Advent Calendar 2016 Anything Security Advent Calendar 2016 Introduction of Human-Computer Interaction Papers Advent Calendar 2016 Machine Learning Advent Calendar 2016 High school math redo advent calendar required for machine learning Advent Calendar 2016 Language implementation Advent Calendar 2016 DevRel Advent Calendar 2016 DIVE INTO CODE Advent Calendar 2016 GitLab Advent Calendar 2016 IDCF Cloud Advent Calendar 2016 Google Cloud Platform(1) Advent Calendar 2016 Kubernetes Advent Calendar 2016 Microsoft Azure Advent Calendar 2016 Fucking app Advent Calendar 2016 BRIGHT VIE Engineer Advent Calendar 2016 dotstudio Advent Calendar 2016 Engraphia Advent Calendar 2016 Fablic Advent Calendar 2016 Fujitsu extended Advent Calendar 2016 Hortonworks Advent Calendar 2016 ids-info (Faculty of Liberal Arts, Department of Interdisciplinary Sciences, University of Tokyo) Advent Calendar 2016 iRidge Advent Calendar 2016 KAYAC Advent Calendar 2016 KOIL CSC Advent Calendar 2016 Morning Project Samurai Advent Calendar 2016 N High Advent Calendar 2016 Okinawa.go Advent Calendar 2016 Okinawa.rb Advent Calendar 2016 Perl Entrance Ceremony Advent Calendar 2016 Scoville Engineers Advent Calendar 2016 Seattle Consulting Advent Calendar 2016 TokyoSW Advent Calendar 2016 Wondershake Advent Calendar 2016 Aratana Advent Calendar 2016 Lancers Advent Calendar 2016 Information Science College Advent Calendar 2016 Mathematical System Advent Calendar 2016 Japan Information Create Engineers Advent Calendar 2016 Tokyo University of Science Advent Calendar 2016 Atrae Advent Calendar 2016 eureka Advent Calendar 2016 Goodpatch Advent Calendar 2016 HAL Advent Calendar 2016 IT Study Group / 
Community Management Advent Calendar 2016 mixi Group Advent Calendar 2016 Opt Technologies Advent Calendar 2016 Sansan Advent Calendar 2016 Yahoo! JAPAN Tech Advent Calendar 2016 Interprism members write articles that are useful for their daily work! Advent Calendar 2016 Future Architect Advent Calendar 2016 Introduction to game programs with Scratch and Dart Advent Calendar 2016 CloudAnalytics Advent Calendar 2016 Hunachi^shoujin0^algorithm0 Advent Calendar 2016 Product Manager Advent Calendar 2016 Product Owner Advent Calendar 2016 Stream Processing Advent Calendar 2016 Student Advent Calendar 2016 test Advent Calendar 2016 Raytracing Advent Calendar 2016 Knowledge when I was a subcontractor COBOLER Advent Calendar 2016 Individual developer Advent Calendar 2016 Visualization / Data Visualization Advent Calendar 2016 Emoji / Emoji Advent Calendar 2016 acairojuni Advent Calendar 2016 Aizu Advent Calendar 2016 FLASHer Advent Calendar 2016 HoloLens Advent Calendar 2016 Serverless(2) Advent Calendar 2016 Chatbot Advent Calendar 2016 One person VR calendar Advent Calendar 2016 Poem Advent Calendar 2016 Technical book dedication thanks Advent Calendar 2016
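
The check in the last one-liner reads three things from each calendar page: whether a div with class adventCalendarItem_cancelButton exists, the page title, and the value of the element with id redirect_path. The sketch below reproduces that logic with the standard library under a few assumptions: it uses html.parser instead of BeautifulSoup, it is written for Python 3, and the CalendarPageParser class, the unposted_markdown_link function, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class CalendarPageParser(HTMLParser):
    """Gather the three facts the one-liner reads from a calendar page."""

    def __init__(self):
        super().__init__()
        self.has_cancel_button = False
        self.redirect_path = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "adventCalendarItem_cancelButton" in classes:
            self.has_cancel_button = True
        if attr_map.get("id") == "redirect_path":
            self.redirect_path = attr_map.get("value")
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def unposted_markdown_link(html, base_url="http://qiita.com"):
    """Return a Markdown link for a flagged calendar, or None otherwise."""
    page = CalendarPageParser()
    page.feed(html)
    if not page.has_cancel_button or page.redirect_path is None:
        return None
    # Drop the "- Qiita" suffix from the page title, as the one-liner does.
    name = page.title.split("- Qiita")[0].strip()
    return "[" + name + "](" + base_url + page.redirect_path + ")"


# Hypothetical pages: one flagged as unposted, one not.
FLAGGED_PAGE = """
<html><head><title>Example Advent Calendar 2016 - Qiita</title></head>
<body>
<input type="hidden" id="redirect_path" value="/advent-calendar/2016/example">
<div class="adventCalendarItem_cancelButton"></div>
</body></html>
"""
CLEAN_PAGE = FLAGGED_PAGE.replace("adventCalendarItem_cancelButton", "other")

print(unposted_markdown_link(FLAGGED_PAGE))
print(unposted_markdown_link(CLEAN_PAGE))
```

Each non-None result corresponds to one line appended to not_post_calendars.md.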

Conclusion

The results:

All calendars    Not posted
509              141

At the moment, calendars with unposted entries make up 28% of the total (141 of 509). Let's bring that percentage down with late posts and proxy posts so that we can wrap up 2016 on a good note!
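
The 28% figure follows directly from the two counts reported in this section. A quick check in Python 3, with variable names chosen only for illustration:

```python
total_calendars = 509      # all calendars found
unposted_calendars = 141   # calendars flagged as unposted

share = unposted_calendars / total_calendars * 100
print("{:.1f}%".format(share))  # 27.7%, which rounds to 28%
```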
