[PYTHON] What I checked about Qiita's post

- At what times of day do people post on Qiita?
- On which days of the week are there the most posts?

This is the story of how I got curious about those questions and looked into them.

Survey method and period, etc.

- I scraped the All Posts page and recorded each post's title, URL, and poster (I only noticed later that an API exists).
- I fetched the page every 10 minutes (if many posts arrive within 10 minutes, some may be missed).
- The language is Python, and a Raspberry Pi stands in for a server.
- The survey period is May 29 to June 4, 2016, and then from July 13 onward (still running).
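Since consecutive 10-minute snapshots of the listing can overlap, each post should only be recorded once. Here is a minimal sketch of that bookkeeping, not the code I actually ran. It assumes each scraped post is a dict with the hypothetical keys `title`, `url`, and `user`, and that the URL identifies a post uniquely; the example URLs and names are made up.

```python
from datetime import datetime


def record_new_posts(seen_urls, log, scraped, fetched_at):
    """Append posts not seen before to log and return how many were added.

    seen_urls: set of URLs already recorded (mutated in place)
    log:       list of recorded rows (mutated in place)
    scraped:   posts from the current snapshot, as dicts with the
               hypothetical keys 'title', 'url', 'user'
    """
    added = 0
    for post in scraped:
        if post["url"] in seen_urls:
            continue  # already recorded in an earlier snapshot
        seen_urls.add(post["url"])
        # Store the snapshot time alongside the post
        log.append({**post, "fetched_at": fetched_at})
        added += 1
    return added


# Two overlapping snapshots taken 10 minutes apart (made-up data)
seen, log = set(), []
first = [{"title": "A", "url": "http://example.com/items/1", "user": "alice"}]
second = first + [{"title": "B", "url": "http://example.com/items/2", "user": "bob"}]
added_first = record_new_posts(seen, log, first, datetime(2016, 7, 13, 12, 0))
added_second = record_new_posts(seen, log, second, datetime(2016, 7, 13, 12, 10))
```

If the snapshot time is used in place of the real posting time, the timestamps are only accurate to within the polling interval.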

The story of the Raspberry Pi crashing

The All Posts page appears to be loaded dynamically with JavaScript, so I used Selenium.

I log in and get the page source like this:

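import time

# Display is assumed to come from pyvirtualdisplay
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Start a virtual display so Firefox can run without a monitor
display = Display(visible=0, size=(800, 600))
display.start()
driver = webdriver.Firefox()
driver.get("http://qiita.com/")
# Fill in the login form and submit it
driver.find_element(By.ID, "identity").send_keys("mail address")
driver.find_element(By.ID, "password").send_keys("password")
driver.find_element(By.NAME, "commit").send_keys(Keys.RETURN)
time.sleep(10)  # wait, out of consideration for the server
driver.get("http://qiita.com/items")
source = driver.page_source.encode("sjis", "ignore")
driver.close()
display.stop()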

When I ran it as a test for a few days, the Raspberry Pi froze after one to three days. Reviewing the source code, I realized "There's no stop!", and once I added display.stop() it no longer froze. (# ^ Ω ^)

display.stop()  # ← this one
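One way to avoid forgetting the stop call, even when the scraping step raises an exception, is to wrap start and stop in a context manager. The sketch below only illustrates the pattern: `started` and `FakeDisplay` are hypothetical names, and `FakeDisplay` is a stand-in so the example runs without a real virtual display.

```python
from contextlib import contextmanager


@contextmanager
def started(resource):
    """Start a resource and guarantee stop() runs, even if the body raises."""
    resource.start()
    try:
        yield resource
    finally:
        resource.stop()


class FakeDisplay:
    """Stand-in for the virtual display, used only to demonstrate the pattern."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


def scrape_once(display):
    """Simulate a scraping run that fails partway through."""
    with started(display):
        raise RuntimeError("simulated scraping failure")


demo = FakeDisplay()
try:
    scrape_once(demo)
except RuntimeError:
    pass  # the failure still propagates, but stop() has already run
```

Any object with start() and stop() methods can be passed to `started` in the same way.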

I'll skip the code that extracts the title and poster (´・ω・`). I should have pulled out the tags too... (a mistake)

Results

First, by time of day

p_hour.png

It may seem obvious, but the 4:00 and 5:00 hours have the fewest posts, and the 23:00 and 0:00 hours have the most. Engineers really do seem to be night owls. Maybe the number of posts goes up at 11:00 and 16:00 because people are on a break (laughs)?

Next, by day of the week

p_week.png

Unexpectedly, Wednesday has the most posts... Personally, I expected Sunday to have the most. Is this an effect of the survey period??
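The hourly and weekday tallies above were done in Excel, but the same counts can be sketched in Python. This is only an illustration: the function names are hypothetical, the sample timestamps are made up, and it assumes each recorded post carries a `datetime`.

```python
from collections import Counter
from datetime import datetime

# datetime.weekday() returns 0 for Monday through 6 for Sunday
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def count_by_hour(timestamps):
    """Return a 24-element list: number of posts in each hour of the day."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def count_by_weekday(timestamps):
    """Return a dict mapping weekday name to number of posts."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}


# Made-up sample data
sample = [
    datetime(2016, 7, 13, 23, 5),   # Wednesday
    datetime(2016, 7, 13, 23, 40),  # Wednesday
    datetime(2016, 7, 14, 0, 15),   # Thursday
]
by_hour = count_by_hour(sample)
by_weekday = count_by_weekday(sample)
```

Returning a fixed-length list for hours keeps empty hours visible as zeros, which matters when plotting.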

By day

キャプチャ.PNG

The Raspberry Pi died for some reason on July 22 and 23, and I left it that way without noticing. On July 30 and 31, I accidentally switched off the power strip the Raspberry Pi was plugged into.

In closing

Strictly speaking, the results are not accurate unless the days affected by these mistakes are excluded, but I got lazy and just dumped everything into Excel, so don't trust the results too much... As I realized partway through, if I had also recorded the tags, I could have counted posts per tag... (Someone else has already done a per-tag aggregation.)
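For reference, excluding the incomplete days before aggregating would only take a small filter. This is a sketch of what I skipped, not something I ran: `drop_outage_days` is a hypothetical name, the sample timestamps are made up, and the year 2016 is assumed for the July outage dates.

```python
from datetime import date, datetime

# Days with known gaps in the data; the year 2016 is an assumption
OUTAGE_DAYS = {
    date(2016, 7, 22),
    date(2016, 7, 23),
    date(2016, 7, 30),
    date(2016, 7, 31),
}


def drop_outage_days(timestamps, outage_days=OUTAGE_DAYS):
    """Keep only timestamps that do not fall on a day with missing data."""
    return [ts for ts in timestamps if ts.date() not in outage_days]


# Made-up sample data
sample = [
    datetime(2016, 7, 21, 9, 0),
    datetime(2016, 7, 22, 9, 0),
    datetime(2016, 7, 31, 23, 0),
]
kept = drop_outage_days(sample)
```

Dropping whole days keeps the per-hour and per-weekday counts from being skewed by partial data.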
