--What time of day do people post on Qiita?
--Which day of the week has the most posts?
This is the story of how I wondered about these things and looked into them.
--Record the title, URL, and poster of every post on the All Posts page by web scraping (I only noticed later that an API exists)
--Fetch the information every 10 minutes (if a lot of posts arrive within 10 minutes, some may be missed)
--The language is Python, and a Raspberry Pi is used in place of a server
--The survey period is May 29 to June 4, 2016, and then July 13 to the present (still ongoing)
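Because the page is fetched every 10 minutes, the same post can show up in several consecutive fetches. A minimal sketch of how the recording step might avoid duplicates is shown below. The function name `append_new_posts`, the dictionary keys, and the sample data are all made up for illustration; this is not the script actually used for the survey.

```python
import csv
import io


def append_new_posts(posts, seen_urls, out):
    """Write posts not seen in an earlier poll to `out` as CSV rows.

    Returns the number of rows written. `seen_urls` is updated in place.
    """
    writer = csv.writer(out)
    written = 0
    for post in posts:
        if post["url"] in seen_urls:
            continue  # already recorded by a previous poll
        seen_urls.add(post["url"])
        writer.writerow([post["fetched_at"], post["title"], post["url"], post["poster"]])
        written += 1
    return written


# Two consecutive polls that overlap by one post (made-up data).
seen = set()
buf = io.StringIO()
poll_1 = [{"fetched_at": "2016-07-13 10:00", "title": "A",
           "url": "http://qiita.com/u1/items/a", "poster": "u1"}]
poll_2 = poll_1 + [{"fetched_at": "2016-07-13 10:10", "title": "B",
                    "url": "http://qiita.com/u2/items/b", "poster": "u2"}]
first = append_new_posts(poll_1, seen, buf)
second = append_new_posts(poll_2, seen, buf)
```

Keying on the URL means a post that is still on the page at the next poll is written only once. Posts that scroll off the page between two polls are still lost, as noted above.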
The All Posts page seems to be loaded dynamically with JavaScript, so I adopted Selenium.
Log in like this and get the page source:
import time
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

display = Display(visible=0, size=(800, 600))  # invisible virtual display
display.start()
driver = webdriver.Firefox()
driver.get("http://qiita.com/")
driver.find_element_by_id("identity").send_keys("mail address")
driver.find_element_by_id("password").send_keys("password")
driver.find_element_by_name("commit").send_keys(Keys.RETURN)
time.sleep(10)  # wait a while, out of consideration for the server
driver.get("http://qiita.com/items")
source = driver.page_source.encode("sjis", "ignore")
driver.close()
display.stop()
When I ran it as a test for a few days, the Raspberry Pi froze after 1 to 3 days. While reviewing the source code I thought "there's no stop!", and after I added display.stop() it no longer froze (#^ω^)
display.stop()  # <- this one
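If an exception is raised partway through, a `display.stop()` at the end of the script is never reached. One way to guard against that is `try`/`finally`. The sketch below is only an illustration: `fetch_page_source` is a made-up helper, and the fake classes stand in for the real display and browser so that it runs without Selenium installed. It also calls `driver.quit()` rather than the `driver.close()` used above, which is my own substitution.

```python
def fetch_page_source(make_display, make_driver, url):
    """Return the page source, always releasing the browser and the display."""
    display = make_display()
    display.start()
    try:
        driver = make_driver()  # created only after the display is up
        try:
            driver.get(url)
            return driver.page_source
        finally:
            driver.quit()  # runs even if get() raised
    finally:
        display.stop()  # runs even if the driver could not be created


# Stand-ins for the real objects (hypothetical, for demonstration only).
events = []


class FakeDisplay:
    def start(self):
        events.append("display.start")

    def stop(self):
        events.append("display.stop")


class FailingDriver:
    page_source = ""

    def get(self, url):
        raise RuntimeError("page load failed")

    def quit(self):
        events.append("driver.quit")


try:
    fetch_page_source(FakeDisplay, FailingDriver, "http://qiita.com/items")
except RuntimeError:
    pass  # the error still propagates, but cleanup has already happened
```

With the real libraries, the factories would presumably be something like `lambda: Display(visible=0, size=(800, 600))` and `webdriver.Firefox`.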
You don't need to see the code that extracts the title and poster, right? (´・ω・`) I should have pulled out the tags too ... (a mistake)
It seems natural, but the 4:00 and 5:00 hours have the fewest posts, and the 23:00 and 0:00 hours have the most. Engineers seem to be night owls after all. Is the rise in posts at 11:00 and 16:00 because of breaks (laughs)?
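The counting by hour can be sketched roughly as follows. The actual aggregation was done in Excel, so this is only an equivalent in Python. The `"%Y-%m-%d %H:%M"` timestamp format and the sample values are assumptions; the real log format is not shown in this post.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Return a list of 24 counts, index 0 for the 0:00 hour up to 23 for 23:00."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps
    )
    # Fill in hours with no posts so every hour is present.
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up timestamps, one per recorded post.
sample = [
    "2016-07-13 23:05",
    "2016-07-13 23:40",
    "2016-07-14 00:10",
    "2016-07-14 04:30",
]
per_hour = posts_per_hour(sample)
```

Since the data is collected every 10 minutes, the recorded time is the fetch time rather than the exact posting time, which should be close enough for hourly buckets.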
Unexpectedly, Wednesday has the most posts ... I personally expected Sunday to have the most. Is this an effect of the survey period??
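One way to check whether the survey period is skewing the result would be to divide the posts for each weekday by the number of days of that weekday that were actually observed. The sketch below uses made-up numbers, and `average_posts_per_weekday` is a hypothetical helper, not something used for the results here.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def average_posts_per_weekday(post_dates, observed_days):
    """post_dates: one date per post. observed_days: set of dates the scraper ran."""
    posts = Counter(d.weekday() for d in post_dates if d in observed_days)
    days = Counter(d.weekday() for d in observed_days)
    # Average per observed day, so a weekday that occurs more often is not favored.
    return {WEEKDAYS[w]: posts[w] / days[w] for w in sorted(days)}


# Made-up example: 8 observed days starting Wednesday 2016-07-13,
# so Wednesday occurs twice and every other weekday once.
observed = {date(2016, 7, 13) + timedelta(days=i) for i in range(8)}
post_dates = (
    [date(2016, 7, 13)] * 3    # Wednesday
    + [date(2016, 7, 20)] * 3  # Wednesday
    + [date(2016, 7, 17)] * 4  # Sunday
)
averages = average_posts_per_weekday(post_dates, observed)
```

In this example the raw totals favor Wednesday (6 posts against 4), while the per-day averages favor Sunday (4.0 against 3.0).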
The Raspberry Pi died for some reason on July 22 and 23, and I left it that way without noticing. On July 30 and 31, I accidentally turned off the power strip the Raspberry Pi was plugged into.
Strictly speaking, the results are not accurate unless the days with these mistakes are excluded, but I got tired of it and just tossed everything into Excel, so don't trust the results too much ... As I noticed along the way, if I had also recorded the tags, I could have counted posts per tag ...
Someone who did aggregate by tag
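Excluding the faulty days mentioned above before counting could look something like the sketch below. The four dates are the ones named in this post, with the year 2016 assumed. The `(date, title)` record layout and `drop_incomplete_days` are made up for illustration.

```python
from datetime import date

# Days with gaps in the data, as described above (year assumed to be 2016).
EXCLUDED_DAYS = {
    date(2016, 7, 22),
    date(2016, 7, 23),
    date(2016, 7, 30),
    date(2016, 7, 31),
}


def drop_incomplete_days(records, excluded=EXCLUDED_DAYS):
    """Keep only records whose date is not one of the excluded days."""
    return [record for record in records if record[0] not in excluded]


# Made-up records: (date the post was recorded, title).
records = [
    (date(2016, 7, 21), "A"),
    (date(2016, 7, 22), "B"),
    (date(2016, 7, 31), "C"),
    (date(2016, 8, 1), "D"),
]
kept = drop_incomplete_days(records)
```

The filtered list can then be fed into the hourly or weekday counts in place of the full log.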