[Python] I tried the Narou novel API

Trigger

https://qiita.com/dely13/items/5e949a384161c961d8ce — I read this article and tried it myself as practice ~~for fun~~, and got different results. The original article is from 2017, so here are the latest numbers (as of 10:00 on June 29, 2020).

The first half is unchanged

I will use @dely13's code as-is.

dely13.py


import pandas as pd
import requests
import numpy as np
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

url = "http://api.syosetu.com/novelapi/api/"
# API parameters go in a dictionary.
# These settings return JSON data sorted by overall rating (hyoka).

st = 1
lim = 500

data = []
while st < 2000:
    payload = {'of': 't-gp-gf-n', 'order': 'hyoka',
               'out': 'json', 'lim': lim, 'st': st}
    r = requests.get(url, params=payload)
    x = r.json()
    data.extend(x[1:])  # the first element is allcount metadata, so skip it
    st = st + lim
df = pd.DataFrame(data)

# Preprocessing: add 'year' and 'title_len' columns
df['general_firstup'] = pd.to_datetime(df['general_firstup'])
df['year'] = df['general_firstup'].apply(lambda x:x.year)

df['title_len'] = df['title'].apply(len)
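As a quick sanity check, the preprocessing above can be tried on a couple of hand-made rows (the sample titles and dates below are made up, mimicking only the API fields used here):

```python
import pandas as pd

# Hypothetical sample rows mimicking just the API fields used above.
sample = pd.DataFrame({
    'title': ['転生したら', '無職転生'],
    'general_firstup': ['2016-05-01 12:00:00', '2012-11-22 09:30:00'],
})

# Same three preprocessing steps as in the article.
sample['general_firstup'] = pd.to_datetime(sample['general_firstup'])
sample['year'] = sample['general_firstup'].apply(lambda x: x.year)
sample['title_len'] = sample['title'].apply(len)
```

Note that `len` counts Japanese characters directly, so the 5-character title gets `title_len == 5`.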

The rest really is exactly the original, so please read the original article for details.

Main subject

In 2017

Intriguing numbers: the mean is 17 characters, the same as the syllable count of a haiku. In other words, Narou titles were haiku! "The old pond / a frog jumps in / the sound of water"...

So said the 2017 article. But what about 2020...?

df['title_len'].hist()

df['title_len'].describe()

Histogram of df['title_len'].hist() (figure omitted).

Output of df['title_len'].describe():

count    2000.000000
mean       24.179500
std        15.528356
min         2.000000
25%        12.000000
50%        21.000000
75%        32.000000
max       100.000000
Name: title_len, dtype: float64

The average is up by about 7 characters, lol.

And the really interesting part starts here.

per_year.py


title_by_year = df.groupby('year')['title_len'].agg(['mean','count','std']).reset_index()
#plot
title_by_year.plot(x='year',y='mean') 
#data
title_by_year

Plot of title_by_year.plot(x='year', y='mean') (mean = average; figure omitted).

Output of title_by_year:

year mean count std
2008 7.500000 2 2.121320
2009 12.428571 7 8.182443
2010 10.882353 17 5.278285
2011 10.180000 50 4.684712
2012 13.294737 95 6.963237
2013 14.115942 138 8.541930
2014 16.065476 168 8.780176
2015 18.218009 211 9.701245
2016 21.577358 265 12.326472
2017 24.476015 271 11.750113
2018 29.425856 263 13.890288
2019 31.327327 333 15.861156
2020 40.483333 180 22.348053
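The groupby/agg pattern used above can be seen in miniature on made-up data (the numbers here are invented, not from the API):

```python
import pandas as pd

# Made-up title lengths for two years, just to see what
# agg(['mean', 'count', 'std']) produces per group.
toy = pd.DataFrame({'year': [2019, 2019, 2020, 2020, 2020],
                    'title_len': [30, 32, 38, 40, 42]})

# One row per year, with the three aggregates as columns.
by_year = toy.groupby('year')['title_len'].agg(['mean', 'count', 'std']).reset_index()
print(by_year)
```

`reset_index()` turns the `year` group key back into an ordinary column, which is what lets `plot(x='year', y='mean')` work directly afterwards.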

Conclusion

** Narou titles in 2019 became tanka ** (31 characters, matching the tanka's 31 syllables). The person who predicted this in the 2017 article is amazing. Spot on.

Digression 1

While I'm at it, let's also pull the minimum and maximum.

title_by_year = df.groupby('year')['title_len'].agg(['mean','min','max']).reset_index()
# plot
title_by_year.plot(x='year')
# data
title_by_year

Plot of title_by_year.plot(x='year') (figure omitted).

Output of title_by_year:

year mean min max
2008 7.500000 6 9
2009 12.428571 5 25
2010 10.882353 2 23
2011 10.180000 4 26
2012 13.294737 3 40
2013 14.115942 3 54
2014 16.065476 4 63
2015 18.218009 3 59
2016 21.577358 2 77
2017 24.476015 4 69
2018 29.425856 5 74
2019 31.327327 4 100
2020 40.483333 4 100

Are these 100-character entries actually longer than 100 characters?

max_100.py


df[['ncode','title','year','title_len']].set_index('ncode').query('title_len==100')
ncode title year title_len
N7855GF I was treated as incompetent and was banished from my childhood friend party. I made full use of the gift "Translation".... 2020 100
N6203GE A blacksmith who was exiled from the dictatorship, in fact, with the protection of "Blacksmith Goddess", suddenly with "Super Legendary" armor full equipment... 2020 100
N0533FS [Series version] I witnessed the chasing idol walking with a handsome guy, so I bought a part-time job... 2019 100
N4571GF In the 7th week of the loop, I learned that I was fitted with my believing friends, so I actively partyed on the 8th lap.... 2020 100

...wait, aren't these over 100 characters...?

When I checked after writing the article, each one was exactly 100 characters (screenshot omitted).

So there is a 100-character limit? These titles are fighting right at the cap.
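A quick way to confirm an apparent cap in your own pull is to count how many titles sit at exactly the maximum length. A sketch on invented lengths (in the real data, max(title_len) came out as exactly 100):

```python
import pandas as pd

# Hypothetical title lengths; in the real pull, max(title_len) was exactly 100.
toy = pd.DataFrame({'title_len': [12, 100, 47, 100, 100, 33]})

# The maximum, and the rows pinned at that maximum.
cap = toy['title_len'].max()
at_cap = toy.query('title_len == @cap')
print(cap, len(at_cap))
```

A pile-up of rows at exactly the maximum is a strong hint of an input limit rather than a natural distribution.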

Digression 2

Conversely, I got curious about short titles.

mini_len.py


df.groupby('title_len')['title_len'].agg(['count']).head(9).T

Mapping from character count to number of works (transposed horizontally because it got long):

title_len 2 3 4 5 6 7 8 9 10
count 2 8 18 35 41 38 64 75 89

title2_4.py


df[['title','year','title_len']].set_index('title').sort_values('title_len').query('title_len<5')

Excerpt: titles with fewer than 5 characters

title year title_len
letter 2016 2
dawn 2010 2
Bow and sword 2013 3
The reason for water 2012 3
Tomb King! 2013 3
Childhood friend 2016 3
Searcher 2013 3
The shadow of the tower 2012 3
Extermination person 2015 3
Cat and dragon 2013 3
Oblivion saint 2020 4
J/53 2012 4
Black Demon King 2011 4
My servant 2019 4
Mob love 2015 4
Wise man's grandson 2015 4
Seventh 2014 4

Even a title of just a few characters can be famous. It struck me that the old mobile-novel (Moba) crowd could pack a "title" into four characters.

Impressions

Maybe the flood of newcomers, if not quite at Moba (now Ebu) levels, is due to anime adaptations of mobile novels? I was raised on Moba, so I'll read something even if it's a bit hard to read as long as the content is interesting, but still, titles have gotten long. I'm hooked on this and this, both of which have fairly long titles. (On Ebu, it's this.)

The Narou API lets you narrow searches with all sorts of conditions, so I'd like to try more things. But what if you want to pull more than 2,000 entries...?
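One workaround I'd consider (an assumption worth verifying against the Narou API docs: `st` seems capped at 2000, and `lastup` appears to accept a "start-end" unixtime range) is to slice requests into time windows and page within each window. A sketch of just the payload construction, with no network calls:

```python
from datetime import datetime, timezone

def window_payloads(start, end, days=30, lim=500, max_st=2000):
    """Build API payload dicts that page through successive time windows.

    Assumption: the 'lastup' parameter takes a 'start-end' unixtime range
    (based on my reading of the Narou API docs; verify before relying on it).
    """
    payloads = []
    step = days * 86400
    t = int(start.replace(tzinfo=timezone.utc).timestamp())
    t_end = int(end.replace(tzinfo=timezone.utc).timestamp())
    while t < t_end:
        hi = min(t + step, t_end)
        # Page through this window up to the st cap, lim entries at a time.
        for st in range(1, max_st + 1, lim):
            payloads.append({'of': 't-gp-gf-n', 'order': 'hyoka', 'out': 'json',
                             'lim': lim, 'st': st, 'lastup': f'{t}-{hi}'})
        t = hi
    return payloads

# Two 30-day windows, four pages each -> eight request payloads.
payloads = window_payloads(datetime(2020, 1, 1), datetime(2020, 3, 1))
print(len(payloads))
```

Each payload would then be passed to `requests.get(url, params=payload)` as in the code above; windows narrow enough that no window exceeds 2,000 hits avoid losing entries to the cap.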
