[PYTHON] I tried the Narou Novel API 2

I found an interesting parameter while reading the Narou Novel API documentation, so I will introduce it and analyze the data it returns.

Conversation rate

|Parameter|Value|Description|
|:--|:--|:--|
|kaiwaritu|int string|Specifies the conversation rate of the novels to extract, in % units. To specify a range, separate the minimum and maximum with a hyphen (-).|
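As a minimal sketch of how the range form could be passed, the snippet below builds a query string with `kaiwaritu` set to a minimum-maximum range. The helper name `build_query` and the 30-40 range are illustrative assumptions; only the parameter names and the endpoint come from this post.

```python
from urllib.parse import urlencode

API_URL = "http://api.syosetu.com/novelapi/api/"  # endpoint used in this post


def build_query(min_rate, max_rate):
    """Return a request URL filtered by conversation rate (hypothetical helper)."""
    params = {
        "out": "json",
        "of": "t-ka",  # title and conversation rate only
        "kaiwaritu": f"{min_rate}-{max_rate}",  # range: minimum-maximum, in %
    }
    return API_URL + "?" + urlencode(params)


print(build_query(30, 40))
```

The resulting URL can be opened directly or handed to `requests.get`.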

I see. Conversation rate... presumably the share of the text that is dialogue rather than narration.

Let's try it right away.

First, import the libraries and set the API URL.

before_load.py


import pandas as pd
import requests
import numpy as np
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

url = "http://api.syosetu.com/novelapi/api/"

narou_load.py


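st = 1      # 1-based position of the first result to fetch
lim = 500   # number of results per request

data = []
while st < 2000:
    # 'of' selects the output fields; 'ka' adds the conversation rate
    payload = {'of': 't-gp-gf-n-ka', 'order': 'hyoka',
               'out': 'json', 'lim': lim, 'st': st}
    r = requests.get(url, params=payload)
    x = r.json()
    data.extend(x[1:])  # skip the first element, which is not a novel record
    st = st + lim
df = pd.DataFrame(data)

df.head()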
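The loading loop above can also be wrapped in a function, which makes it easier to reuse and to test without hitting the API. This is only a sketch: `fetch_all`, `get_page` and `get_page_http` are hypothetical names, and the early stop on a short page is my own assumption about how the end of the results shows up.

```python
def fetch_all(get_page, total=2000, lim=500):
    """Collect up to `total` works by paging with `st` (hypothetical helper).

    `get_page(payload)` must return the decoded JSON list for one request.
    """
    rows = []
    for st in range(1, total + 1, lim):
        payload = {"of": "t-gp-gf-n-ka", "order": "hyoka",
                   "out": "json", "lim": lim, "st": st}
        page = get_page(payload)
        works = page[1:]  # drop the first element, as x[1:] does in the loop
        rows.extend(works)
        if len(works) < lim:  # short page: assume there is nothing left
            break
    return rows


def get_page_http(payload, url="http://api.syosetu.com/novelapi/api/"):
    """Real fetcher; requests is imported lazily so the sketch runs offline."""
    import requests
    r = requests.get(url, params=payload, timeout=30)
    r.raise_for_status()  # fail loudly on HTTP errors
    return r.json()
```

With this in place, `df = pd.DataFrame(fetch_all(get_page_http))` would be the equivalent of the loop.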

Adding `ka` to the `of` value in payload (`'of': 't-gp-gf-n'`) loads the conversation rate; it is already included above. Here is the data that comes out:

|title|kaiwaritu (%)|
|:--|:--|
|When I was reincarnated, it was slime|14|
|The strongest in the world in a common profession|40|
|Wandering in another world with ridiculous skill|36|
|Mushoku Tensei-If you go to another world, you will get serious-|22|
|Another world fantasy song starting from Death March (web version)|38|

I see. Some of these look fairly high (just a gut feeling). But I don't know what counts as high in the first place, so let's try describe().

kaiwaritu
count 2000.00000
mean 38.00800
std 10.66831
min 0.00000
25% 31.00000
50% 38.00000
75% 45.00000
max 96.00000

I see. With a mean of 38%, is that simply what a typical work looks like? Or are these works all so long that the rate settles on a common value?

Let's narrow down the number of characters a little.

Reading time

Instead of specifying the number of characters, I will deliberately use the reading time. But what is the reading time?

|Parameter|Value|Description|
|:--|:--|:--|
|time|int string|Specifies the reading time of the novels to extract. The reading time is the number of characters in the novel divided by 500. To specify a range, separate the minimum and maximum with a hyphen (-).|

As you can see, the value is proportional to the number of characters, so the only difference is that the numbers are smaller.
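To make the conversion concrete, here is a small sketch that maps a `time` value back to a range of character counts. It assumes the division by 500 is rounded up, which the API description quoted above does not state; `chars_for_time` is a hypothetical helper.

```python
def chars_for_time(time_value, chars_per_unit=500):
    """Return the (min, max) character count for a given `time` value.

    Assumes time = ceil(characters / 500); the rounding rule is an assumption.
    """
    low = (time_value - 1) * chars_per_unit + 1
    high = time_value * chars_per_unit
    return low, high


print(chars_for_time(11))  # (5001, 5500)
```

Under that assumption, a reading time of 11 corresponds to between 5,001 and 5,500 characters.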

Add `ti` to `of` in payload and reload.

While we're at it, let's run describe() on time.

time
count 2000.000000
mean 1395.985500
std 1823.680635
min 11.000000
25% 434.750000
50% 889.500000
75% 1608.250000
max 26130.000000

So every work has at least 5,001 characters, assuming the reading time is rounded up. (...I bet the max is Summoner.)

df[['title','time']].sort_values('time').tail()

|title|time|
|:--|:--|
|Magi Craft Meister|14868|
|Boundary Labyrinth and the Wizard of the Other World|16410|
|Cooking with Wild Game|17653|
|Summoner goes|25536|
|legend|26130|

**Nope.**

Relationship between reading time (number of characters) and conversation rate

doku_kai.py


# Split into quartiles by time (D = shortest, A = longest)
df['part'] = pd.qcut(df.time, 4, labels=['D','C','B','A'])
# Mean conversation rate for each part
df.groupby('part').agg({'kaiwaritu':['mean']})

|part|kaiwaritu (mean, %)|
|:--|:--|
|D|36.990|
|C|38.180|
|B|38.322|
|A|38.540|

This was a surprise. The conversation rate barely changes whether the work is long or short.
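To make the quartile step concrete, here is a small sketch on made-up numbers. The toy values are assumptions for illustration only, not data from the API; the column names match the ones used in this post.

```python
import pandas as pd

# Toy values for illustration only, not data from the API.
toy = pd.DataFrame({
    "time":      [10, 20, 30, 40, 50, 60, 70, 80],
    "kaiwaritu": [30, 34, 36, 40, 38, 42, 35, 45],
})

# Four equal-sized bins by reading time: D = shortest quarter, A = longest.
toy["part"] = pd.qcut(toy["time"], 4, labels=["D", "C", "B", "A"])

# Mean conversation rate per bin; observed=True keeps only bins that occur.
means = toy.groupby("part", observed=True)["kaiwaritu"].mean()
print(means)
```

Because `pd.qcut` cuts on quantiles rather than fixed widths, each bin holds the same number of works, so the per-bin means are directly comparable.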

Writing style

That was disappointing, so I tried another feature: the writing-style parameter. It still seems to be experimental, the classification is not always clear-cut (it is ambiguous to begin with), and it cannot be requested through `of`, so I will load the data into separate data frames instead.

|Parameter|Value|Description|
|:--|:--|:--|
|buntai|int string|Specifies the writing style. Separating values with a hyphen (-) performs an OR search. 1: works with no indentation and many consecutive line breaks; 2: works with no indentation but an average number of line breaks; 4: works with proper indentation but many consecutive line breaks; 6: works with proper indentation and an average number of line breaks.|

First, split the data into df1, df2, df4, and df6, one per style value.
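One way to do the split is to send one request per `buntai` value and keep the results apart. The sketch below is an assumption about how this could be organized, not the code used in the post: `load_by_style` and `get_page` are hypothetical names, and the shortened `of` value is chosen for brevity.

```python
def load_by_style(get_page, styles=(1, 2, 4, 6), lim=500):
    """Fetch one page of works per `buntai` value (hypothetical helper).

    `get_page(payload)` must return the decoded JSON list for one request.
    Returns a dict such as {1: [...], 2: [...], 4: [...], 6: [...]}.
    """
    result = {}
    for style in styles:
        payload = {"of": "t-ka", "order": "hyoka", "out": "json",
                   "lim": lim, "buntai": style}
        page = get_page(payload)
        result[style] = page[1:]  # skip the first element, as in the main loop
    return result
```

Here `get_page` would wrap something like `requests.get(url, params=payload).json()`, and each list in the result can be passed to `pd.DataFrame` to obtain df1, df2, df4 and df6.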

|df1|df2|df4|df6|
|:--|:--|:--|:--|
|The strongest sage of disqualification crest-The strongest sage in the world has reincarnated to become stronger-|Isekai Shokudo|The strongest in the world in a common profession|When I was reincarnated, it was slime|
|Duke's daughter's taste|Someone please explain this situation|Mushoku Tensei-I'm serious when I go to another world-|Wandering in another world with ridiculous skill|
|Another world life of a reincarnated sage-I got a second profession and became the strongest in the world-|Hariko Maiden|Another world fantasy song starting from Death March (web version)|I said that the ability is an average value!|
|I have reincarnated as a villain daughter who has only the ruin flag of the maiden game ...|I will quietly disappear|Re: Life in a different world starting from zero|It's a spider, but what is it?|
|Live dungeon!|Mid-career (middle-aged) office worker relaxing different world industrial revolution|I want to be a powerful person in the shadow![Web version]|The magical power of the saint is versatile|

Some of these classifications puzzle me, but I'll let that pass here.

| |df1|df2|df4|df6|
|:--|:--|:--|:--|:--|
|count|500.000000|500.000000|500.00000|500.000000|
|mean|36.506000|35.246000|38.74200|37.668000|
|std|11.489211|14.927396|9.70091|13.106691|
|min|1.000000|0.000000|6.00000|0.000000|
|25%|28.000000|25.000000|32.75000|30.000000|
|50%|36.000000|35.000000|39.00000|38.000000|
|75%|44.000000|44.250000|45.00000|46.000000|
|max|70.000000|98.000000|71.00000|96.000000|
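A side-by-side summary like the one above can be produced by running `describe()` on each frame and concatenating the results as columns. This is a minimal sketch on toy frames; the values are made up for illustration and only the column name `kaiwaritu` comes from the post.

```python
import pandas as pd

# Toy frames for illustration only; in the post these are df1, df2, df4, df6.
frames = {
    "df1": pd.DataFrame({"kaiwaritu": [10, 20, 30]}),
    "df2": pd.DataFrame({"kaiwaritu": [20, 40, 60]}),
}

# describe() each kaiwaritu column and line the results up as columns.
summary = pd.concat(
    {name: frame["kaiwaritu"].describe() for name, frame in frames.items()},
    axis=1,
)
print(summary)
```

The dictionary keys become the column names, so the same pattern extends to any number of frames.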

Looking at this result, there is no big difference, although df2 is lower overall and df6 higher. The sample size is 500 each because the initial sample was 2000 in total; when I loaded 2000 works per style, the df2 mean dropped further to 34%.

Judging from this, the conversation rate does not seem to be related to the writing style. ~~I wonder if it's the genre~~

Impressions

The analysis did not turn out very well, but I think it was good practice for future work. If I come up with an interesting data analysis, I would like to try it. Reading it back, I was surprised at how low the conversation rate of Tensura ("When I was reincarnated, it was slime") is. Is it because so much of the dialogue takes place in the protagonist's head?
