[Python / GAS] A story about building a personal Web API that lets you read Shōsetsuka ni Narō ("Let's Become a Novelist") web novels in vertical writing, and then turning it into a LINE bot.

Introduction

How is everyone's stay-at-home life going? I can't get into my university, so I spend my time studying programming at home, which has nothing to do with my research. ~~It's more fun than research...~~

I also have more time to read, and a large share of that reading is web novels on Shōsetsuka ni Narō.

So, to make this stay-at-home life as comfortable as possible, I built a Web API that

** lets you read several episodes of a Shōsetsuka ni Narō work in vertical writing **

How it works

The Web API processes each request in the following flow:

1. Query the Naro Novel API, searching for novels that match the given keyword
2. Select the work with the highest total points and get its NCODE
3. Scrape the episodes in the specified range from the novel pages
4. Format them into vertically written, novel-like HTML
5. Serve the result
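As a sketch of the first step, the query URL for the Naro Novel API can be built like this. This is a minimal, hypothetical helper (the real code below inlines the URL); the parameters follow the public API: `of=t-n` returns only title and ncode, and `order=hyoka` sorts by evaluation points.

```python
from urllib.parse import urlencode

def narou_search_url(keyword, limit=1):
    """Build the Naro Novel API query URL used in step 1 of the flow."""
    params = {
        "out": "json",     # JSON response
        "of": "t-n",       # return only title and ncode fields
        "lim": limit,      # number of results
        "order": "hyoka",  # sort by total evaluation points
        "word": keyword,   # search keyword
    }
    return "https://api.syosetu.com/novelapi/api/?" + urlencode(params)

# Note: the first element of the JSON response is metadata (allcount);
# the actual novel entries start at index 1.
print(narou_search_url("転生"))
```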

When accessed from a smartphone, it looks like the following; you can read it by scrolling horizontally. (Screenshot: episode 10 of *Mushoku Tensei*.)

From a PC. (Screenshot.)

Implementation

Python

First, the Python version. The scraping can be handled with just requests and re, and the server part uses Flask.

# -*- coding: utf-8 -*-
from flask import Flask
from requests import get
import re

app = Flask(__name__)


def narou_html(keyword, num=1, pivot='e'):
    honbun_ = ""
    item = get(
        f"https://api.syosetu.com/novelapi/api/?out=json&of=t-n&lim=1&order=hyoka&word={keyword}"
    ).json()[1]
    # item = max(items[1:], key=lambda x: x["global_point"])  # leftover from before I knew the API could return results already sorted
    url = "https://ncode.syosetu.com/"
    ncode = item["ncode"]
    headers = {
        'User-Agent':
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    }
    t = get(url + ncode, headers=headers).text
    slist = []
    sl = re.findall(r'<dl class="novel_sublist2">(.+?)</dl>',
                    t.replace("\n", ""))
    # Select the episode range: 'e' = latest episodes, 's' = from the first,
    # a number = starting from that episode
    if pivot == 'e':
        selected = sl[-num:]
    elif pivot == 's':
        selected = sl[:num]
    else:
        start = int(pivot) - 1
        selected = sl[start:start + num]
    for i in selected:
        groups = re.search(
            r'<a href="/(.+?)/">(.+?)</a></dd><dt class="long_update">(.+?)(?:<span|</dt>)',
            i)
        slist.append({
            "title": groups.group(2),
            "href": groups.group(1),
            "long_update": groups.group(3)
        })
    for s in range(len(slist)):
        t = get(url + slist[s]["href"], headers=headers).text
        honbun = re.search(
            r'<p class="novel_subtitle">(?:.|\s)+?<div id="novel_honbun"(?:.|\s)+?</p>\n</div>',
            t).group(0)
        for j, i in enumerate(honbun.splitlines()):
            if i[:2] == "<p":
                groups = re.search(r"(<p.*?>)(.+?)(</p>)", i)
                # Wrap runs of digits so they render upright (tate-chu-yoko)
                i = re.sub(r'(\d{1,4})',
                           r'<span class="text-combine">\1</span>',
                           groups.group(2))
                # Insert blank lines around dialogue brackets, unless the
                # previous paragraph already ended with a closing bracket
                if i[0] in ('(', '「', '(', '『', '【') and (
                        len(honbun_) < 9
                        or honbun_[-9] not in (')', ')', '」', '』', '】')):
                    i = '<br><br>' + i
                if i[-1] in (')', ')', '」', '』', '】'):
                    i += '<br><br>'
                if i[0] == '・':
                    i = '<br>' + i + '<br>'
                elif j > 1 and i[0] != '<' and honbun_[-1] != '>':
                    if re.match(r'\s', i):
                        i = i[1:]
                elif not re.match(r'\s', i) and i[0] not in ('(', '「', '(',
                                                            '『', '【'):
                    i = " " + i
                if j == 0:
                    i = '<h3>' + i + '</h3>\n<br><br>'
            honbun_ += i
        honbun_ += "<br>" * 25 + "\n"
    honbun_ = '''<html>

<head>
    <meta name="viewport" content="initial-scale=1.3,minimum-scale=1.3">
    <style>
        body {
            margin-top: 3.5%;
            margin-bottom: 3%;
            word-break: break-all;
            -ms-writing-mode: tb-rl;
            writing-mode: vertical-rl;
            text-orientation: upright;
            font-family: "Noto Serif JP", serif;
            font-size: 85%;
        }

        .text-combine {
            -webkit-text-combine: horizontal;
            -ms-text-combine-horizontal: all;
            text-combine-upright: all;
        }
    </style>
    <title>''' + item["title"] + '''</title>
    <link href="https://fonts.googleapis.com/css?family=Noto+Serif+JP&display=swap&subset=japanese" rel="stylesheet">
</head>

<body>
    ''' + honbun_[:-93].replace('→', '↓').replace('ー', '|').replace('-', '|') + '''</body>

</html>'''
    return honbun_


@app.route("/")
def hello_world():
    return "Usage: http://<ip-address>:<port>/<keyword>/<pivot>/<num>   pivot - s: from the first episode, e: from the latest, a number: from that episode."


@app.route('/<keyword>/<pivot>/<num>')
def html_(keyword, pivot, num):
    html = narou_html(keyword, int(num), pivot)
    return html


if __name__ == "__main__":
    app.run(host="0.0.0.0")

(When you use this many regular expressions, some backslash breaks Qiita's syntax highlighting, which is sad.) The formatting rules were tuned by eye, so sentences that should be separated sometimes end up joined.
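One detail worth isolating: runs of digits are wrapped in a `.text-combine` span so that, combined with the `text-combine-upright: all` CSS rule, they are rotated to sit horizontally (tate-chū-yoko) within the vertical text. The transformation on its own looks like this (extracted into a hypothetical standalone helper):

```python
import re

def combine_digits(text):
    # Wrap runs of 1-4 digits so the .text-combine CSS class can render
    # them horizontally within vertical text (tate-chu-yoko)
    return re.sub(r'(\d{1,4})', r'<span class="text-combine">\1</span>', text)

print(combine_digits("12月24日"))
# <span class="text-combine">12</span>月<span class="text-combine">24</span>日
```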

After running it, you can read the result at http://localhost:5000/<keyword>/<pivot>/<num> (only from the machine running it) or http://<local IP of that machine>:5000/<keyword>/<pivot>/<num>. For <pivot>, `s` starts from the first episode, `e` counts back from the latest, and a number starts from that episode; <num> specifies how many episodes to return.
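The pivot semantics can be expressed as a small pure function (a hypothetical helper, separate from the Flask app, mirroring the intended episode selection):

```python
def select_episodes(episodes, num, pivot):
    """Pick num episodes: 'e' = latest, 's' = first, a digit string = start episode."""
    if pivot == 'e':
        return episodes[-num:]
    if pivot == 's':
        return episodes[:num]
    start = int(pivot) - 1  # episode numbers are 1-based
    return episodes[start:start + num]

eps = [f"ep{n}" for n in range(1, 11)]
print(select_episodes(eps, 3, 'e'))   # ['ep8', 'ep9', 'ep10']
print(select_episodes(eps, 3, 's'))   # ['ep1', 'ep2', 'ep3']
print(select_episodes(eps, 3, '5'))   # ['ep5', 'ep6', 'ep7']
```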

GAS(Google Apps Script)

I tried deploying it to Heroku to turn it into a LINE bot, but it seems Shōsetsuka ni Narō cannot be scraped from Heroku's servers, so I rewrote it in GAS. I don't write much JavaScript in the first place, and on top of that there were GAS-specific functions, so even though the code is short it took a lot of trial and error. Being used to Jupyter's run-it-now, instant-feedback workflow, it was especially painful that I couldn't get logs displayed properly.

To use it as a LINE bot I set up two GAS projects: the first handles the LINE reply, and the second is a Web API similar to the Python code above.

The flow is: when you send a message, the first project replies with a URL, and when you open that URL, the second project serves the formatted novel page. It's amazing that GAS lets you do this for free. It is slow, though.

Since the first project just returns a URL, I'll leave it to the many sites explaining how to build a LINE bot with GAS and post only the second project's code. Also, this version is simplified compared with the Python one; for example, it only supports returning the specified number of episodes counting back from the latest (rather, the Python side was improved after this was written).

function doGet(e) {
  var keyword = e.parameter.keyword; // search keyword
  var num = parseInt(e.parameter.num); // number of episodes
  var getUrl = "https://api.syosetu.com/novelapi/api/?out=json&of=t-n&lim=1&order=hyoka&word="+keyword;
  var response = UrlFetchApp.fetch(getUrl).getContentText('UTF-8');
  var json = JSON.parse(response)[1];
  var title = json["title"];
  var ncode = json["ncode"];
  getUrl = "https://ncode.syosetu.com/"+ncode;
  response = UrlFetchApp.fetch(getUrl).getContentText('UTF-8').replace(/[\r\n]+/g,"");
  var items = response.match(/<dl class="novel_sublist2">(.+?)<\/dl>/gm);
  items = items.slice(items.length-num,items.length);
  var slist = [];
  for (var i = 0;i<num;i++){
    slist.push(items[i].match(/(<a href="\/)(.+?)(\/">)/)[2]); // collect the hrefs of the requested episodes
  }
  var honbun = "";
  // Formatting of the fetched text begins here
  for (var s = 0;s<num;s++){
    getUrl = "https://ncode.syosetu.com/"+slist[s];
    response = UrlFetchApp.fetch(getUrl).getContentText('UTF-8');
    var honbun_ = response.match(/<p class="novel_subtitle">(?:.|\s)+?<div id="novel_honbun"(?:.|\s)+?<\/p>\n<\/div>/)[0];
    var sphon=honbun_.split(/[\r\n]+/g);
    for (var i = 0, len=sphon.length;i<len;i++){
        if (sphon[i].slice(0,2) == "<p"){
          var groups = sphon[i].match(/(<p.*?>)(.+?)(<\/p>)/);
          var temp = groups[1] + groups[2].replace(/(\d{2,4})/g, '<span class="text-combine">$1<\/span>') + groups[3];
          if(i == 0){
            temp = '<h3>' + temp + '</h3>';
            Logger.log(temp);
          }
          honbun += temp + "\n";
        }
        else{honbun += sphon[i] + "\n";}
     }
    honbun += "<br><br>";
  }
  honbun = '<html>\n<head>\n <title>' + title + '</title>\n</head>\n<font size="5"><style>\n    body {\n        -ms-writing-mode: tb-rl;\n        writing-mode: vertical-rl;\n        text-orientation: upright;\n       font-family: "Yu Mincho", YuMincho, "Hiragino Mincho ProN W3", "Hiragino Mincho ProN W3", "Hiragino Mincho ProN", "HG Mincho E", "MS P Mincho", "MS Mincho", serif;\n   }\n\n    .text-combine {\n        -webkit-text-combine: horizontal;\n        -ms-text-combine-horizontal: all;\n        text-combine-upright: all;\n    }\n</style>\n\n<body>\n' + honbun + ' \n</body>\n</font>\n</html>'
  var html = HtmlService.createHtmlOutput(honbun);  // wraps the raw HTML string in an HtmlOutput object so GAS can serve it as a page
  return html;
}

After publishing the project as a web app, accessing the generated URL with ?keyword=<search word>&num=<number of episodes> displays a vertically formatted page. (The LINE bot in the first project replies with this address.) (Screenshot.)
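For reference, a client can assemble the request URL like this (a hypothetical sketch; `DEPLOY_ID` is a placeholder for the ID in your own deployed web-app URL):

```python
from urllib.parse import urlencode

def gas_page_url(exec_url, keyword, num):
    # exec_url is the web-app URL that GAS generates on deployment
    return exec_url + "?" + urlencode({"keyword": keyword, "num": num})

print(gas_page_url("https://script.google.com/macros/s/DEPLOY_ID/exec", "転生", 3))
```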

Summary

My reading is coming along nicely... I'm an amateur at HTML, so if you don't like the output, tweak it yourself... And when you use this, please do so in moderation so as not to put a heavy load on Shōsetsuka ni Narō's servers.
