This article was posted on the 7th day of the IT Advent Calendar with the theme **"Masashi Sada x IT"**.
I ran morphological analysis on the lyrics of 450 songs sung by Masashi Sada and collected the phrases that are 5 and 7 sounds (morae) long. Combining them yields a program that composes **Masashi Sada senryu**.
- Inarizushi, not dexterous, love is good
- Ah, I can't reach the love passing by
- I'm not good at it. I got it for the first time.
- A dying knee kid who can replace you
- Graduation, young but no hair now
- Happiness and wonderful he will not let go
- Your nice companion, lonely eyes
- Crouching, it's a Kitane character, the person who wrote it
- I loved spring without life
- Sadness always programs you
- Congratulations on becoming a flower and bearing fruit
- Jiji to Baba seems to be alive
- I have a dream, I can't get it, outside the window
- After taking it off, my son longed for me
- I can't go home, but the bag still looks at the port
- Small nosy that you can also fall in love
- I have nothing and I'm just by my side
- I liked the sweet letter on Sunday
- Believe me, buy a betting ticket that you passed away
- When you love, you endure beyond the white smoke
- My fault is born in your eyes
- Ano Musume no Reason for farewell bullet
- Speak lightly, always make you a friend
- Embark on a journey of living alone
- The wind dances to the fullest on the slope
- Locked Keisei Ueno, you line up
- Young traveler's letter that you won
- It's up. During the trip, I'm happy

lyric.py
# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals
from bs4 import BeautifulSoup
import requests
# URLs of Masashi Sada's song list pages on uta-net
urls = [
    'http://www.uta-net.com/artist/1399/0/1/',
    'http://www.uta-net.com/artist/1399/0/2/',
    'http://www.uta-net.com/artist/1399/0/3/',
]
class LyricSheet(object):
    """
    Lyric sheet for a single song.
    """
    def __init__(self, title, song_id):
        self.title = title
        self.song_id = song_id
    def __repr__(self):
        return str(self.song_id) + ':' + self.title
    @property
    def url(self):
        _base = 'http://www.uta-net.com/user/phplib/svg/showkasi.php?ID={}'
        return _base.format(str(self.song_id))
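    def write_file(self):
        # Fetch the lyric data before opening the output file.
        response = requests.get(self.url)
        assert response.status_code == 200
        # Append as UTF-8 bytes so Japanese text is written safely on Python 2 and 3.
        with open("./data/main.txt", 'ab') as text:
            text.write(response.text.encode('utf-8'))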
def generate_lyrics(url):
    response = requests.get(url)
    assert response.status_code == 200
    # Name the parser explicitly; bs4 warns when it has to guess.
    soup = BeautifulSoup(response.text, 'html.parser')
    songs = []
    for td in soup.tbody.find_all("td"):
        if td.a:
            _url = td.a['href']
            if 'song' in _url:
                _song_id = _url.replace('/', '').replace('song', '')
                songs.append(LyricSheet(td.a.text, _song_id))
    return songs
# Parse the song list pages
lyrics = []
for url in urls:
    lyrics += generate_lyrics(url)
# Print the lyric URL of every song
for lyric in lyrics:
    print(lyric.url)
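The href handling inside `generate_lyrics` can be tried without network access. The following is a minimal sketch, assuming song links on the list page look like `/song/<id>/`, as the string replacement in the script implies. The helper names `extract_song_id` and `lyric_url` and the sample hrefs are hypothetical and not part of the original script.

```python
# Hypothetical helpers mirroring the href handling in generate_lyrics().
# Assumption: song links on the list page look like '/song/<id>/'.

BASE_URL = 'http://www.uta-net.com/user/phplib/svg/showkasi.php?ID={}'


def extract_song_id(href):
    """Return the song ID from a '/song/<id>/' href, or None for other links."""
    if 'song' not in href:
        return None
    return href.replace('/', '').replace('song', '')


def lyric_url(song_id):
    """Build the lyric URL the same way LyricSheet.url does."""
    return BASE_URL.format(song_id)


# Made-up hrefs for illustration only.
sample_hrefs = ['/song/12345/', '/artist/1399/', '/song/678/']
song_ids = [extract_song_id(h) for h in sample_hrefs]
song_ids = [s for s in song_ids if s is not None]
for song_id in song_ids:
    print(lyric_url(song_id))
```

Links that do not contain `song`, such as the artist link in the sample, are dropped, which is the same filtering the script applies to each table cell.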
# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals
from bs4 import BeautifulSoup
from janome.tokenizer import Tokenizer
import random
def cut_word(count):
    """
    Read the lyrics file and return the phrases whose katakana reading
    is exactly `count` characters long.
    Example)
    input - 5
    output - ["あいうえお", "かきくけこ", "さしすせそ"]
    """
    _tmp = []
    result = []
    path = './data/main.txt'
    f = open(path, "r")
    
    for body in f:
        # Name the parser explicitly; bs4 warns when it has to guess.
        soup = BeautifulSoup(body, 'html.parser')
        for t in soup.find_all('text'):
            if t is None:
                continue
    
            for token in tokenizer.tokenize(t.text):
                # Reset at whitespace tokens.
                # janome reports part-of-speech tags in Japanese (IPADIC style),
                # so the tags must be matched as Japanese strings.
                if '空白' in token.part_of_speech:
                    _tmp = []
                # Only nouns, adjectives, and verbs may start _tmp
                if len(_tmp) == 0:
                    # Skip auxiliary verbs
                    if '助動詞' in token.part_of_speech:
                        continue
                    # Skip suffixes
                    if '接尾' in token.part_of_speech:
                        continue
                    # Skip non-independent words
                    if '非自立' in token.part_of_speech:
                        continue
                    # Skip numerals
                    if '数' in token.part_of_speech:
                        continue
                    # Anything other than a noun, adjective, or verb cannot start a phrase
                    pos = token.part_of_speech
                    if not ('名詞' in pos or '形容詞' in pos or '動詞' in pos):
                        continue
                _tmp.append(token)
                # Katakana reading of the tokens collected so far
                reading = ''.join([_token.reading for _token in _tmp])
                if len(reading) == count:
                    s = ''.join([_token.surface for _token in _tmp])
                    if '¥' not in s:
                        # debug
                        # result.append(s + '    :' + ''.join([_.surface + _.part_of_speech for _ in _tmp]))
                        result.append(s)
                    _tmp = []
                elif len(reading) > count:
                    #reset
                    _tmp = []
    # Close and return only after every line of the file has been processed
    f.close()
    return result
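tokenizer = Tokenizer()
word_seven = cut_word(7)
word_five = cut_word(5)
# Print 100,000 senryu in 5-7-5 form
for x in range(100000):
    print(' '.join([random.choice(word_five), random.choice(word_seven), random.choice(word_five)]))
# Number of 5- and 7-character phrases collected
print('{} {}'.format(len(word_five), len(word_seven)))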
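The accumulate-and-reset logic in `cut_word` is easier to follow on a tiny input. The sketch below is an illustration under assumptions, not the script itself: `Token` is a hypothetical stand-in for a janome token carrying only the three attributes the script reads, the sample tokens are written by hand rather than produced by janome, and the tag strings are the Japanese IPADIC-style tags that janome reports (for example `名詞` for noun). The `'¥'` check is left out for brevity.

```python
from collections import namedtuple

# Hypothetical stand-in for a janome token: only the attributes cut_word() reads.
Token = namedtuple('Token', ['surface', 'reading', 'part_of_speech'])

# Tags that may not start a phrase, and tags that may (same order as cut_word()).
SKIP_AT_START = ('助動詞', '接尾', '非自立', '数')
ALLOWED_AT_START = ('名詞', '形容詞', '動詞')


def collect_phrases(tokens, count):
    """Return surface strings whose combined reading is exactly `count` characters."""
    buffer, phrases = [], []
    for token in tokens:
        pos = token.part_of_speech
        if '空白' in pos:
            buffer = []  # whitespace ends the current phrase
        if not buffer:
            # A phrase may only start with a noun, adjective, or verb.
            if any(tag in pos for tag in SKIP_AT_START):
                continue
            if not any(tag in pos for tag in ALLOWED_AT_START):
                continue
        buffer.append(token)
        length = sum(len(t.reading) for t in buffer)
        if length == count:
            phrases.append(''.join(t.surface for t in buffer))
            buffer = []
        elif length > count:
            buffer = []  # overshot the target length: start over
    return phrases


# Hand-written sample tokens (not real janome output).
tokens = [
    Token('は', 'ハ', '助詞,係助詞,*,*'),
    Token('コスモス', 'コスモス', '名詞,一般,*,*'),
    Token('の', 'ノ', '助詞,連体化,*,*'),
    Token('花', 'ハナ', '名詞,一般,*,*'),
    Token('が', 'ガ', '助詞,格助詞,一般,*'),
    Token('咲く', 'サク', '動詞,自立,*,*'),
]

print(collect_phrases(tokens, 5))
print(collect_phrases(tokens, 7))
```

With these sample tokens, a target of 5 yields `['コスモスの', '花が咲く']` and a target of 7 yields `['コスモスの花']`. Note that the length is the number of katakana characters in the reading, so a contracted sound such as キャ counts as two characters.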
Generating 100,000 senryu produced a large number of nonsensical ones, for example "Speaking, become miso soup, happiness" and "Kusafue is your pupil's calendar". A person can tell at a glance whether a senryu is good, but stating a concrete definition of a good senryu in words is a hard problem.
I consulted someone knowledgeable. When I showed the phrases to a friend from an English department, they pointed out that in many of them the dependency relations between words do not hold. As future work, it should be possible to build a filter that extracts better phrases by running dependency parsing and scoring whether the dependencies hold.
Few people can properly tell haiku from senryu. There is no end to the funny stories of someone writing a haiku and having it praised with **"That's a good senryu"**. This time, I worked out the logic for extracting good Masashi Sada senryu with reference to modern haiku, which has a long history and has been studied extensively.
Professor Takeo Kuwabara, a scholar of French literature, published the essay **"Second-Class Art: On Contemporary Haiku"** in the November 1946 issue of Iwanami Shoten's magazine "Sekai". In it he called haiku a "second-class art" and argued that it should be distinguished from art proper, which caused great controversy in the haiku world of the time (the second-class art controversy). From the point of view of the haiku establishment, a scholar infatuated with France had dismissed haiku as less artistic than other arts, yet the famous haiku poets of the day could not rebut him well and were no match for the scholar. It is an episode that caused quite a stir.
To the question **"What is art?"** Kuwabara answered **"Something that moves the human heart. And art has meaning when it makes one think deeply."** He also states that **art is meaningless unless "the experience of the author is reproduced in the viewer"**. In other words, art is a shared impression. I concluded that sharing an impression between author and viewer is indispensable for a good Masashi Sada senryu. After reading through the output, I extracted the phrases that made me picture a scene and excluded the phrases that are just a list of words.
- Alive, painful, cavities are killer
- See, ask and visit there, gypsophila
- One more step, give up your dream, Yamazakura
- When it's sunny, it's on the rocks.
- With a sigh, the buzz of the town, a shooting star
- My face guys are all bad
Masashi Sada's songs are said to touch the heartstrings for some reason. One reason they have been loved by the public for 40 years may be that their expression is so good at making listeners picture a scene in their minds, and therefore at sharing an impression with them.
Finally, I am attaching 100,000 Masashi Sada senryu. I leave the modeling and programming of the logic for extracting good senryu to the researchers who come after me.
- 100,000 Masashi Sada senryu (100k_haiku.txt.zip)
If you tweet the phrases you like, it will be a helpful reference for future work. Thank you for reading today.