Automatically generating catch phrases [Python]

Overview

Use MeCab from Python, together with markovify, to generate "〇〇-like" sentences from various teacher data (training texts).

References

Use MeCab from Python 3 https://qiita.com/taroc/items/b9afd914432da08dafc8

I tried to automatically generate 〇〇-like sentences using Markov chains https://www.pc-koubou.jp/magazine/4238

I tried to automatically generate blog articles using Markov chains https://karaage.hatenadiary.jp/entry/2016/01/27/073000

Source

main.py


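import learnText

file = 'Teacher data/Kenshi YONEZU.txt'    #Path (glob pattern) of the teacher data
roopCnt = 5    #Number of texts to generate
size = 2    #Number of words per Markov chain state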

learnText.createText(file, roopCnt, size)
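
As a rough illustration of what size (passed to markovify as state_size) controls, here is a minimal pure-Python sketch of a word-level Markov chain. It is not how markovify is implemented. The names build_chain and generate, and the `<s>` / `</s>` markers, are made up for this example, and the corpus is assumed to contain at least one sentence.

```python
import random
from collections import defaultdict


def build_chain(sentences, state_size=2):
    """Map each run of `state_size` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        # Pad with start markers and close with an end marker (hypothetical tokens)
        words = ['<s>'] * state_size + sentence.split() + ['</s>']
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return chain


def generate(chain, state_size=2, rng=random, max_words=50):
    """Walk the chain from the start state until the end marker is drawn."""
    state = ('<s>',) * state_size
    out = []
    while len(out) < max_words:
        nxt = rng.choice(chain[state])
        if nxt == '</s>':
            break
        out.append(nxt)
        state = state[1:] + (nxt,)    # Slide the window by one word
    return ' '.join(out)


if __name__ == '__main__':
    corpus = ['I like cats', 'I like dogs', 'you like cats']
    print(generate(build_chain(corpus, state_size=1), state_size=1))
```

A larger state size makes each next word depend on more preceding words, so the output stays closer to the teacher data. A smaller one mixes sentences more freely but is more likely to produce nonsense.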

learnText.py


import re
import time
from glob import iglob

import MeCab
import markovify


def load_from_file(files_pattern):
    '''
    Read the files that match the given glob pattern, merge them, and return the text prepared for analysis.
    '''

    #Read text
    text = ""
    for path in iglob(files_pattern):
        with open(path, 'r') as f:
            text += f.read().strip()

    #Remove some symbols
    unwanted_chars = ['\r', '\u3000', '-', '|']
    for uc in unwanted_chars:
        text = text.replace(uc, '')

    #Remove Aozora Bunko notation: ruby 《...》 and annotations [#...] (full-width brackets)
    unwanted_patterns = [re.compile(r'《.*?》'), re.compile(r'[#.*?]')]
    for up in unwanted_patterns:
        text = re.sub(up, '', text)

    return text


def split_for_markovify(text):
    '''
    Split the text into sentences separated by line breaks, and split each sentence into words separated by spaces
    '''
    #Use MeCab to separate words
    mecab = MeCab.Tagger()
    splitted_text = ""

    #These characters can break markovify
    # https://github.com/jsvine/markovify/issues/84
    breaking_chars = [
        '(',
        ')',
        '[',
        ']',
        '"',
        "'",
    ]

    #Split the whole text into sentences (one per line) and each sentence into words (separated by spaces)
    for line in text.split():
        mp = mecab.parseToNode(line)
        while mp:
            try:
                if mp.surface not in breaking_chars:
                    splitted_text += mp.surface    #Skip characters that can break markovify
                if mp.surface != '。' and mp.surface != '、':
                    splitted_text += ' '    #Separate words with spaces
                if mp.surface == '。':
                    splitted_text += '\n'    #End each sentence with a line break
            except UnicodeDecodeError:
                print(line)    #Show the line that could not be decoded
            finally:
                mp = mp.next

    return splitted_text


def createText(file, roopCnt, size = 3):
    '''
    Automatically generate documents from teacher data

    Parameters
    ----------
    file : str
        Path (glob pattern) of the teacher data
    roopCnt : int
        Minimum number of texts to generate
    size : int
        Number of words per state (markovify's state_size)

    Returns
    -------
    list
        List of generated documents
    '''

    #Read the teacher text
    rampo_text = load_from_file(file)

    #Convert the text into a learnable format
    splitted_text = split_for_markovify(rampo_text)

    #Learn the model from the text
    text_model = markovify.NewlineText(splitted_text, state_size = size)

    textList = []
    while len(textList) < roopCnt:
        time.sleep(5)
        #Generate from the model (make_sentence() returns None when it fails)
        sentence = text_model.make_sentence()
        if sentence is None:
            continue

        #Cut long output into chunks of about 50 characters at word boundaries
        while len(sentence) > 50:
            index = sentence.find(' ', 50)
            if index == -1:
                break    #No later word boundary: keep the rest as one chunk

            text = ''.join(sentence[:index].split())
            sentence = sentence[index:]

            print(text)
            textList.append(text)

        #Remove the spaces between words and keep what is left
        text = ''.join(sentence.split())
        if text:
            print(text)
            textList.append(text)

    return textList

Generation examples

Novel-like

Natsume Soseki-like

Fill the hole in the 4th Megaki and fetch the black stone. I've never been satisfied after repeating it all the time. I'm a man who is unlikely to crawl into the absolute border, so please be aware of it right away. " "What is the Yamato soul, or I haven't confided. Depending on the reply, I will not throw it away.

Ango Sakaguchi

As a condition, it is easy to kill Utsumi, and either kill Moroi nurse or follow Moroi nurse. When he appeared in the hall, a broken bowl was turned around. In this way, the killing of Chigusa was easily completed, and both criminals struck a betting ball every day and slept every day. Isn't there a drill? ”The fact that Dr. Giant also knew the reader Before I finished wiping my face, the woman's plump and glossy clear voice was conscious of only the following far-reaching play. When the answer came out, the maid came after me, and when I came back again, I could see it here.

Property catch phrases

Like Naka-ku, Nagoya

It is a 1LDK with an auto lock for peace of mind. There is also an elementary school for food lovers. A 5-minute walk to Sakae, a two-burner stove that is great for food lovers ♪ The apartment is next to the park, The popular Nishiki 1 is within walking distance to the nice two-burner stove, Sakae. 1LDK on the skip floor.

Like Higashi-ku, Nagoya

There are two school facilities, a TV doorphone, and reheating. It is a town with a living environment. Free internet condominium! This is a one-of-a-kind property for those who have many system kitchen types and who have a high-quality counter kitchen and can move the Sakuradori line alone.

Like lyrics

Like Kenshi Yonezu

Even in the distance, I have to remember what happened next time I'm still sad and reminded of the destination of searching for jealousy. Everyone who is full of mistakes got used to it. Is there two people? No, no, I'm sorry for sometime 4 in the middle of you. Thank you for every call, so don't even sleep a little. That's why I won't go tens of thousands! The only product that is quiet to the body is that it is soft and clear in meaning, and I want to talk to you. Tell us that you shouldn't think we'll do it even though you've kept it

B'z-ish

If you can stay strong, don't let the rumor that you don't miss I don't need to be trained because my nails burn red so hard that I can check it by myself I don't regret it and I don't need you anymore Let's set the date when you flow. You were crying, standing wild and tearing I'm a transfer student, I'm always being offered a door Push!Yeah!Crush!Yeah!Weve got you come true Every time I gotta go Scenery that mixes with each other I'm scared, but I always want to go back

Aimyon-ish

In addition, the cheeks of the other white and soft marshmallow's heart "Nothing swallows a dream, the shadow is blurred in blue. I wish I could overflow before that day when I made a fool of myself. Grind what you didn't choose and show your earphones Love is always the high school graduation you came from Of course, at the very least, shut up your love-Rock-on "I feel like I'm here now The flirts are so stupid under somewhere in the extreme I'm sure this love I think I'll cool my body and go happy, so at worst it'll be better

Kana Nishino

It's not like this be together Everyone won't stop Girls just I don't know who, but I'll definitely protect it, so I'll connect. I can finally meet you. Thank you for your hard work now! nana It sa goodby, I always feel restless I don't know how I hate it anyway I don't know when I like it Even if I couldn't meet him, the child was up

I wanted to generate a manga name, but ...

I tried it with the text from "Bleach | Skills and chanting" (http://ort.yh.land.to/bleach/chantp.html), but many of the words were not registered in the dictionary, and generation ended with an error. Perhaps this approach is difficult for manga that coin words frequently, as science fiction does?
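
The post does not show the error message, so the following is only a guess. markovify's make_sentence() returns None when it cannot build a sentence that passes its checks, and calling len() on None raises a TypeError. If that is what happened, a small retry wrapper is one way to guard against it. The name make_sentence_with_retries and the parameter max_attempts are made up for this sketch. It accepts any callable that returns a string or None, so it can be tried without markovify.

```python
def make_sentence_with_retries(make_sentence, max_attempts=20):
    """Call `make_sentence` until it returns a string; give up with None."""
    for _ in range(max_attempts):
        sentence = make_sentence()
        if sentence is not None:
            return sentence
    return None


if __name__ == '__main__':
    # Stand-in generator that fails twice before succeeding
    results = iter([None, None, 'generated sentence'])
    print(make_sentence_with_retries(lambda: next(results)))
```

With a real model this would be called as make_sentence_with_retries(text_model.make_sentence). markovify also accepts a tries keyword on make_sentence() that raises its own internal retry count.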
