Markov Chain Chatbot with Python + Janome (2) Introduction to Markov Chain

Preface

Last time: Markov Chain Chatbot with Python + Janome (1) Introduction to Janome. This time, as preparation for the Markov chain chatbot, I will use Janome to split sentences into words and implement a simple Markov chain.

What is a Markov chain?

Introduction

A Markov chain is a type of stochastic process in which the possible states are discrete (finite or countable), that is, a discrete-state Markov process. (From Wikipedia)

Imagine a simple game of sugoroku (a dice-based board game). You roll a die on some square and move forward by the number rolled. Wherever you started, the values the die has shown so far have nothing to do with a future event such as "moving four squares ahead". This is the Markov property. You never stop between squares; you always land on one of them. This is what "discrete" means. A Markov process like this is called a simple Markov chain. If you instead set a rule such as "advance by the sum of the previous roll and the current roll", it becomes a second-order Markov chain.
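As a rough illustration of the two rules (only a sketch: the function names, the 30-square board, and the starting value of 0 for the "previous roll" are assumptions made up for this example), the simple rule needs only the current square and the current roll, while the second rule also needs the previous roll:

```python
import random

BOARD_SIZE = 30  # hypothetical board length, chosen only for this example


def advance_simple(position, roll):
    """Simple rule: the next square depends on the current square and this roll."""
    return min(position + roll, BOARD_SIZE)


def advance_with_previous(position, previous_roll, roll):
    """Second rule: advance by the sum of the previous roll and this roll."""
    return min(position + previous_roll + roll, BOARD_SIZE)


def play(rolls, use_previous=False):
    """Replay a fixed list of rolls and return the squares visited."""
    position, previous_roll = 0, 0  # assumption: no previous roll counts as 0
    visited = [position]
    for roll in rolls:
        if use_previous:
            position = advance_with_previous(position, previous_roll, roll)
        else:
            position = advance_simple(position, roll)
        previous_roll = roll
        visited.append(position)
    return visited


if __name__ == "__main__":
    rolls = [random.randint(1, 6) for _ in range(5)]
    print(rolls, play(rolls), play(rolls, use_previous=True))
```

Passing the rolls in as a list keeps the rules themselves deterministic, so the same rolls can be replayed under both rules and compared.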

Application to sentence generation

In sentence generation, an Nth-order Markov chain is used. Consider the following sentences as an example.

マルコフ過程とは、マルコフ性をもつ確率過程のことをいう。未来の挙動が現在の値だけで決定され、過去の挙動と無関係であるという性質を持つ確率過程である。 (A Markov process is a stochastic process with the Markov property. It is a stochastic process in which future behavior is determined only by the current value and has nothing to do with past behavior.)

If you split these sentences into words using Janome, the result looks like this.

マルコフ|過程|と|は|、|マルコフ|性|を|もつ|確率|過程|の|こと|を|いう|。 未来|の|挙動|が|現在|の|値|だけ|で|決定|さ|れ|、|過去|の|挙動|と|無関係|で|ある|という|性質|を|持つ|確率|過程|で|ある|。

First, choose a starting point arbitrarily. Let's use "マルコフ" (Markov). The next block is determined by looking at the current state, "マルコフ". The words that follow "マルコフ" are "過程" (process) and "性" (property), so choose one of them. Let's choose "過程". With a simple Markov chain, the next word is then chosen from "と", "の", and "で", which follow "過程". With a second-order Markov chain, you choose "と", which follows the two blocks "マルコフ" and "過程". This time, let's implement the simple Markov chain, N = 1.
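The difference between N = 1 and N = 2 can be sketched with a lookup table keyed by the last N words. This is only an illustration under some assumptions: the token list is written out by hand so that it runs without Janome, `build_table` is a name made up here, and unlike the implementation in this article it does not stop at the sentence end "。".

```python
from collections import defaultdict

# Tokens of the two example sentences, typed in by hand (assumed to match
# what Janome produces for the example text).
tokens = [
    "マルコフ", "過程", "と", "は", "、", "マルコフ", "性", "を", "もつ",
    "確率", "過程", "の", "こと", "を", "いう", "。",
    "未来", "の", "挙動", "が", "現在", "の", "値", "だけ", "で", "決定",
    "さ", "れ", "、", "過去", "の", "挙動", "と", "無関係", "で", "ある",
    "という", "性質", "を", "持つ", "確率", "過程", "で", "ある", "。",
]


def build_table(words, order):
    """Map each run of `order` consecutive words to the words that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        table[key].append(words[i + order])
    return dict(table)


first_order = build_table(tokens, 1)
second_order = build_table(tokens, 2)

print(first_order[("過程",)])              # candidates after "過程"
print(second_order[("マルコフ", "過程")])   # candidates after "マルコフ", "過程"
```

With N = 1 there are three candidates after "過程", while the two-word context "マルコフ", "過程" leaves only one.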

Implementation (simple Markov process)

from janome.tokenizer import Tokenizer
import random

t = Tokenizer()

# Janome is a Japanese morphological analyzer, so the input is the Japanese example text.
s = "マルコフ過程とは、マルコフ性をもつ確率過程のことをいう。\
未来の挙動が現在の値だけで決定され、過去の挙動と無関係であるという性質を持つ確率過程である。"

# Join the surface form of every token with "|" ...
line = ""

for token in t.tokenize(s):
    line += token.surface
    line += "|"

# ... then split on "|" and drop the empty string left after the last "|".
word_list = line.split("|")
word_list.pop()

# Map each word to the list of words that follow it.
dictionary = {}
queue = ""  # the previous word
for word in word_list:
    # Skip the very first word, and do not link words across a sentence end.
    if queue != "" and queue != "。":
        if queue not in dictionary:
            dictionary[queue] = []
            dictionary[queue].append(word)
        else:
            dictionary[queue].append(word)
    queue = word

def generator(start):
    sentence = start
    now_word = start
    # Cap the loop at 1000 words in case "。" is never chosen.
    for i in range(1000):
        if now_word == "。":
            break
        else:
            next_word = random.choice(dictionary[now_word])
            now_word = next_word
            sentence += next_word
    return sentence

# Print five sentences that start with "マルコフ" (Markov).
for i in range(5):
    print(generator("マルコフ"))

=========== Explanation below ===========

from janome.tokenizer import Tokenizer
import random

t = Tokenizer()

s = "マルコフ過程とは、マルコフ性をもつ確率過程のことをいう。\
未来の挙動が現在の値だけで決定され、過去の挙動と無関係であるという性質を持つ確率過程である。"

line = ""

for token in t.tokenize(s):
    line += token.surface
    line += "|"

word_list = line.split("|")
word_list.pop()

The first part builds a list that stores the words of the sentences above. As shown earlier, it generates a string delimited by "|" and then splits it on "|". Because the string ends with "|", the split leaves an empty string at the end of the list, so it is removed with pop.
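Why pop is needed can be checked without Janome. A minimal sketch, in which a hand-written list (the name `surfaces` is an assumption of this example) stands in for the token surfaces:

```python
# Hand-written stand-in for the token surfaces Janome would return.
surfaces = ["マルコフ", "過程", "と", "は"]

line = ""
for surface in surfaces:
    line += surface
    line += "|"  # every word, including the last one, gets a trailing "|"

word_list = line.split("|")
print(word_list)  # the trailing "|" leaves an empty string at the end

word_list.pop()   # drop that empty string
print(word_list == surfaces)
```

If the delimiter string is not needed for anything else, a list comprehension such as `[token.surface for token in t.tokenize(s)]` should give the same list directly.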

dictionary = {}
queue = ""
for word in word_list:
    if queue != "" and queue != "。":
        if queue not in dictionary:
            dictionary[queue] = []
            dictionary[queue].append(word)
        else:
            dictionary[queue].append(word)
    queue = word

The middle part builds a dictionary (a dict object) from the list created in the first part. queue holds the current word, and the word that follows it, word, is appended to the entry for queue (a new entry is created if there is none). After appending, queue is replaced with word. The contents of the dictionary look like this:

{'マルコフ': ['過程', '性'], '過程': ['と', 'の', 'で'], 'と': ['は', '無関係'], 'は': ['、'], '、': ['マルコフ', '過去'],
 '性': ['を'], 'を': ['もつ', 'いう', '持つ'], 'もつ': ['確率'], '確率': ['過程', '過程'],
 'の': ['こと', '挙動', '値', '挙動'], 'こと': ['を'], 'いう': ['。'], '未来': ['の'], '挙動': ['が', 'と'],
 'が': ['現在'], '現在': ['の'], '値': ['だけ'], 'だけ': ['で'], 'で': ['決定', 'ある', 'ある'],
 '決定': ['さ'], 'さ': ['れ'], 'れ': ['、'], '過去': ['の'], '無関係': ['で'], 'ある': ['という', '。'],
 'という': ['性質'], '性質': ['を'], '持つ': ['確率']}
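Because a follower is appended once per occurrence, a word that appears twice in a list is twice as likely to be picked by random.choice. The sketch below makes those implied probabilities explicit for two entries of the dictionary, written with the original Japanese tokens; `transition_probabilities` is a helper name invented for this illustration.

```python
from collections import Counter

# Two entries of the dictionary: the followers of "の" (of) and "確率" (probability).
dictionary = {
    "の": ["こと", "挙動", "値", "挙動"],
    "確率": ["過程", "過程"],
}


def transition_probabilities(followers):
    """Turn a list of following words into {word: probability}."""
    counts = Counter(followers)
    total = len(followers)
    return {word: count / total for word, count in counts.items()}


for word, followers in dictionary.items():
    print(word, transition_probabilities(followers))
```

Keeping the duplicates in a plain list is a simple way to get weighted sampling without storing the counts separately.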

def generator(start):
    sentence = start
    now_word = start
    for i in range(1000):
        if now_word == "。":
            break
        else:
            next_word = random.choice(dictionary[now_word])
            now_word = next_word
            sentence += next_word
    return sentence

for i in range(5):
    print(generator("マルコフ"))

In the last part, it is finally time to generate sentences. The next word is chosen at random from the words that follow the current word and is appended to the sentence. Generation ends when "。" is reached, but since there is a chance that "。" is never chosen, the loop is capped at 1000 iterations. Finally, the program prints five sentences that start with "マルコフ" (Markov).

It has nothing to do with Markov property stochastic processes.
It has nothing to do with the Markov property stochastic process.
The behavior of stochastic processes with Markov properties is the present.
Markov property.
It is a stochastic process that has a Markov process.

(They are a little awkward, but) these are the sentences that were generated.
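One weakness of the generator is that a start word missing from the dictionary raises a KeyError. The variant below is only a sketch of one way to guard against that, and it also accepts a seeded random generator so that runs are repeatable. The small table, the name `generate`, and its parameters are assumptions of this example, not part of the code above.

```python
import random

# A small hand-written transition table standing in for `dictionary`.
dictionary = {
    "マルコフ": ["過程", "性"],
    "過程": ["で"],
    "性": ["を"],
    "を": ["持つ"],
    "持つ": ["確率"],
    "確率": ["過程"],
    "で": ["ある"],
    "ある": ["。"],
}


def generate(table, start, max_words=1000, rng=random):
    """Walk the table from `start` until "。", a dead end, or `max_words` steps."""
    words = [start]
    current = start
    for _ in range(max_words):
        if current == "。" or current not in table:
            break  # sentence finished, or no known follower for this word
        current = rng.choice(table[current])
        words.append(current)
    return "".join(words)


rng = random.Random(0)  # fixed seed so repeated runs print the same sentences
for _ in range(5):
    print(generate(dictionary, "マルコフ", rng=rng))
```

Collecting the words in a list and joining them at the end avoids repeated string concatenation, which matters little here but scales better with a larger dictionary.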

Plans for next time

The simple Markov chain tends to produce incoherent sentences like those above because it takes no account of the words that came earlier. Also, the dictionary this time was too small, so the generated sentences were very similar to one another. Next time, I will try to implement an Nth-order Markov chain, which chooses the next word based on several preceding words, with a larger dictionary.
