[PYTHON] Generate random sentences from your tweets with a trigram model

This is a personal record, so the explanation is kept brief.

Method

Preparation

Download your tweet history. On the Twitter page, select "Settings & Privacy" → "Account" → "Twitter data" → "Download Twitter data" to make a request. After a while, a download link is sent to your email address, so download the data from there. Since the summer of 2019 the format of the downloaded data has changed, and tweets.csv appears to have been replaced by tweets.js. Handling that format directly is a hassle, so create tweets.csv with a tool written by someone else (https://17number.github.io/tweet-js-loader/).

Create a folder named text in your working directory, put tweets.csv into it, and you are ready to go.

Next, the contents of tweet.py. The first part creates tweets.txt: the tweet body is extracted from each row of tweets.csv and written to a text file.

tweet.py


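import csv
import re

rawfile = "text/tweets.csv"         # CSV converted from tweets.js
infile = "text/tweets.txt"          # tweet bodies only
outfile = "text/tweets_wakati.txt"  # tweets split into space-separated words

# Copy the tweet body (third column) of every row into tweets.txt.
# Separate names for the two files avoid shadowing the input handle.
with open(rawfile, 'r', encoding='utf-8', newline='') as fin, \
        open(infile, 'w', encoding='utf-8') as fout:
    for row in csv.reader(fin):
        if len(row) > 2:
            fout.write(row[2])
        fout.write('\n')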
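
Picking the body by column position depends on how the CSV happens to be laid out. As a rough alternative, the sketch below selects the column by header name instead. It assumes the CSV has a header row with a column called text; that name, the helper extract_bodies, and the sample rows are all made up for illustration, so check the header of your own file first.

```python
import csv
import io

def extract_bodies(csv_file, column="text"):
    """Collect the non-empty values of the named column from a CSV file object."""
    bodies = []
    for row in csv.DictReader(csv_file):
        body = row.get(column)
        if body:
            bodies.append(body)
    return bodies

# In-memory stand-in for text/tweets.csv (hypothetical header names and rows).
sample = io.StringIO(
    "tweet_id,timestamp,text\n"
    "1,2019-12-13 19:00:15,おはよう\n"
    "2,2019-12-13 19:05:00,\n"
    "3,2019-12-13 19:10:00,眠い\n"
)
bodies = extract_bodies(sample)
print(bodies)
```

With a real file you would pass a handle opened with encoding='utf-8' and newline='' instead of the StringIO object.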

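Next, use janome to split the tweets into words, then train the model. If you don't have janome, run pip install janome first (the model itself uses nltk, which can be installed the same way). To make the generated sentences read as natural Japanese as far as possible, tokens containing alphabetic characters are dropped, and lines containing certain symbols, question-box posts, and the like are excluded.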

tweet.py


from janome.tokenizer import Tokenizer
t = Tokenizer()


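# Read the tweet bodies back in, using the same encoding as in the final read below.
with open(infile, 'r', encoding='utf-8') as f:
    data = f.readlines()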

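# Tokens containing lowercase ASCII letters are dropped.
p = re.compile('[a-z]+')
# Lines containing URL, mention, or hashtag symbols, the characters 質問
# (as in 質問箱, "question box"), or ● are skipped entirely.
p2 = re.compile('[:/.@#質問●]+')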

with open(outfile, 'w', encoding='utf-8') as f:
    for line in data:
        # Skip the whole line if it matches the line-level filter.
        if p2.search(line):
            continue
        # Write the surviving tokens separated by spaces.
        for token in t.tokenize(line):
            if not p.search(token.surface):
                f.write(token.surface + ' ')
        f.write('\n')
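
To see the two filters in isolation, here is a small sketch of the same idea that runs without janome. The function filter_line and the whitespace tokenizer are stand-ins invented for this example, and the line-level pattern is a simplified version that only covers the symbols.

```python
import re

LINE_SKIP = re.compile(r'[:/.@#●]')  # symbols typical of URLs, mentions, hashtags
TOKEN_SKIP = re.compile(r'[a-z]')     # lowercase ASCII letters

def filter_line(line, tokenize):
    """Return the kept tokens of one line, or None if the whole line is skipped."""
    if LINE_SKIP.search(line):
        return None
    return [tok for tok in tokenize(line) if not TOKEN_SKIP.search(tok)]

# Stand-in tokenizer for the demo: splits on whitespace instead of calling janome.
def whitespace_tokenize(line):
    return line.split()

kept = filter_line("今日 は python を 書く", whitespace_tokenize)
skipped = filter_line("見て https://example.com", whitespace_tokenize)
print(kept)
print(skipped)
```

To use it with janome, pass a function that returns token.surface for each token from t.tokenize(line) in place of whitespace_tokenize.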
        


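# Pad every tokenized line with two start markers and one end marker.
words = []
with open(outfile, 'r', encoding='utf-8') as f:
    for l in f:
        tokens = l.split()
        if tokens:  # ignore lines that contain no words
            words.append(['<BOP>', '<BOP>'] + tokens + ['<EOP>'])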


from nltk.lm import Vocabulary
from nltk.lm.models import MLE
from nltk.util import ngrams

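# Build the vocabulary from every token, including the <BOP>/<EOP> markers.
vocab = Vocabulary([item for sublist in words for item in sublist])

print('Vocabulary size: ' + str(len(vocab)))

# Train a maximum-likelihood trigram model on the padded sentences.
n = 3
text_trigrams = [ngrams(word, n) for word in words]
lm = MLE(order=n, vocabulary=vocab)
lm.fit(text_trigrams)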
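
For reference, a maximum-likelihood trigram model boils down to counting which word follows each pair of words and dividing by the total for that pair. The sketch below shows that idea with the standard library only. It is not how nltk is implemented internally, and the helper names and the toy sentences are made up for illustration.

```python
from collections import Counter, defaultdict

def pad(tokens):
    """Add two start markers and one end marker, as in the training data."""
    return ['<BOP>', '<BOP>'] + tokens + ['<EOP>']

def count_trigrams(sentences):
    """Map each two-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = pad(tokens)
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            counts[(w1, w2)][w3] += 1
    return counts

def mle_prob(counts, context, word):
    """Relative frequency of `word` after `context`; 0.0 for an unseen context."""
    total = sum(counts[context].values()) if context in counts else 0
    return counts[context][word] / total if total else 0.0

toy = [['今日', 'は', '晴れ'], ['今日', 'は', '雨'], ['今日', 'も', '雨']]
counts = count_trigrams(toy)
print(mle_prob(counts, ('<BOP>', '今日'), 'は'))
print(mle_prob(counts, ('今日', 'は'), '雨'))
```

In the toy data, 今日 at the start of a sentence is followed by は twice and も once, so the first probability is two thirds.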


Finally, generate random sentences.

tweet.py


for j in range(10):
    # Every sentence starts from the two start markers.
    context = ['<BOP>', '<BOP>']
    sentence = ''
    for i in range(100):
        # Sample the next word from those that followed the last two words of the context in training.
        w = lm.generate(text_seed=context)

        # Stop once the end marker is drawn.
        if w == '<EOP>' or w == '\n':
            break

        context.append(w)
        sentence += w

    print(sentence + '\n')
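
The loop above can also be sketched without nltk, which may help in seeing what happens at each step: look up the words that followed the last two words, draw one in proportion to its count, and stop at the end marker. This is only an illustration under those assumptions. The functions build_counts and generate and the toy sentences are invented for this example.

```python
import random
from collections import Counter, defaultdict

def build_counts(sentences):
    """Count the followers of every two-word context in the padded sentences."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ['<BOP>', '<BOP>'] + tokens + ['<EOP>']
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            counts[(w1, w2)][w3] += 1
    return counts

def generate(counts, rng, max_words=100):
    """Sample words until <EOP> is drawn or max_words is reached."""
    context = ('<BOP>', '<BOP>')
    out = []
    for _ in range(max_words):
        followers = counts.get(context)
        if not followers:
            break
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        w = rng.choices(candidates, weights=weights)[0]
        if w == '<EOP>':
            break
        out.append(w)
        context = (context[1], w)  # slide the two-word window forward
    return ''.join(out)

toy = [['今日', 'は', '晴れ'], ['今日', 'は', '雨'], ['今日', 'も', '雨']]
counts = build_counts(toy)
rng = random.Random(0)
samples = [generate(counts, rng) for _ in range(5)]
print(samples)
```

With such a tiny corpus the samples can only reproduce the training sentences. With a full tweet history the contexts overlap more, which is where the unexpected combinations come from.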

Ten sentences are output at random. To run this yourself, either combine the code above into a single file, or split it into cells in Jupyter Notebook and execute them. The latter is recommended, since training the model can take some time.

Result

[Screenshot of the output: 2019-12-13 19.00.15.png]

You should see ten sentences like the ones above. It is quite entertaining and you can rerun it endlessly, so please give it a try.
