[PYTHON] Tweet your own sentences using Markov chains

Motivation

I like Twitter, so I figured that if a program could generate my tweets automatically, it would keep tweeting for me even when I'm busy.

When I read "[Evangelion] Try to automatically generate Asuka-like lines with Deep Learning", it seemed that Markov chains produced sentences closer to the original than Deep Learning did, so I decided to build mine around that approach. ...or rather, what I ended up doing is pretty much the same thing...

Shuumai-kun (@shuumai) and the compressed newspaper bot (@asshuku) also appear to use Markov chains, so this approach should probably be fine.

Preprocessing

All of your past tweets can be downloaded from the official site as a zip file containing JSON and CSV files: see "Download all your tweet history".

This time, we will use the tweets.csv obtained from this archive. Some tweets include replies and URLs, so they need to be stripped out so that the bot doesn't send unintended replies or tweet dead URLs. Sending unintended replies would be amusing, but let's avoid it for now.

pandas is said to be fast at reading CSV files, so I use pandas' read_csv. Reply mentions and URL strings are matched roughly with regular expressions and removed.

preprocessing.py


import pandas as pd
import re

df = pd.read_csv('tweets.csv')

tweets = df['text']

# Rough patterns for reply mentions and URLs
replypattern = r'@[\w]+'
urlpattern = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+'

processedtweets = []

for tweet in tweets:
    # Strip mentions and URLs, then keep the tweet only if any text remains
    text = re.sub(replypattern, '', str(tweet))
    text = re.sub(urlpattern, '', text)
    if text.strip():
        processedtweets.append(text)

newDF = pd.DataFrame({'text': pd.Series(processedtweets)})

newDF.to_csv('processedtweets.csv')

I also wanted to exclude tweets posted from certain applications (such as shindan-maker-style services), so I separately dropped the rows whose df['source'] matched them.
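
A minimal sketch of that source-based filtering, done before the text processing above. The client name 'Shindan Maker' is just a placeholder; inspect the actual values in df['source'] first.

import pandas as pd

df = pd.read_csv('tweets.csv')

# Placeholder client names; check df['source'].unique() for the real ones
excluded_sources = ['Shindan Maker']

mask = df['source'].astype(str).str.contains('|'.join(excluded_sources))
df = df[~mask]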

Creating a Markov chain database

A Python package for generating sentences with Markov chains already existed, so I used it: https://github.com/o-tomox/TextGenerator

However, since that package is written for Python 2, I modified it for Python 3 where necessary.

The following code takes the preprocessed tweet CSV file and stores the word-triplet data in the database.

storeTweetstoDB.py


import sys

import pandas as pd
from tqdm import tqdm

from PrepareChain import *


def storeTweetstoDB():
    # Read the preprocessed tweet CSV, either from the command line or a prompt
    if len(sys.argv) > 1:
        df = pd.read_csv(sys.argv[1])
    else:
        csvfilepath = input('tweets.csv filepath : ')
        df = pd.read_csv(csvfilepath)

    tweets = df['text']

    print(len(tweets))

    # The first tweet is saved with the init flag set; the rest are appended
    chain = PrepareChain(tweets[0])
    triplet_freqs = chain.make_triplet_freqs()
    chain.save(triplet_freqs, True)

    for i in tqdm(tweets[1:]):
        chain = PrepareChain(i)
        triplet_freqs = chain.make_triplet_freqs()
        chain.save(triplet_freqs, False)


if __name__ == '__main__':
    storeTweetstoDB()
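
As a side note, the "triplets" here are just three consecutive tokens; the chain records how often each triplet occurs and later walks those counts to stitch new sentences together. A rough sketch of the idea, using naive whitespace tokenization instead of proper morphological analysis (e.g. with MeCab) and hypothetical function names:

import random
from collections import defaultdict

def make_triplet_freqs(sentence):
    # Count every window of three consecutive words.
    words = sentence.split()
    freqs = defaultdict(int)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        freqs[(w1, w2, w3)] += 1
    return freqs

def generate(freqs, first_two, length=20):
    # Walk the chain: given the last two words, pick one of the recorded third words.
    w1, w2 = first_two
    out = [w1, w2]
    for _ in range(length):
        candidates = [t[2] for t in freqs if t[:2] == (w1, w2)]
        if not candidates:
            break
        w3 = random.choice(candidates)
        out.append(w3)
        w1, w2 = w2, w3
    return ' '.join(out)

freqs = make_triplet_freqs('i like twitter and i like markov chains')
print(generate(freqs, ('i', 'like')))

The actual PrepareChain/GenerateText pair does essentially this, but with morphemes and a persistent database rather than an in-memory dict.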

Now you are ready to generate your own tweets automatically. Run the script with the path to the preprocessed CSV as its first argument (or enter the path at the prompt).

Generating tweets

Let's tweet using the completed database and the Twitter API.

markovbot.py


import json
from requests_oauthlib import OAuth1Session
from GenerateText import GenerateText

def markovbot():
    # Load the Twitter API credentials from keys.json
    with open('keys.json') as keysfile:
        keys = json.load(keysfile)
    oath = create_oath_session(keys)

    # Generator that produces one sentence from the Markov chain database
    generator = GenerateText(1)

    tweetmarkovstring(oath, generator)

def create_oath_session(oath_key_dict):
    oath = OAuth1Session(
        oath_key_dict['consumer_key'],
        oath_key_dict['consumer_secret'],
        oath_key_dict['access_token'],
        oath_key_dict['access_token_secret']
    )
    return oath

def tweetmarkovstring(oath, generator):
    url = 'https://api.twitter.com/1.1/statuses/update.json'
    markovstring = generator.generate()
    # Append a marker so the generated tweet is clearly labeled as fake
    params = {'status': markovstring + '[Is a fake]'}
    req = oath.post(url, params)

    if req.status_code == 200:
        print('tweet succeeded!')
    else:
        print('tweet failed')


if __name__ == '__main__':
    markovbot()
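
The script assumes a keys.json file next to it containing the four Twitter API credentials that create_oath_session reads; something along these lines (the values are placeholders):

keys.json

{
    "consumer_key": "xxxxxxxxxx",
    "consumer_secret": "xxxxxxxxxx",
    "access_token": "xxxxxxxxxx",
    "access_token_secret": "xxxxxxxxxx"
}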

This time, tweets were automatically generated based on my own account (@hitsumabushi845). Below are the results.

(Screenshot of the generated tweets, 2017-08-21)

The results read like surprisingly plausible sentences... or rather, they were plausible enough to be risky without some kind of marker, so the program above appends "[Is a fake]" to the end of each tweet.

After that, if you run it regularly on a server or with cron, it will keep tweeting for you indefinitely.
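
For example, a crontab entry along these lines would post once an hour (the path is a placeholder; adjust it to wherever the script and database actually live):

0 * * * * cd /path/to/markovbot && python markovbot.py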

References

- [Evangelion] Try to automatically generate Asuka-like lines with Deep Learning
- Markov chain from morphological analysis, let's make a Twitter bot base with Python
- Automatically create sentences using Markov chains in Python 3, MeCab
