Perform a Twitter search from Python and try to generate sentences with Markov chains.

Fetch tweets matching any keyword via the Twitter API, save the text to a file, run it through MeCab for morphological analysis, and generate sentences with Markov chains.

This time I capped the output at 140 characters so it can be posted as a tweet as is; you can make it longer if you like. The accuracy is low. It made me appreciate how good the "Compressed Newspaper" bot is.

markov.py



#!/usr/bin/env python
# -*- coding: utf-8 -*-
from requests_oauthlib import OAuth1Session
import json
import sys
import MeCab
import random


while True:
	search_words = raw_input(u"Keyword?: ")
	
	C_KEY = "******************************"
	C_SECRET = "******************************"
	A_KEY = "******************************"
	A_SECRET = "******************************"


	def Limit_Status():
		url = "https://api.twitter.com/1.1/application/rate_limit_status.json"
		params = {}
		tw = OAuth1Session(C_KEY,C_SECRET,A_KEY,A_SECRET)
		req = tw.get(url, params = params)
		if req.status_code == 200:
			limit = req.headers["x-rate-limit-remaining"]
			print ("API remain: " + limit)
	
	def Search_words():
		url = "https://api.twitter.com/1.1/search/tweets.json?"
		params = {
				"q": unicode(search_words, "utf-8"),
				"lang": "ja",
				"result_type": "recent",
				"count": "100"
				}
		tw = OAuth1Session(C_KEY,C_SECRET,A_KEY,A_SECRET)
		req = tw.get(url, params = params)
		tweets = json.loads(req.text)
		f = open("test.txt", "a")
		for tweet in tweets["statuses"]:
			text = tweet["text"].encode("utf-8")
			# Crude cleanup: cut the text off at the first URL, mention, or RT marker
			text = text.split("http", 1)[0]
			text = text.split("@")[0]
			text = text.split("RT")[0]
			f.write(text)
		f.flush()
		f.close()

		
	def Mecab_file():	
		f = open("test.txt","rb")
		data = f.read()
		f.close()

		mt = MeCab.Tagger("-Owakati")
		# parse() returns one space-separated string; split it into tokens
		wordlist = mt.parse(data).rstrip(" \n").split(" ")
 
		markov = {}
		w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = ""
		for word in wordlist:
			if w1 and w2 and w3 and w4 and w5 and w6 and w7 and w8:
				if (w1,w2,w3,w4,w5,w6,w7,w8) not in markov:
					markov[(w1,w2,w3,w4,w5,w6,w7,w8)] = []
				markov[(w1,w2,w3,w4,w5,w6,w7,w8)].append(word)
			w1,w2,w3,w4,w5,w6,w7,w8 = w2,w3,w4,w5,w6,w7,w8,word
		count = 0
		sentence = ""
		w1,w2,w3,w4,w5,w6,w7,w8 = random.choice(markov.keys())
    
		while count < 140:
			key = (w1,w2,w3,w4,w5,w6,w7,w8)
			if key in markov:
				tmp = random.choice(markov[key])
				sentence += tmp
				w1,w2,w3,w4,w5,w6,w7,w8 = w2,w3,w4,w5,w6,w7,w8,tmp
				count += 1
			else:
				break  # no continuation recorded for this state
		if " " in sentence:
			sentence = sentence.split(" ", 1)[0]
		print sentence
	
	if search_words:
		Search_words()
		Mecab_file()
		Limit_Status()
	else:
		break

I ran it with an 8-word chain. It turned out that the output isn't interesting unless you cut the chain down to about 4 words.
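The effect of chain length is easier to experiment with if the state size is a parameter. A minimal sketch (the helper names are mine, not from the script above) that builds an order-n model from a token list:

```python
import random

def build_markov(tokens, n=4):
    # Map each n-token state to the list of tokens that followed it in the corpus
    model = {}
    for i in range(len(tokens) - n):
        state = tuple(tokens[i:i + n])
        model.setdefault(state, []).append(tokens[i + n])
    return model

def generate(model, max_words=30):
    # Start from a random state and walk the chain until a dead end
    state = random.choice(list(model.keys()))
    out = list(state)
    for _ in range(max_words):
        followers = model.get(state)
        if not followers:
            break  # this state never continues in the corpus
        nxt = random.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return "".join(out)
```

With a small corpus, a large n mostly reproduces the source tweets verbatim; a smaller n gives more novel but less grammatical output.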

Originally I wanted to strip all the unnecessary data out of the JSON, but my knowledge at the moment is limited. For the time being, when "http" appears in the tweet text, I cut it off with split.
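A regular expression could remove URLs, mentions, and RT markers wherever they occur, instead of cutting the string at the first match. A sketch, with a pattern that is my own guess at what counts as noise:

```python
import re

# Hypothetical cleanup pattern: URLs, @mentions, and standalone "RT" markers
NOISE = re.compile(r"https?://\S+|@\w+|\bRT\b")

def clean_tweet(text):
    return NOISE.sub("", text).strip()
```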

As usual, if test.txt does not exist in the same directory, it is created. Since the file is opened in append mode, if it already exists the new tweets are added to it.

The while loop breaks when you run it without entering a search word. It might be good to store the results for different search words separately.
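One way to keep the corpora separate is to derive the filename from the keyword (a sketch; the fixed test.txt above would be replaced by this hypothetical per-keyword name):

```python
def corpus_path(keyword):
    # Hypothetical per-keyword corpus file, e.g. "corpus_cat.txt";
    # non-alphanumeric characters are dropped to keep the filename safe
    safe = "".join(c for c in keyword if c.isalnum())
    return "corpus_%s.txt" % safe
```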

I revised it: unneeded characters are now removed with regular expressions, and the sentence ending is chosen at random from words like "desu" and "masu" so the end of the sentence doesn't come out strange.

I felt that this one was more practical.

	def Mecab_file():
		# Requires "import re" at the top of the script
		f = open("tweet.txt", "rb")
		data = f.read()
		f.close()

		mt = MeCab.Tagger("-Owakati")
		# parse() returns one space-separated string; split it into tokens
		wordlist = mt.parse(data).rstrip(" \n").split(" ")

		markov = {}
		w = ""
		for x in wordlist:
			if w:
				markov.setdefault(w, []).append(x)
			w = x

		choice_words = wordlist[0]
		sentence = ""
		count = 0

		while count < 90:
			sentence += choice_words
			if choice_words not in markov:
				break  # no continuation recorded for this word
			choice_words = random.choice(markov[choice_words])
			count += 1

		sentence = sentence.split(" ", 1)[0]
		# Strip leftover ASCII symbols
		p = re.compile("[!-/:-@[-`{-~]")
		sus = p.sub("", sentence)

		# Pick the sentence ending at random so it does not cut off strangely
		random_words_list = [u"。", u"です。", u"ます。"]
		last_word = random.choice(random_words_list)

		print re.sub(re.compile("[!-~]"), "", sus), last_word
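For reference, the character class "[!-/:-@[-`{-~]" in the snippet above spans the four ASCII punctuation ranges, so it matches every printable ASCII symbol that is not a letter or digit, while "[!-~]" matches all printable ASCII. A standalone check:

```python
import re

# The same punctuation-stripping pattern used above
p = re.compile("[!-/:-@[-`{-~]")
print(p.sub("", "abc!?123#"))  # letters and digits survive, symbols are removed
```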
