Have you ever wanted to boost a Twitter trend?
If so, let's build a Python bot that generates sentences from tweets under a specific hashtag and tweets them automatically.
This time I will leave out the code that talks to Twitter (fetching tweets and posting the generated sentences). I have also updated the bot about three times since then; the very first version is simple, so that is the one I will explain.
First, we need to extract the nouns from Japanese text. For this we will use a morphological analysis library called janome.
janome is a tool that breaks Japanese text down into tokens and tags each one with its part of speech.
from janome.tokenizer import Tokenizer

tokenizer = Tokenizer()
# "Today's guest was Mr. Wada." (janome expects Japanese input)
sentence = '本日のゲストは和田さんでした。'
for token in tokenizer.tokenize(sentence):
    print(token)

# 本日    名詞,副詞可能,*,*,*,*,本日,ホンジツ,ホンジツ
# の      助詞,連体化,*,*,*,*,の,ノ,ノ
# ゲスト  名詞,一般,*,*,*,*,ゲスト,ゲスト,ゲスト
# は      助詞,係助詞,*,*,*,*,は,ハ,ワ
# 和田    名詞,固有名詞,人名,姓,*,*,和田,ワダ,ワダ
# さん    名詞,接尾,人名,*,*,*,さん,サン,サン
# でし    助動詞,*,*,*,特殊・デス,連用形,です,デシ,デシ
# た      助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
# 。      記号,句点,*,*,*,*,。,。,。
As shown above, it splits a Japanese sentence into tokens and labels each one with its part of speech (名詞 means noun).
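To make the columns of that printed output easier to read, here is a minimal sketch that splits one such line into named fields. It assumes the line has the full nine comma-separated features after a tab, and `ParsedToken` and `parse_token_line` are hypothetical names used only for illustration. In real code you would normally read the token's attributes (`surface`, `part_of_speech`, `base_form`, `reading`) rather than parse text.

```python
from typing import NamedTuple


class ParsedToken(NamedTuple):
    """Hypothetical container for one printed token line."""
    surface: str
    part_of_speech: str  # four comma-separated levels, e.g. "名詞,固有名詞,人名,姓"
    infl_type: str
    infl_form: str
    base_form: str
    reading: str
    phonetic: str


def parse_token_line(line: str) -> ParsedToken:
    # The surface form is separated from the features by a tab.
    surface, features = line.split("\t", 1)
    fields = features.split(",")
    # Fields 0-3 are the part-of-speech levels; 4-8 are inflection type,
    # inflection form, base form, reading, and pronunciation.
    return ParsedToken(surface, ",".join(fields[:4]), *fields[4:9])


token = parse_token_line("和田\t名詞,固有名詞,人名,姓,*,*,和田,ワダ,ワダ")
print(token.surface, token.part_of_speech.split(",")[0], token.reading)
```

The first part-of-speech level is the one the bot cares about: a token counts as a noun when it is 名詞.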
You can tokenize a single tweet this way, but nouns taken from just one tweet tend to be noisy.
Therefore, the bot makes a sentence only when the same noun appears in multiple tweets.
The idea looks like this.
First, here is a class that extracts the duplicated nouns.
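from typing import List

from janome.tokenizer import Tokenizer


class DuplicateChecker:
    def __init__(self, tokenier: Tokenizer):
        self.twitt_nouns = []  # nouns collected from every tweet seen so far
        self.tokenier = tokenier

    def extract_duplications(self) -> List[str]:
        # Nouns that were collected more than once
        return [x for x in set(self.twitt_nouns) if self.twitt_nouns.count(x) > 1]

    def input_twitt(self, twitt: str):
        tokens = self.tokenier.tokenize(twitt)
        nouns = []
        buffer = None  # accumulates a run of consecutive noun tokens
        for token in tokens:
            # janome reports parts of speech in Japanese; "名詞" means noun
            if token.part_of_speech.startswith("名詞"):
                if buffer is None:
                    buffer = ""
                buffer += token.surface
            else:
                if buffer is not None:
                    nouns.append(buffer)
                    buffer = None
        # Flush a noun run that ends the tweet
        if buffer is not None:
            nouns.append(buffer)
        self.twitt_nouns.extend(nouns)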
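The buffer logic in `input_twitt` joins consecutive noun tokens into one compound noun. The same idea can be sketched without the tokenizer, using `itertools.groupby` on hand-written `(surface, part_of_speech)` pairs. The function name `merge_noun_runs` and the sample pairs are assumptions made for illustration, not part of the bot.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def merge_noun_runs(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Join each run of consecutive noun tokens into one string."""
    merged = []
    # groupby yields one group per run of tokens sharing the same key,
    # here "is this token a noun?".
    for is_noun, run in groupby(tagged, key=lambda pair: pair[1].startswith("名詞")):
        if is_noun:
            merged.append("".join(surface for surface, _ in run))
    return merged


# Hand-written pairs standing in for tokenizer output.
tagged = [
    ("和田", "名詞,固有名詞,人名,姓"),
    ("さん", "名詞,接尾,人名,*"),
    ("の", "助詞,連体化,*,*"),
    ("ゲスト", "名詞,一般,*,*"),
]
print(merge_noun_runs(tagged))  # ['和田さん', 'ゲスト']
```

Because every run is emitted when its group ends, a noun run at the very end of the input is kept as well.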
Checking for duplicates with it looks like this:
tokenier = Tokenizer()
duplicateChecker = DuplicateChecker(tokenier)
# Japanese inputs back-translated from the English rendering, since janome tokenizes Japanese
duplicateChecker.input_twitt("縞模様の可能性を感じた")  # "I felt the possibility of the striped pattern"
duplicateChecker.input_twitt('可能性しか感じない')  # "I only feel the possibility"
duplicateChecker.input_twitt('可能性を感じる')  # "Feel the possibility"
nouns = duplicateChecker.extract_duplications()
nouns  # ["可能性"] ("possibility")
(In the production version, duplicates were also checked per user.)
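Note that `extract_duplications` counts every occurrence, so a noun repeated inside a single tweet also counts as a duplicate. If the goal is strictly "appears in more than one tweet", one possible variation is to count each noun at most once per tweet. The sketch below assumes the nouns have already been extracted per tweet; `nouns_in_multiple_tweets` and `min_tweets` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List


def nouns_in_multiple_tweets(nouns_per_tweet: Iterable[List[str]],
                             min_tweets: int = 2) -> List[str]:
    counts = Counter()
    for nouns in nouns_per_tweet:
        # set() makes each tweet contribute a given noun at most once
        counts.update(set(nouns))
    # Sorted so the result order is stable between runs
    return sorted(noun for noun, n in counts.items() if n >= min_tweets)


nouns_per_tweet = [
    ["縞模様", "可能性"],
    ["可能性"],
    ["草", "草"],  # repeated, but only inside one tweet
]
print(nouns_in_multiple_tweets(nouns_per_tweet))  # ['可能性']
```

Sorting the result also avoids the arbitrary ordering that comes from iterating over a `set`.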
Finally, make a sentence from the extracted noun. (This time it is done quite roughly.)
import random
from typing import List


class SentenceGenerator:
    def __init__(self, nouns: List[str]):
        self.nouns = nouns
        # Back-translated templates: "{} is lol", "{} matters", "{}", "{}, huh", "it's {}, huh", "{}!"
        self.senence_base = ["{}は草", "{}大事", "{}", "{}ね", "{}だね", "{}!"]

    def generate(self) -> str:
        # Pick one template at random and fill in the first noun
        template = random.choice(self.senence_base)
        return template.format(self.nouns[0])
When executed, it looks like this.
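Because the template is picked at random, the output changes from run to run, and the generator fails if no duplicated noun was found. A minimal sketch of one way to handle both, assuming a hypothetical standalone function `generate_sentence` with placeholder templates and an optional seeded `random.Random`:

```python
import random
from typing import List, Optional

# Placeholder templates for illustration only
TEMPLATES = ["{}は草", "{}ね", "{}!"]


def generate_sentence(nouns: List[str],
                      rng: Optional[random.Random] = None) -> Optional[str]:
    if not nouns:
        return None  # no duplicated noun, so there is nothing to tweet
    if rng is None:
        rng = random.Random()
    # Pick a template at random and fill in the first noun
    return rng.choice(TEMPLATES).format(nouns[0])


print(generate_sentence([]))                             # None
print(generate_sentence(["可能性"], random.Random(42)))  # same sentence every run for a fixed seed
```

Passing a seeded generator makes the choice reproducible, which is handy when testing the bot without posting anything.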
This time I deliberately built something very simple. If there is interest, I will write about how I updated it.