[PYTHON] Create a bot that boosts Twitter trends

Have you ever wanted to push something up the Twitter trends?

If so, let's use Python to build a bot that generates sentences from the tweets under a specific tag and tweets them automatically.

This time I will skip the parts that fetch text from Twitter and post the generated sentences. I have also updated the bot about three times since; the very first version is simple, so that is the one I will explain.

Parse

First, we need to extract the nouns from Japanese text. For this we will use a morphological analysis library called janome.

janome is a tool that breaks Japanese text down into tokens and labels each one with its part of speech.

from janome.tokenizer import Tokenizer

tokenizer = Tokenizer()

# Japanese for "Today's guest was Mr. Wada."
sentence = '本日のゲストは和田さんでした。'

for token in tokenizer.tokenize(sentence):
    print(token)

# 本日    名詞,副詞可能,*,*,*,*,本日,ホンジツ,ホンジツ
# の      助詞,連体化,*,*,*,*,の,ノ,ノ
# ゲスト  名詞,一般,*,*,*,*,ゲスト,ゲスト,ゲスト
# は      助詞,係助詞,*,*,*,*,は,ハ,ワ
# 和田    名詞,固有名詞,人名,姓,*,*,和田,ワダ,ワダ
# さん    名詞,接尾,人名,*,*,*,さん,サン,サン
# でし    助動詞,*,*,*,特殊・デス,連用形,です,デシ,デシ
# た      助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
# 。      記号,句点,*,*,*,*,。,。,。

As shown above, it breaks a Japanese sentence down token by token.
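janome exposes each token's part of speech as a comma-separated Japanese string in `token.part_of_speech` (for example `名詞,一般,*,*`, where 名詞 means noun). As a minimal sketch of picking out nouns from that string, the helper below checks only the first field. The name `is_noun` and the hand-copied sample pairs are illustrative assumptions, not part of janome or of the bot.

```python
def is_noun(part_of_speech: str) -> bool:
    """Return True when the top-level part of speech is 名詞 (noun).

    `part_of_speech` is assumed to look like janome's
    token.part_of_speech, e.g. "名詞,固有名詞,人名,姓".
    """
    return part_of_speech.split(",")[0] == "名詞"


# (surface, part_of_speech) pairs written out by hand for illustration;
# in the bot these would come from tokenizer.tokenize(...)
samples = [
    ("本日", "名詞,副詞可能,*,*"),
    ("の", "助詞,連体化,*,*"),
    ("ゲスト", "名詞,一般,*,*"),
    ("は", "助詞,係助詞,*,*"),
    ("和田", "名詞,固有名詞,人名,姓"),
]

nouns = [surface for surface, pos in samples if is_noun(pos)]
print(nouns)
```

Checking only the first field avoids matching entries such as `接頭詞,名詞接続` (a prefix that attaches to nouns), which a plain substring test would also catch.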

Check for duplicates

This is enough to break down a single tweet, but nouns taken from just one tweet may be noise.

So the bot makes a sentence only when the same noun appears in multiple tweets.

Some people like this.

First, here is a class that extracts the duplicates.

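from janome.tokenizer import Tokenizer


class DuplicateChecker:

    def __init__(self, tokenier: Tokenizer):
        self.twitt_nouns = []  # nouns collected from every tweet so far
        self.tokenier = tokenier

    def extract_duplications(self) -> list[str]:
        # Nouns that were collected more than once
        return [x for x in set(self.twitt_nouns) if self.twitt_nouns.count(x) > 1]

    def input_twitt(self, twitt: str):
        tokens = self.tokenier.tokenize(twitt)

        nouns = []
        buffer = None  # consecutive nouns are joined into one compound noun
        for token in tokens:
            # janome reports the part of speech in Japanese; 名詞 means "noun"
            if "名詞" in token.part_of_speech:
                if buffer is None:
                    buffer = ""
                buffer += token.surface
            else:
                if buffer is not None:
                    nouns.append(buffer)
                buffer = None

        # Flush a noun that ends the tweet
        if buffer is not None:
            nouns.append(buffer)

        self.twitt_nouns.extend(nouns)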

Now, when I check for duplicates, it looks like this:

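tokenier = Tokenizer()
duplicateChecker = DuplicateChecker(tokenier)

# The tweets must be Japanese for janome to find the nouns. These strings are
# back-translations, so the original wording may have differed.
duplicateChecker.input_twitt('縞模様の可能性を感じた')  # "I felt the possibility of the striped pattern"
duplicateChecker.input_twitt('可能性しか感じない')  # "I only feel the possibility"
duplicateChecker.input_twitt('可能性を感じる')  # "Feel the possibility"

nouns = duplicateChecker.extract_duplications()
nouns # ["可能性"] ("possibility")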

(The production version also checked which user posted each tweet.)
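One way to keep a single account from faking a trend is to count each noun at most once per user. The following is only a sketch under that assumption: the function name `duplicated_nouns`, the `(user, nouns)` input shape, and the `min_users` threshold are hypothetical and are not taken from the production bot.

```python
from collections import Counter


def duplicated_nouns(tweets, min_users=2):
    """Return nouns used by at least `min_users` distinct users.

    `tweets` is an iterable of (user, nouns) pairs, where `nouns` is the
    list of nouns already extracted from one tweet.
    """
    seen = set()        # (user, noun) pairs that were already counted
    counts = Counter()  # noun -> number of distinct users
    for user, nouns in tweets:
        for noun in nouns:
            if (user, noun) not in seen:
                seen.add((user, noun))
                counts[noun] += 1
    # sorted() makes the order of the result deterministic
    return sorted(noun for noun, n in counts.items() if n >= min_users)


tweets = [
    ("alice", ["可能性", "縞模様"]),
    ("alice", ["可能性"]),  # same user again: not counted twice
    ("bob", ["可能性"]),
]
print(duplicated_nouns(tweets))
```

Using a `Counter` also avoids calling `list.count` once per distinct noun, which matters as the number of collected tweets grows.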

Sentence generation

Finally, make a sentence from the extracted nouns. (This part is fairly arbitrary.)

import random


class SentenceGenerator:

    def __init__(self, nouns: list[str]):
        self.nouns = nouns
        # Roughly: "{} lol", "{} is important", "{}", "{}, huh", "{}, isn't it", "{}!"
        self.senence_base = ["{}は草", "{}大事", "{}", "{}ね", "{}だね", "{}!"]

    def generate(self) -> str:
        # Pick one template at random and fill in the first noun
        template = random.choice(self.senence_base)
        return template.format(self.nouns[0])

When executed, it looks like this. (Screenshot: スクリーンショット 2.png)
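Two practical gaps remain in the generator: it fails when no duplicated noun was found, and its random output is hard to check. A minimal sketch of one way to handle both is shown below. The function name `generate_sentence`, the optional `rng` parameter, and the two English templates are assumptions made for illustration.

```python
import random


def generate_sentence(nouns, templates, rng=None):
    """Fill a randomly chosen template with the first noun.

    Returns None when there is no noun to write about.
    """
    if not nouns:
        return None
    if rng is None:
        rng = random  # fall back to the module-level generator
    return rng.choice(templates).format(nouns[0])


templates = ["{}!", "{}?"]
# Passing a seeded random.Random makes the choice reproducible
print(generate_sentence(["trend"], templates, rng=random.Random(0)))
print(generate_sentence([], templates))
```

Returning `None` lets the caller simply skip tweeting when nothing is trending.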

Summary

This time I made something almost absurdly simple. If there is interest, I will write about how I updated it.