Introduction

I wanted to delete my past Peing answer posts on Twitter, but there were nearly 1,000 of them, so I gave up on deleting them by hand and instead wrote a script that deletes the target tweets automatically.

About Tweepy

Tweepy is a Python library for working with the Twitter API. You can use it to build Twitter bots or to like and follow automatically. In this article, I introduce a script that automatically deletes specific tweets.

Preparation

・ Register for the Twitter API
・ Download your Twitter archive data ([please refer to here](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive))
・ Install Tweepy and pandas

Approach

When you download your Twitter archive data, it includes a file called tweet.js. This file contains a large amount of past tweet data in the form shown below. Everything from **"tweet": {** onward is the data for a single tweet, and the file is made up of many records like this one. The approach is to selectively delete the tweets whose **"source":** line in the tweet.js file below contains "https://peing.net". Deleting a tweet also requires the ID assigned to it, so the numeric value on the **"id_str":** line is extracted as well.

 {
  "tweet" : {
    "retweeted" : false,
    "source" : "<a href=\"https://peing.net\" rel=\"nofollow\">Peing</a>",
    "entities" : {
      "hashtags" : [ {
        "text" : "Peing",
        "indices" : [ "18", "24" ]
      }, {
        "text" : "Question box",
        "indices" : [ "25", "29" ]
      } ],
      "symbols" : [ ],
      "user_mentions" : [ ],
      "urls" : [ {
        "url" : "https://t.co/snIXxSjooH",
        "expanded_url" : "https://peing.net/ja/qs/636766292",
        "display_url" : "peing.net/ja/qs/636766292",
        "indices" : [ "30", "53" ]
      } ]
    },
    "display_text_range" : [ "0", "53" ],
    "favorite_count" : "0",
    "id_str" : "1203602228591788032",
    "truncated" : false,
    "retweet_count" : "0",
    "id" : "1203602228591788032",
    "possibly_sensitive" : false,
    "created_at" : "Sun Dec 08 09:08:27 +0000 2019",
    "favorited" : false,
    "full_text" : "It is a plump and glossy rice.\n\n#Peing #Question box https://t.co/snIXxSjooH",
    "lang" : "ja"
  }
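
The selection rule described above can also be sketched by parsing the file as JSON instead of matching lines. This is an alternative to the regular-expression approach this article uses, not the method the script follows. It assumes tweet.js is a JavaScript assignment followed by a JSON array of records shaped like the sample; the helper names `load_tweets` and `peing_ids` and the shortened sample data are made up for illustration.

```python
import json


def load_tweets(text):
    """Parse the contents of tweet.js into a list of records.

    Assumption: anything before the first '[' is a JavaScript
    assignment prefix and can be skipped.
    """
    return json.loads(text[text.index('['):])


def peing_ids(records, marker='https://peing.net'):
    """Return the id_str of every tweet whose source mentions marker."""
    return [
        record['tweet']['id_str']
        for record in records
        if marker in record['tweet']['source']
    ]


# A shortened stand-in for tweet.js: one Peing tweet and one other tweet.
sample = r'''window.YTD.tweet.part0 = [ {
  "tweet" : {
    "source" : "<a href=\"https://peing.net\" rel=\"nofollow\">Peing</a>",
    "id_str" : "1203602228591788032"
  }
}, {
  "tweet" : {
    "source" : "<a href=\"https://example.com\" rel=\"nofollow\">Other</a>",
    "id_str" : "1"
  }
} ]'''

ids = peing_ids(load_tweets(sample))
print(ids)
```

Only the first record is selected, because only its source field mentions Peing.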

Code

Imports

The script uses regular expressions to pull the relevant strings out of the tweet.js file, so it imports the **re** module. It also imports **pandas** to build a data frame from the extracted data. **datetime** is optional; I import it only to measure how long the script takes to run. **tweepy** is, of course, required.


import re
import pandas as pd
from datetime import datetime
import tweepy

Extraction of tweet data

Define a function that extracts the required lines (**"source":** and **"id_str":**) from tweet.js and returns them as a data frame.

def read_tweet_file(file):
    """
    Read the "source" and "id_str" lines of a tweet.js file into a pd.DataFrame.
    """
    # Read the tweet.js file (assumed here to be UTF-8 encoded)
    with open(file, encoding='utf-8') as data_file:
        datalines = data_file.readlines()

    # Create an empty data frame to store the extracted data
    colname = ['source', 'id']
    df = pd.DataFrame([], columns=colname)
    # Patterns for the lines to extract, one per column
    regexes = [r'    "source".*', r'    "id_str".*']
    for name, regex in zip(colname, regexes):
        matches = []
        for line in datalines:
            # re.match anchors at the start of the line, so only keys
            # indented by exactly four spaces are picked up
            match_obj = re.match(regex, line)
            if match_obj:
                matches.append(match_obj.group())
        # Store the matched lines as one column of the data frame
        df[name] = pd.Series(matches)

    return df
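
The function above pairs the two columns purely by position, so it relies on every tweet contributing exactly one matching "source" line and one matching "id_str" line. As a rough sketch of that matching step (using only `re`, with a few hand-written sample lines and a hypothetical helper name `matching_lines`), the following shows which lines the two patterns pick up and adds a simple length check:

```python
import re

SOURCE_RE = r'    "source".*'
ID_RE = r'    "id_str".*'


def matching_lines(lines, regex):
    """Return the lines that match regex starting at their first character."""
    return [m.group() for m in (re.match(regex, line) for line in lines) if m]


# Hand-written sample lines modelled on the tweet.js excerpt.
lines = [
    '  "tweet" : {\n',
    '    "source" : "<a href=\\"https://peing.net\\" rel=\\"nofollow\\">Peing</a>",\n',
    '        "text" : "Peing",\n',
    '    "id_str" : "1203602228591788032",\n',
    '    "id" : "1203602228591788032",\n',
]

sources = matching_lines(lines, SOURCE_RE)
ids = matching_lines(lines, ID_RE)

# The two lists are paired by position, so they must have the same length.
if len(sources) != len(ids):
    raise ValueError('source and id_str counts differ; rows would misalign')
print(len(sources), len(ids))
```

The `"id"` line and the more deeply indented `"text"` line are skipped, and the trailing newline is not part of the match because `.` does not match a newline by default.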

Extraction of tweets to be deleted

Define a function that returns the IDs of the tweets to delete from the data frame.

def extract_id(df):
    target_id = []
    for i in range(len(df)):
        # Keep only the rows whose source line points to Peing
        # (the dot is escaped so that it matches a literal ".")
        match_obj = re.search(r'https://peing\.net', df['source'][i])
        if match_obj:
            # Collect the numeric tweet ID of each tweet to delete
            target_id.append(int(re.search(r'[0-9]+', df['id'][i]).group()))

    return target_id
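
To make the two regular-expression calls concrete, here is a small worked example on a single pair of lines taken from the sample record. The variable names are illustrative only. It also shows why escaping the dot in the URL pattern is slightly stricter than leaving it bare.

```python
import re

source_line = '    "source" : "<a href=\\"https://peing.net\\" rel=\\"nofollow\\">Peing</a>",'
id_line = '    "id_str" : "1203602228591788032",'

# The source line is a Peing tweet if the URL appears anywhere in it.
is_peing = re.search(r'https://peing\.net', source_line) is not None

# The first run of digits on the id_str line is the tweet ID.
tweet_id = int(re.search(r'[0-9]+', id_line).group())

# An unescaped "." matches any character, so it also accepts this string.
loose_match = re.search(r'https://peing.net', 'https://peingxnet') is not None
strict_match = re.search(r'https://peing\.net', 'https://peingxnet') is not None

print(is_peing, tweet_id, loose_match, strict_match)
```

Python integers have arbitrary precision, so converting the 19-digit ID with `int` does not lose any digits.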

Delete tweets

Define a function that takes the list of tweet IDs, deletes each tweet, and prints the outcome.

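def delete_tweets(target_id):
    delete_count = 0
    for status_id in target_id:
        try:
            # Delete the tweet (uses the global `api` object created in the Run section)
            api.destroy_status(status_id)
            print(status_id, 'deleted!')
            delete_count += 1
        except Exception as e:
            # Report the reason and continue with the next tweet
            print(status_id, 'deletion failed:', e)
    print(delete_count, 'tweets deleted.')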
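
Since deletion cannot be undone, it may help to try the loop without touching the API first. The sketch below is one possible variation, not the function used in this article: it takes the delete call as a parameter, so a stand-in function can be passed for a dry run. The names `delete_all`, `fake_delete`, and `pause` are hypothetical.

```python
import time


def delete_all(ids, delete_fn, pause=0.0):
    """Call delete_fn on each ID and return (deleted, failed) lists."""
    deleted, failed = [], []
    for status_id in ids:
        try:
            delete_fn(status_id)
            deleted.append(status_id)
        except Exception as error:  # keep going, but remember the failure
            failed.append((status_id, str(error)))
        if pause:
            # Optional delay between calls, in seconds
            time.sleep(pause)
    return deleted, failed


def fake_delete(status_id):
    """Stand-in for the real API call; fails for one made-up ID."""
    if status_id == 2:
        raise RuntimeError('not found')


deleted, failed = delete_all([1, 2, 3], fake_delete)
print(deleted, failed)
```

For a real run, `api.destroy_status` could be passed as `delete_fn`, and the returned `failed` list can be inspected or retried afterwards.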

Run

Run the functions defined above.

# Authenticate to access the Twitter API
auth = tweepy.OAuthHandler('*API key*', '*API secret key*')
auth.set_access_token('*Access token*', '*Access token secret*')

api = tweepy.API(auth)
# Confirm that the credentials work (api.me() was removed in Tweepy 4,
# while verify_credentials() is available in both 3.x and 4.x)
user = api.verify_credentials()

# Run
print(datetime.now())
df = read_tweet_file('tweet.js')
target_id = extract_id(df)
delete_tweets(target_id)
print(datetime.now())
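
If you plan to share or commit the script, one option is to read the four credentials from environment variables rather than writing them into the file. This is a minimal sketch under that assumption; the variable names and the `load_credentials` helper are hypothetical and not part of Tweepy.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
REQUIRED = (
    'TWITTER_API_KEY',
    'TWITTER_API_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_TOKEN_SECRET',
)


def load_credentials(env=os.environ):
    """Return the four credentials in order, or raise if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in REQUIRED)


# Demonstration with a plain dict standing in for os.environ.
demo_env = {name: 'dummy-' + str(i) for i, name in enumerate(REQUIRED)}
api_key, api_secret, access_token, access_token_secret = load_credentials(demo_env)
print(api_key)
```

The first two values would then go to `tweepy.OAuthHandler` and the last two to `auth.set_access_token`, in place of the placeholder strings.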

Result

I was able to automatically delete all 976 target tweets. (Execution time was about 10 minutes.)

2020-02-07 17:24:57.816773
1204021701639426048 deleted!
1204020924015472640 deleted!
1204020044683833344 deleted!
1203904952684302337 deleted!

... (Omitted) ...

1204025368052523014 deleted!
1204023316488560640 deleted!
1204023315221733376 deleted!
1204022282311499776 deleted!
976 tweets deleted.
2020-02-07 17:35:16.302221
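
As a quick check on the timing, the two timestamps printed in the log can be subtracted to get the elapsed time and a rough per-tweet figure. The snippet below only redoes that arithmetic from the log values; it is not part of the script.

```python
from datetime import datetime

FORMAT = '%Y-%m-%d %H:%M:%S.%f'
start = datetime.strptime('2020-02-07 17:24:57.816773', FORMAT)
end = datetime.strptime('2020-02-07 17:35:16.302221', FORMAT)

# Total run time in seconds, then the average time per deleted tweet
elapsed = (end - start).total_seconds()
per_tweet = elapsed / 976
print(f'{elapsed:.1f} s total, {per_tweet:.2f} s per tweet')
```

That works out to a little over 10 minutes in total, or roughly 0.6 seconds per deletion.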

Conclusion

Feel free to play with the code introduced here, and enjoy a fulfilling coding life. Thank you for reading. See you next time!

[PYTHON] How to selectively delete past tweets with Tweepy