[PYTHON] How to selectively delete past tweets with Tweepy

Introduction

I wanted to delete my past Peing answer posts on Twitter, but with nearly 1,000 tweets to remove, I gave up on deleting them by hand and instead wrote a script that deletes the target tweets automatically.

About Tweepy

Tweepy is a Python library for working with the Twitter API. You can use it to build Twitter bots or to like and follow automatically. This article introduces a script that automatically deletes specific tweets.

Preparation

・ Register for the Twitter API
・ Download your Twitter archive data ([see here](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive))
・ Install Tweepy and pandas

Approach

The downloaded Twitter archive includes a file called tweet.js, which holds the data for your past tweets. Each entry begins with **"tweet" : {** and everything inside that block describes a single tweet; the file is a long sequence of such entries, like the one shown below. The plan is to selectively delete tweets whose **"source"** line contains "https://peing.net". Deleting a tweet also requires its ID, so the number on the **"id_str"** line is extracted as well.

 {
  "tweet" : {
    "retweeted" : false,
    "source" : "<a href=\"https://peing.net\" rel=\"nofollow\">Peing</a>",
    "entities" : {
      "hashtags" : [ {
        "text" : "Peing",
        "indices" : [ "18", "24" ]
      }, {
        "text" : "Question box",
        "indices" : [ "25", "29" ]
      } ],
      "symbols" : [ ],
      "user_mentions" : [ ],
      "urls" : [ {
        "url" : "https://t.co/snIXxSjooH",
        "expanded_url" : "https://peing.net/ja/qs/636766292",
        "display_url" : "peing.net/ja/qs/636766292",
        "indices" : [ "30", "53" ]
      } ]
    },
    "display_text_range" : [ "0", "53" ],
    "favorite_count" : "0",
    "id_str" : "1203602228591788032",
    "truncated" : false,
    "retweet_count" : "0",
    "id" : "1203602228591788032",
    "possibly_sensitive" : false,
    "created_at" : "Sun Dec 08 09:08:27 +0000 2019",
    "favorited" : false,
    "full_text" : "It is a plump and glossy rice.\n\n#Peing #Question box https://t.co/snIXxSjooH",
    "lang" : "ja"
  }
 }
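
As an alternative to matching lines with regular expressions, the same two fields can be read by parsing the file as JSON. The following is only a minimal sketch: it assumes the file is a JavaScript assignment (the `window.YTD.tweet.part0 = ` prefix in the sample is an assumption about the archive format) followed by a JSON array of entries shaped like the one above. The function names `load_tweet_records` and `peing_ids` are hypothetical.

```python
import json


def load_tweet_records(text):
    """Parse the text of a tweet.js-style file into a list of dicts."""
    # Assumption: everything before the first '[' is a JavaScript
    # assignment prefix, and the rest is a plain JSON array.
    start = text.index('[')
    return json.loads(text[start:])


def peing_ids(records):
    """Return the IDs of tweets whose source mentions https://peing.net."""
    ids = []
    for record in records:
        tweet = record['tweet']
        if 'https://peing.net' in tweet['source']:
            ids.append(int(tweet['id_str']))
    return ids


# A shortened, made-up sample with one Peing tweet and one other tweet.
sample = r'''window.YTD.tweet.part0 = [ {
  "tweet" : {
    "source" : "<a href=\"https://peing.net\" rel=\"nofollow\">Peing</a>",
    "id_str" : "1203602228591788032"
  }
}, {
  "tweet" : {
    "source" : "<a href=\"https://example.com\" rel=\"nofollow\">Other client</a>",
    "id_str" : "1"
  }
} ]'''

print(peing_ids(load_tweet_records(sample)))
```

Parsing the structure avoids depending on the exact indentation of the file, at the cost of depending on the assumed prefix.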

Code

Imports

Regular expressions are used to pull the relevant strings out of tweet.js, so import the **re** module. **pandas** is imported to build a data frame from the extracted data. **datetime** is optional; I import it only to measure how long the script takes to run. **tweepy** is, of course, required.


import re
import pandas as pd
from datetime import datetime
import tweepy

Extraction of tweet data

Define a function that extracts the required lines (**"source"** and **"id_str"**) from tweet.js and returns them as a data frame.

def read_tweet_file(file):
    """
    Reads tweet.js into a pd.DataFrame with 'source' and 'id' columns.
    """
    # Read the tweet.js file (UTF-8 is specified so the result does not
    # depend on the platform's default encoding)
    with open(file, encoding='utf-8') as data_file:
        datalines = data_file.readlines()

    # Create an empty data frame to store the extracted data
    colname = ['source', 'id']
    df = pd.DataFrame([], columns=colname)
    # Patterns for the lines to extract; re.match anchors at the start of
    # the line, so the four leading spaces are part of the condition
    regexes = [r'    \"source\".*', r'    \"id_str\".*']
    for i, regex in enumerate(regexes):
        matched = []
        for line in datalines:
            # Keep the lines that match the pattern
            match_obj = re.match(regex, line)
            if match_obj:
                matched.append(match_obj.group())
        # Store the matched lines as one column of the data frame
        df[colname[i]] = pd.Series(matched)

    return df
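
The four leading spaces in the patterns matter because `re.match` only matches at the beginning of a line. The short sketch below illustrates this with made-up lines. The more deeply indented `"id_str"` line is a hypothetical nested entry, included only to show that `re.match` skips it while `re.search` would not.

```python
import re

# Made-up lines in the style of tweet.js
lines = [
    '    "id_str" : "1203602228591788032",\n',  # tweet-level key (4 spaces)
    '        "id_str" : "99",\n',               # hypothetical nested key (8 spaces)
    '    "source" : "Peing",\n',
]

pattern = r'    "id_str".*'

# re.match anchors at position 0, so only the 4-space line qualifies
matched = [m.group() for m in (re.match(pattern, line) for line in lines) if m]

# re.search scans the whole line and also finds the nested key
match_flags = [bool(re.match(pattern, line)) for line in lines]
search_flags = [bool(re.search(pattern, line)) for line in lines]

print(matched)
print(match_flags, search_flags)
```

Because `.` does not match a newline, the extracted strings do not include the trailing line break.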

Extraction of tweets to be deleted

Define a function that returns the IDs of the tweets to delete from the data frame.

def extract_id(df):
    target_id = []
    for i in range(len(df)):
        # Keep only tweets posted via Peing (the dot is escaped so that it
        # matches a literal '.' rather than any character)
        match_obj = re.search(r'https://peing\.net', df['source'][i])
        if match_obj:
            # Add the numeric tweet ID to the list of deletion targets
            target_id.append(int(re.search(r'[0-9]+', df['id'][i]).group()))

    return target_id
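
The same filtering could also be written with pandas string methods instead of an explicit loop. This is a sketch of an alternative, not the method used in this article; the function name `extract_id_vectorized` and the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up rows in the shape produced by read_tweet_file
df = pd.DataFrame({
    'source': [
        '    "source" : "<a href=\\"https://peing.net\\" rel=\\"nofollow\\">Peing</a>",',
        '    "source" : "<a href=\\"https://example.com\\" rel=\\"nofollow\\">Other</a>",',
    ],
    'id': [
        '    "id_str" : "1203602228591788032",',
        '    "id_str" : "1203602228591788033",',
    ],
})


def extract_id_vectorized(df):
    """Return the tweet IDs of rows whose source contains https://peing.net."""
    # regex=False treats the URL as a plain substring
    is_peing = df['source'].str.contains('https://peing.net', regex=False)
    # Pull the first run of digits out of each selected 'id' line
    ids = df.loc[is_peing, 'id'].str.extract(r'(\d+)', expand=False)
    return [int(i) for i in ids]


print(extract_id_vectorized(df))
```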

Delete tweets

Define a function that takes the list of tweet IDs, deletes each tweet, and prints the result.

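def delete_tweets(target_id):
    delete_count = 0
    for status_id in target_id:
        try:
            # Delete the tweet (uses the global `api` object created below)
            api.destroy_status(status_id)
            print(status_id, 'deleted!')
            delete_count += 1
        except Exception as e:
            # Report the failure and move on; unlike a bare except, this
            # does not swallow Ctrl+C
            print(status_id, 'deletion failed.', e)
    print(delete_count, 'tweets deleted.')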
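
Since deletion cannot be undone, it may be worth checking the target list before running it for real. The sketch below is one possible variation, not part of the original script: it adds a hypothetical `dry_run` flag and an optional `pause` between calls, and collects failures instead of only printing them. `FakeAPI` is a made-up stand-in so the example runs without Twitter credentials; with Tweepy, the `api` object from the Run section would be passed instead.

```python
import time


def delete_with_report(api, target_ids, pause=0.0, dry_run=False):
    """Delete each ID via api.destroy_status; return (deleted, failed)."""
    deleted, failed = [], []
    for status_id in target_ids:
        if dry_run:
            # Only show what would happen; nothing is deleted
            print(status_id, 'would be deleted')
            continue
        try:
            api.destroy_status(status_id)
            deleted.append(status_id)
        except Exception as exc:
            # Keep the error message so failures can be reviewed later
            failed.append((status_id, str(exc)))
        time.sleep(pause)
    return deleted, failed


class FakeAPI:
    """Hypothetical stand-in for tweepy.API, used only in this demo."""

    def destroy_status(self, status_id):
        if status_id == 2:
            raise RuntimeError('No status found with that ID.')


deleted, failed = delete_with_report(FakeAPI(), [1, 2, 3])
print(deleted, failed)
```

Passing the API object as an argument, rather than relying on a global, also makes the function easier to test.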

Run

Run the functions defined above.

# Authenticate to access the Twitter API
auth = tweepy.OAuthHandler('*API key*', '*API secret key*')
auth.set_access_token('*Access token*', '*Access token secret*')

api = tweepy.API(auth)
# Check that the credentials are valid (api.me() was removed in Tweepy 4.0)
user = api.verify_credentials()

# Run
print(datetime.now())
df = read_tweet_file('tweet.js')
target_id = extract_id(df)
delete_tweets(target_id)
print(datetime.now())
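
Writing the keys directly into the script makes it easy to leak them when sharing the code. One common alternative is to read them from environment variables. The sketch below assumes hypothetical variable names such as `TWITTER_API_KEY`; they are not defined by Tweepy or Twitter, so pick whatever names fit your setup.

```python
import os

# Hypothetical environment variable names (an assumption for this sketch)
REQUIRED = (
    'TWITTER_API_KEY',
    'TWITTER_API_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_TOKEN_SECRET',
)


def load_credentials(env=os.environ):
    """Return the four credentials as a tuple, or raise if any is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in REQUIRED)


# Demo with a plain dict so the example runs without real credentials
demo_env = {name: 'dummy-' + name for name in REQUIRED}
api_key, api_secret, token, token_secret = load_credentials(demo_env)
print(api_key)

# With real variables set, the values would be passed to Tweepy like this:
# auth = tweepy.OAuthHandler(api_key, api_secret)
# auth.set_access_token(token, token_secret)
```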

Result

I was able to automatically delete all 976 target tweets. (Execution time was about 10 minutes.)

2020-02-07 17:24:57.816773
1204021701639426048 deleted!
1204020924015472640 deleted!
1204020044683833344 deleted!
1203904952684302337 deleted!

... (Omitted) ...

1204025368052523014 deleted!
1204023316488560640 deleted!
1204023315221733376 deleted!
1204022282311499776 deleted!
976 tweets deleted.
2020-02-07 17:35:16.302221

Conclusion

Feel free to play with the code introduced here and enjoy a fulfilling life with code. Thank you for reading. See you next time!
