[PYTHON] Delete tweets for the specified period

I want to pretend my embarrassing history never existed

There are already various tools out there, but because of Twitter's specifications they can only delete the latest 3,200 tweets, and mass deletion costs money. That is a real problem for me.

I have accumulated well over 100,000 junk tweets since 2010.

If you look, Qiita also has code that erases all tweets. But erasing everything is a problem too. In the first place, if you wanted to delete everything, it would be faster to create a new account ...

I want to erase only the embarrassing history and keep the recent, respectable me.

This article is for people like that.

Procedure

  1. Build a Python 3.6 environment
  2. Obtain the archive full of embarrassing history
  3. Edit tweet.js in the archive
  4. Get the API keys
  5. Edit the source code
  6. Run
  7. Done

1. Build a Python 3.6 environment

You can set this up yourself, right? If you can't, ask Google.

2. Obtain the archive full of embarrassing history

① Log in to Twitter on your PC and open the settings screen (https://twitter.com/settings/account)
② Select the "Twitter data" tab (enter your password when prompted)
③ Press "Download Twitter data".

3. Edit tweet.js

I want to read this file in the source code, so I'll modify it a little.

① Open the "data" folder inside the unzipped folder.
② There is a file named "tweet.js"; open it with any text editor.
③ Delete the string "window.YTD.tweet.part0 =" at the beginning of the file.
④ Save and close.
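
If you would rather not edit the file by hand, the same result can probably be had in code. The following is only a sketch: `load_archive_tweets` is a hypothetical helper name, and it assumes the file consists of the "window.YTD.tweet.part0 =" assignment followed by a single JSON array, so that everything before the first "[" can be discarded.

```python
import json


def load_archive_tweets(text):
    """Parse the contents of tweet.js, with or without the JS assignment prefix.

    Assumption: the file is "<prefix> = [ ...JSON array... ]", so everything
    before the first "[" can be dropped safely.
    """
    start = text.find("[")
    if start == -1:
        raise ValueError("no JSON array found in the given text")
    return json.loads(text[start:])


if __name__ == "__main__":
    # Tiny inline sample instead of a real archive file.
    sample = 'window.YTD.tweet.part0 = [{"tweet": {"id": "1"}}]'
    print(load_archive_tweets(sample))
```

With a real archive you would read the file with `open(path, encoding='utf-8').read()` and pass the text to the helper. A file that has already been edited by hand is parsed the same way.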

4. Get API key

You need access tokens to use the Twitter API. They are just strings. Follow the steps below to get two keys and two tokens. The program needs them, so copy them somewhere such as Notepad.

① Go to https://developer.twitter.com/en/apps
② There may be several apps listed; press "Details" on the one that shows your Twitter account ID.
③ Confirm that there are three tabs: "App details", "Keys and tokens", and "Permissions".

④ First, select "Permissions" and press the "Edit" button to change the permissions.
⑤ Set the permission to "Read, write, and Direct Messages".
⑥ Press "Save".

⑦ Open "Keys and tokens".
⑧ Under "Keys, secret keys and access tokens management", make a note of the following:
・ API key
・ API secret key
⑨ Under "Access token & access token secret", press the "Regenerate" button. The following are displayed, so make a note of each:
・ Access token
・ Access token secret

That is all for this step.
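
Instead of keeping the four strings in Notepad and pasting them into the script, one option is to read them from environment variables. This is a minimal sketch, not part of the original script: the variable names such as `TWITTER_API_KEY` and the helper `load_credentials` are made up for illustration, so use whatever names you like.

```python
import os

# Hypothetical environment variable names (an assumption, not a Twitter convention).
ENV_NAMES = {
    "api_key": "TWITTER_API_KEY",
    "api_secret_key": "TWITTER_API_SECRET_KEY",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the two keys and two tokens as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

The returned dict uses the same four names as the settings in the script, so each value can be assigned to `api_key`, `api_secret_key`, `access_token`, and `access_token_secret` in place of the pasted strings.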

5. Edit source code

The sample source is below. The comments mark each place you need to change, so edit them to suit yourself.


import json
import twitter # pip install python-twitter

# ====== Fill in your settings below ======

# Enter the keys and tokens below.
# (No trailing comma after the string, or the value becomes a tuple.)
api_key             = 'Copy here' # paste the "API key" you noted
api_secret_key      = 'Copy here' # paste the "API secret key" you noted
access_token        = 'Copy here' # paste the "Access token" you noted
access_token_secret = 'Copy here' # paste the "Access token secret" you noted

# Path to tweet.js. (On Windows, write each "\" twice.)
js_file_path = "D:\\sample\\hogehoge\\tweet.js"

# Tweets posted from the begin date to the end date below are deleted (both dates included).
begin_year  = 2010 # from this year
begin_month = 1    # this month
begin_day   = 1    # this day
end_year  = 2019 # to this year
end_month = 12   # this month
end_day   = 31   # this day (inclusive)

# ================================ 

api = twitter.Api(
    consumer_key        = api_key,
    consumer_secret     = api_secret_key,
    access_token_key    = access_token,
    access_token_secret = access_token_secret,
    sleep_on_rate_limit = True
)

class date():
    def __init__(self, y, m, d):
        self.y = y
        self.m = m
        self.d = d

class date_range():
    def __init__(self):
        self.begin = date(begin_year, begin_month, begin_day)
        self.end   = date(end_year, end_month, end_day)

# Month abbreviations as they appear in created_at, in calendar order
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def cnv_month_from_str2int(month):
    # Raise instead of assert so the check still runs under "python -O"
    if month not in MONTHS:
        raise ValueError("ERROR!![{}] is not month".format(month))
    return MONTHS.index(month) + 1

def run():
    d_r = date_range()
    cnt = 0
    with open(js_file_path, encoding='utf-8', mode='r') as f:
        tj=json.load(f)
        for tweet0 in tj:
            tweet = tweet0['tweet']
            print()
            print(tweet['id'])

            date = tweet['created_at']
            dow, month, day, time, other, year = date.split()
            _year = int(year)
            _day = int(day)
            _month = cnv_month_from_str2int(month)
            
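            # Skip tweets outside the configured date range (begin and end inclusive).
            # Comparing (year, month, day) tuples compares the date as a whole;
            # checking year, month and day separately gives wrong results.
            tweet_ymd = (_year, _month, _day)
            begin_ymd = (d_r.begin.y, d_r.begin.m, d_r.begin.d)
            end_ymd   = (d_r.end.y, d_r.end.m, d_r.end.d)
            if not (begin_ymd <= tweet_ymd <= end_ymd):
                continue

            print("Deleted so far: {}".format(cnt))
            print("Now deleting {}/{}/{}".format(_year, _month, _day) )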

            try:
                api.DestroyStatus(tweet['id'])
                cnt += 1
            except Exception as e:
                # Error if already deleted or tweet is RT
                print(e.args)
    return cnt

if __name__ == '__main__':
    dl_cnt = run()
    print()
    print("Finish!!")
    print("Deleted {} tweets".format(dl_cnt))
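
As an aside, the date handling in the script could also lean on the standard library. The sketch below is separate from the script above and only illustrates the idea: it assumes created_at always looks like "Wed Oct 10 20:19:24 +0000 2018", and the helper names `created_date`, `in_range` and `count_in_range` are made up here. Counting the matching tweets first works as a dry run before anything is deleted.

```python
from datetime import date, datetime

# Assumed layout of created_at, e.g. "Wed Oct 10 20:19:24 +0000 2018"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_date(created_at):
    """Return the calendar date of a tweet, in the offset given in the string (UTC here)."""
    return datetime.strptime(created_at, CREATED_AT_FORMAT).date()


def in_range(created_at, begin, end):
    """True if the tweet date falls between begin and end, both inclusive."""
    return begin <= created_date(created_at) <= end


def count_in_range(tweets, begin, end):
    """Dry run: count the archive entries that would be deleted."""
    return sum(
        1 for entry in tweets
        if in_range(entry["tweet"]["created_at"], begin, end)
    )


if __name__ == "__main__":
    sample = [
        {"tweet": {"id": "1", "created_at": "Wed Oct 10 20:19:24 +0000 2018"}},
        {"tweet": {"id": "2", "created_at": "Wed Jan 01 00:00:00 +0000 2020"}},
    ]
    print(count_in_range(sample, date(2010, 1, 1), date(2019, 12, 31)))
```

Note that the dates are compared in UTC, as in the script, so a tweet posted late at night in Japan can fall on the previous day.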


6. Run

When you run it, deletion starts. If you open your account and keep refreshing, you can watch the tweet count go down. Deletion takes a long time, so be patient. (If you delete tens of thousands of tweets, it will take far more than an hour.)

7. Done.

If "Finish!!" is displayed, it is complete. This should have erased almost all of the embarrassing history. I did it.

Not supported

By the way, retweets cannot be undone ... I haven't looked at the data in tweet.js in detail, but there should be something that identifies a tweet as a retweet. After that, you would find the API for undoing a retweet in the official documentation and call it. Looking it up was too much trouble ...
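
For what it's worth, one heuristic for spotting retweets in the archive might look like the sketch below. It rests on an assumption I have not verified against every archive version: that a retweet's text starts with "RT @". The helper name `looks_like_retweet` is made up, and a tweet typed by hand with the same prefix would match as well.

```python
def looks_like_retweet(tweet):
    """Heuristic only: treat a tweet as a retweet if its text starts with "RT @".

    Assumption: the archive stores the text under "full_text" (or "text" in
    other versions of the archive).
    """
    text = tweet.get("full_text") or tweet.get("text") or ""
    return text.startswith("RT @")


if __name__ == "__main__":
    print(looks_like_retweet({"full_text": "RT @someone: hello"}))
    print(looks_like_retweet({"full_text": "my own tweet"}))
```

Inside the loop of the script, such a check could be used to skip or separately handle retweets before calling the delete API.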

If you want to delete only tweets with images, that should be possible too.

If you want to do that, check the contents of tweet.js and the API and rewrite the code yourself.

Reference
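
As a starting point, a filter for tweets with attached media could be sketched like this. It assumes the archive entries carry an "entities" or "extended_entities" object with a "media" list, mirroring the tweet objects of the API; check your own tweet.js before relying on it. `has_media` is a made-up helper name.

```python
def has_media(tweet):
    """True if the tweet has at least one media item attached.

    Assumption: media appear as a non-empty list under
    tweet["extended_entities"]["media"] or tweet["entities"]["media"].
    """
    for key in ("extended_entities", "entities"):
        container = tweet.get(key) or {}
        if container.get("media"):
            return True
    return False


if __name__ == "__main__":
    print(has_media({"entities": {"media": [{"type": "photo"}]}}))
    print(has_media({"entities": {"hashtags": []}}))
```

Adding `if not has_media(tweet): continue` next to the date check would restrict deletion to tweets with media.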

https://qiita.com/aeas44/items/a5b82da69b64b32aada4
