[PYTHON] [Memorandum] ① Get and save tweets ~ I want to identify the news tweets that are spread ~

Development environment

Windows 10, Anaconda3 (Jupyter Notebook)

Description and purpose

This is a memo for my undergraduate graduation thesis. The theme is building a classifier that distinguishes news tweets that spread widely from those that do not. This post covers the first step: collecting the tweets.

Prerequisites

・ A Twitter Developer account has been approved
・ Tweepy is installed

Reference

https://qiita.com/i_am_miko/items/a2e5168e619ed37afeb9

Get Tweets

The target account is @livedoornews. I chose it because it has a large number of followers and those followers are responsive, so its tweets tend to be retweeted.

get_newstweet.ipynb


#Import the required libraries
import tweepy
import pandas as pd

get_newstweet.ipynb



#Consumer key and access token settings for using Twitter API
Consumer_key = "API key"
Consumer_secret = "API secret Key"
Access_token = "Access token"
Access_secret = "Access token secret"

#Authentication
auth = tweepy.OAuthHandler(Consumer_key,Consumer_secret)
auth.set_access_token(Access_token, Access_secret)
api = tweepy.API(auth)
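
Writing the keys directly into a notebook makes it easy to leak them when the notebook is shared. As a minimal sketch, the four values could instead be read from environment variables. The variable names in `ENV_NAMES` and the helper `load_credentials` are hypothetical and not part of Tweepy.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}

def load_credentials(env=None):
    """Return the four credentials as a dict, raising KeyError if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

The returned values, for example `creds["consumer_key"]`, can then be passed to `tweepy.OAuthHandler` and `set_access_token` in the same way as the constants in the cell above.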

get_newstweet.ipynb


#Specify account name
acount = "@livedoornews"

def get_tweets(acount):
    """
    Acquisition contents: tweet ID, time, tweet text, number of likes, number of RTs
    """
    tweet_data = [] #Empty list to store the fetched rows
    for tweet in tweepy.Cursor(api.user_timeline, screen_name=acount, exclude_replies=True).items():
        tweet_data.append([tweet.id, tweet.created_at, tweet.text.replace('\n', ''), tweet.favorite_count, tweet.retweet_count])
    #Build the DataFrame once, after the loop (this also works when no tweets are returned)
    df = pd.DataFrame(tweet_data, columns=['tweet_no', 'time', 'text', 'favorite_count', 'RT_count'])
    return df

df = get_tweets(acount)
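
The row-building step can be tried without calling the Twitter API if it is pulled out into a small function. The following is only a sketch: `tweet_to_row`, `tweets_to_rows`, and `COLUMNS` are hypothetical names, and the stand-in tweet carries made-up values purely for illustration.

```python
from types import SimpleNamespace

# Same column order as the DataFrame in the cell above.
COLUMNS = ['tweet_no', 'time', 'text', 'favorite_count', 'RT_count']

def tweet_to_row(tweet):
    """Convert one tweet-like object into a row matching COLUMNS."""
    return [
        tweet.id,
        tweet.created_at,
        tweet.text.replace('\n', ''),  # remove line breaks, as in get_tweets
        tweet.favorite_count,
        tweet.retweet_count,
    ]

def tweets_to_rows(tweets):
    """Convert any iterable of tweet-like objects into a list of rows."""
    return [tweet_to_row(t) for t in tweets]

# Stand-in object with made-up values, used only to try the helper offline.
fake_tweet = SimpleNamespace(
    id=1,
    created_at="2020-01-01 00:00:00",
    text="line one\nline two",
    favorite_count=10,
    retweet_count=3,
)
rows = tweets_to_rows([fake_tweet])
print(rows)
```

The resulting list has the same shape as `tweet_data`, so it could be handed to `pd.DataFrame(rows, columns=COLUMNS)`.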

Save the retrieved tweets (csv)

To keep collecting tweets with the function above, newly fetched tweets have to be added to the file saved earlier. I therefore wrote two save routines: one that creates a new file and one that appends to an existing file.

First, saving to a new file

get_newstweet.ipynb


#Save new
file_name = "../data/tweet_{}.csv".format(acount)
df.to_csv(file_name, index=False) #index is often not needed

Second, the append save, which merges the new tweets into the existing file

get_newstweet.ipynb


#Append save (merge with the existing csv, then overwrite the file)
file_name = "../data/tweet_{}.csv".format(acount)
pre_df = pd.read_csv(file_name) #Load the previous csv
df = pd.concat([df, pre_df]) #New data first, previous data second
df = df.drop_duplicates(subset=['tweet_no']) #Delete duplicates by tweet_no (the first occurrence, i.e. the new data, is kept)
df.to_csv(file_name, index=False)
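
One way to fold the two save cases into a single call is to branch on whether the file already exists. The sketch below is not the method used in this post: it uses the standard library's `csv` module instead of pandas, takes rows as plain lists, and `save_rows` and `COLUMNS` are hypothetical names. As in the pandas version, a newly fetched row wins over a stored row with the same tweet_no.

```python
import csv
import os

COLUMNS = ['tweet_no', 'time', 'text', 'favorite_count', 'RT_count']

def save_rows(rows, file_name):
    """Write rows to file_name, merging with stored rows when the file exists.

    Rows are keyed by tweet_no (the first column). New rows are registered
    first, so a stored row with the same tweet_no is ignored.
    Returns the number of rows written.
    """
    merged = {}
    for row in rows:
        merged.setdefault(str(row[0]), list(row))
    if os.path.exists(file_name):
        with open(file_name, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header line
            for row in reader:
                if row:
                    merged.setdefault(row[0], row)
    with open(file_name, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(merged.values())
    return len(merged)
```

The same existence check with `os.path.exists` could also wrap the two pandas cells above, so that the first run creates the file and later runs merge into it.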

Summary and next content

That is all for fetching and saving tweets. There is probably a cleaner way to handle the new save and the append save. Next time, I plan to remove RTs and URLs from the tweet text.
