Use Search Tweets: Full Archive / Sandbox in Python

In my previous article, I started learning Python for my thesis and tried various things with the Twitter API. My Python experience is no longer just 10 days. Well, my skills haven't improved that much, but ...

Premise: Does the official Twitter website display all search results?

In the end, I collected the data for my thesis by searching on the official Twitter website, which gives full access to past tweets. Primitive, I know. I was counting results that way, but the further back I went, the more the number of search results for the keyword dropped, **far more than I expected**.

That is when an unpleasant premonition crossed my mind.

**"Maybe the official Twitter website does not display all of the search results ...?"**

It is certainly conceivable that the older tweets are, the less demand there is to browse them, so the search results get thinned out. If that is the case, quantitative analysis is out of the question. My thesis is in crisis. (; ・ ∀ ・)

Search specifications on the official Twitter website

Search on the official Twitter website (hereinafter "Twitter search") offers several display formats for search results. (Figure: 1.PNG) The one people usually focus on is the difference between "topic tweets" and "latest." (In the past, "topic tweets" were called "topics" and "latest" was called "all tweets.")

According to the official Twitter announcement ("Frequently Asked Questions about Search Results"):

**Why doesn't my favorite tweet appear in the topical tweets?**
Topic tweets are the ones most relevant to your search. Twitter uses a number of factors to determine relevance, including the popularity of tweets (many people respond and share through retweets, replies, etc.), the keywords they contain, and more. If your favorite tweet doesn't appear in the topical tweets, it means that the tweet isn't very relevant to your search. To see recent tweets that match your search terms, click or tap All Tweets (Latest).

Naturally, I used the "latest" tab for this data collection. Since it used to be called "all tweets", I assumed it displayed every tweet. However, it is now called "latest", and I could not find any official statement that clearly says all tweets are displayed, so I cannot confirm it.

What I did this time

Therefore, I decided to use the free trial version of the Twitter API (Search Tweets: Full Archive / Sandbox) to verify whether Twitter search displays all results. Because of the request limit (50 requests / month for Sandbox), it cannot be used for quantitative collection, but a handful of searches is enough for verification. Also, as far as I could see, there was no comprehensive Japanese article about Search Tweets: Full Archive / Sandbox, so I will try to summarize it here (presumptuous as that may be). I had already obtained a Twitter developer account.

① Development environment setting on Dashboard

First of all, you have to log in with your developer account and set up a development environment for Search Tweets: Full Archive / Sandbox from the Dashboard. (Figure: 2.PNG) In the figure above nothing is displayed because the setup has already been completed, but in the initial state the message "You must first set up a dev environment before accessing an endpoint and viewing usage." is displayed at the bottom. Click "set up a dev environment", then set the "Dev environment label" and choose which app will use Search Tweets: Full Archive / Sandbox. (Figure: 3.png) The Dev environment label (development in this case) will be used later, and the app you choose should be the one whose consumer key you use for authentication.

② Stumbling points in the code

I wondered whether the code I had used for the free Standard search could be reused as is, but first I had to rewrite the endpoint URL (naturally). The URL that worked with the free version was https://api.twitter.com/1.1/search/tweets.json, but this time it is https://api.twitter.com/1.1/tweets/search/fullarchive/development.json. (Of course, you can find it in the Search Tweets: Premium search APIs Reference.) As mentioned above, the Dev environment label you set is part of this URL. In this case it is the "development" part. Replace it with your own label.
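As a minimal sketch of how the label fits into the URL, the helper below builds the Full Archive endpoint from a Dev environment label. The function name `premium_search_url` and the constant `PREMIUM_BASE` are hypothetical names introduced only for illustration; the URL pattern itself is the one quoted above.

```python
# Hypothetical helper (not part of any library): builds the Full Archive
# endpoint URL from the Dev environment label set on the Dashboard.
PREMIUM_BASE = "https://api.twitter.com/1.1/tweets/search/fullarchive"


def premium_search_url(label):
    """Return the Full Archive search endpoint for the given label."""
    if not label:
        raise ValueError("the Dev environment label must not be empty")
    return f"{PREMIUM_BASE}/{label}.json"


print(premium_search_url("development"))
```

If your label is something other than `development`, only the argument changes.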

Also, I don't know the reason (I lack the background to tell), but the parameter names differ between the Standard search API and the Premium search APIs. So the code from the free version did not work until I rewrote those names. (Figure: 5.PNG, ↑ Search Tweets: Standard search API Reference) (Figure: 4.PNG, ↑ Search Tweets: Premium search APIs Reference)

The parameter that holds the search query is named "q" in the free version, while it is "query" in the Premium search APIs. I rewrote the other parameters as well while looking at the reference.

It is a small thing, but the key names inside the returned JSON are also slightly different, so I rewrote those too.
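The renaming can be sketched as a small lookup table. This is only an illustration: the article confirms `q` → `query`, and the `count` → `maxResults` pair is my assumption based on how the code further down requests the number of tweets. The names `STANDARD_TO_PREMIUM` and `to_premium_params` are hypothetical.

```python
# Assumed mapping from Standard search parameter names to Premium ones.
# Only q -> query is stated in the text; count -> maxResults is an
# assumption. Check the Premium reference for anything else.
STANDARD_TO_PREMIUM = {
    "q": "query",
    "count": "maxResults",
}


def to_premium_params(standard_params):
    """Rename Standard search parameters to their Premium equivalents."""
    premium = {}
    for name, value in standard_params.items():
        if name not in STANDARD_TO_PREMIUM:
            # Refuse to guess: unknown names must be looked up by hand.
            raise KeyError(f"no known Premium equivalent for {name!r}")
        premium[STANDARD_TO_PREMIUM[name]] = value
    return premium


print(to_premium_params({"q": '"Pikmin"', "count": 20}))
```

Raising on unknown names is a deliberate choice here, so that a parameter without a checked equivalent is never sent silently.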
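To illustrate the difference in the JSON, the sketch below reads the tweet list from either response shape. It assumes that the Standard search API returns tweets under `statuses` and that the Premium search APIs return them under `results` (the key used in the code below). `extract_tweets` is a hypothetical helper name.

```python
def extract_tweets(response_json):
    """Return the list of tweet objects from a search response.

    Assumption: Premium responses use the key 'results' and Standard
    responses use 'statuses'. An empty list is returned if neither exists.
    """
    for key in ("results", "statuses"):
        if key in response_json:
            return response_json[key]
    return []


# Tiny hand-made samples, not real API output.
premium_like = {"results": [{"text": "pikmin"}]}
standard_like = {"statuses": [{"text": "pikmin"}], "search_metadata": {}}

print(extract_tweets(premium_like))
print(extract_tweets(standard_like))
```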

③ Code

Since this is based on my previous code, I will again cite the article that the previous code was based on. Thank you very much.

Play with twitter API # 3 (Get search results)

test.py


#! python3
# -*- coding: utf-8 -*-

import json
from requests_oauthlib import OAuth1Session

#OAuth authentication part
CK      = 'Obtained Consumer key'
CS      = 'Obtained Consumer secret'
AT      = 'Obtained Access token'
ATS     = 'Obtained Access token secret'
twitter = OAuth1Session(CK, CS, AT, ATS)

# Twitter endpoint (get search results); "development" is the Dev environment label
url = 'https://api.twitter.com/1.1/tweets/search/fullarchive/development.json'

# Parameters to pass to the endpoint
keyword = '"Pikmin"'

params = {
    'query': keyword,            # Search query
    'maxResults': 20,            # Number of tweets to get
    'fromDate': '201301311500',  # Start of the range in UTC (YYYYMMDDhhmm)
    'toDate': '201302011500',    # End of the range in UTC (YYYYMMDDhhmm)
}

req = twitter.get(url, params = params)

if req.status_code == 200:
    res = json.loads(req.text)
    for line in res['results']:
        print(line['text'])
        print('*******************************************')
else:
    print("Failed: %d" % req.status_code)

'fromDate' and 'toDate' are optional parameters. Note that they are specified in UTC, not Japan time, so a range starting at 2013-02-01 00:00 JST is written as 201301311500.
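To avoid doing the nine-hour conversion by hand, something like the following could be used. It is a sketch using only the standard library; `jst_to_api_timestamp` is a hypothetical helper name, and the YYYYMMDDhhmm format is inferred from the values used in the code above.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is UTC+9 with no daylight saving time.
JST = timezone(timedelta(hours=9))


def jst_to_api_timestamp(year, month, day, hour=0, minute=0):
    """Convert a JST wall-clock time to a UTC string in YYYYMMDDhhmm form."""
    local = datetime(year, month, day, hour, minute, tzinfo=JST)
    return local.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


print(jst_to_api_timestamp(2013, 2, 1))  # 201301311500
print(jst_to_api_timestamp(2013, 2, 2))  # 201302011500
```

The two printed values match the 'fromDate' and 'toDate' used in test.py.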

④ Result

I compared the result of running this code with the result of searching Twitter for ["Pikmin" since:2013-02-01_00:00:00_JST until:2013-02-02_00:00:00_JST](https://twitter.com/search?q=%22%E3%83%94%E3%82%AF%E3%83%9F%E3%83%B3%22%20since%3A2013-02-01_00%3A00%3A00_JST%20until%3A2013-02-02_00%3A00%3A00_JST&src=typed_query&f=live). (Figure: 6.PNG) I marked the tweets that appear in only one of the two with a red frame. Tweets that start with "RT @XXXX:" appear only in the API search results. Were those the official retweets of the time ...? On the other hand, some tweets did not appear in the API search results. I have no idea why. If you know anything, I would appreciate it if you could tell me. (Perhaps my code is at fault; I will set aside the tweet that appears twice at the end of the API search results.)
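I did this comparison by eye, but it could also be sketched in code as a set difference. The example below assumes you have the tweet texts from both sources as plain lists of strings (the Twitter search side would have to be copied by hand); the sample data is invented, and `compare_results` is a hypothetical helper name.

```python
def compare_results(api_texts, web_texts):
    """Compare tweet texts from the API with those from Twitter search.

    Using sets also collapses duplicates, such as a tweet that shows up
    twice in the API output.
    """
    api_set, web_set = set(api_texts), set(web_texts)
    only_api = api_set - web_set
    return {
        "only_in_api": sorted(only_api),
        "only_in_web": sorted(web_set - api_set),
        # Tweets that look like retweets ("RT @user: ...") and are API-only.
        "retweets_only_in_api": sorted(t for t in only_api if t.startswith("RT @")),
    }


# Invented sample data for illustration only.
api_texts = ["RT @a: hello pikmin", "pikmin is cute", "pikmin is cute"]
web_texts = ["pikmin is cute", "waiting for pikmin 3"]

for name, texts in compare_results(api_texts, web_texts).items():
    print(name, texts)
```

Comparing by text is crude (two different tweets can share the same text), so comparing tweet IDs would be more reliable where they are available.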

Maybe I can write a thesis

Well, I still don't know why some tweets appear in the Twitter search results but not in the API search results. Even so, the "latest" tab of Twitter search seems to display everything except retweets (although it is possible that some tweets appear in neither ...). So I think a quantitative analysis based on Twitter search has some validity. I barely scraped through, so I will do my best to write my thesis. (= ゜ ω ゜) ノ

If you have any information, I would appreciate it if you could let me know. I am still a beginner, so please point out any shortcomings.
