[PYTHON] How to collect Twitter data without programming

Greeting

Hello, this is sunfish. Starting with my first post, I have been writing a series on analyzing corona-related tweets:

I'm tired of Python, so I analyzed the data with nehan (corona related, is that word now?)
I'm tired of Python, so I analyzed the data with nehan (I want to go live even with Corona-Sequel)
I'm tired of Python, so I tried to analyze the data with nehan (I want to go live even with Corona sickness-Part 1)

This time, to wrap up the series, I would like to cover **how to collect Twitter data in the first place**, using the analysis tool nehan, of course.

Amazon S3 is used as the storage where the collected data accumulates.

(Figure: tweet data flow)

Apply for API usage

First, you have to apply for access to the Twitter API. A Google search turns up many guides to the procedure, so I will skip it here. It is only an application, but having to fill in various fields, and in English at that, makes it a little tedious.

Hit the Twitter API

nehan has many connectors for importing external data. Because it uses CData drivers, it can also import data from web services. If you select Twitter and enter the API credentials you obtained, you can fetch tweet data with a SQL query.
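
For readers who want to picture what "getting tweets with a SQL query" looks like, here is a minimal sketch in Python. It does not use nehan or the CData driver: an in-memory SQLite table stands in for the tweet source, and the table name `tweets`, its columns, and the sample rows are all made-up placeholders, not the connector's real schema.

```python
import sqlite3


def make_demo_connection():
    """Build an in-memory stand-in for a tweet table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)"
    )
    # Placeholder rows for illustration only; these are not real tweets.
    conn.executemany(
        "INSERT INTO tweets VALUES (?, ?, ?)",
        [
            (1, "2020-10-18T09:00:00", "placeholder tweet about corona"),
            (2, "2020-10-18T10:30:00", "placeholder tweet about lunch"),
            (3, "2020-10-19T08:15:00", "another corona placeholder"),
        ],
    )
    return conn


def search_tweets(conn, keyword):
    """Return (id, created_at, text) rows whose text contains the keyword."""
    cursor = conn.execute(
        "SELECT id, created_at, text FROM tweets "
        "WHERE text LIKE ? ORDER BY created_at",
        (f"%{keyword}%",),  # bound parameter, so the keyword is never spliced into SQL
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = make_demo_connection()
    for row in search_tweets(connection, "corona"):
        print(row)
```

The point is only that the data source is addressed like a table and filtered with an ordinary `SELECT`; in nehan the same idea is configured on screen.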

Accumulate the acquired tweet data

Add a little processing to the acquired data and store it in Amazon S3. First, add a column holding the processing time so that you can tell when the data was acquired. This is where the variable function comes in handy: the execution time and execution date are defined dynamically. Finally, export to S3 and the accumulation step is complete. I put a variable in the export file name so that the processing date is visible.
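
If you were doing the same step by hand, it might look roughly like the sketch below. This is not what nehan runs internally. The column name `acquired_at`, the key pattern `tweets_YYYYMMDD.csv`, and the function names are assumptions chosen for illustration, and the actual upload to S3 is left out so the example stays self-contained.

```python
import csv
import io
from datetime import datetime


def add_acquired_at(rows, now):
    """Return copies of the rows with the processing time added as a column."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return [{**row, "acquired_at": stamp} for row in rows]


def build_export_key(prefix, now):
    """Build a file name (S3 object key) that embeds the processing date."""
    return f"{prefix}/tweets_{now.strftime('%Y%m%d')}.csv"


def to_csv_text(rows):
    """Serialize the rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    run_time = datetime.now()  # evaluated at run time, like the variable function
    stamped = add_acquired_at([{"id": "1", "text": "hello"}], run_time)
    print(build_export_key("tweets", run_time))
    print(to_csv_text(stamped))
```

Because the date is computed when the job runs, each daily run writes to a new file instead of overwriting the previous one.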

Process and accumulate data every day

Since running the above process by hand every day is not practical, configure automatic updates. The flow that updates the tweet data and stores it in S3 is set to run automatically at 0:00 every day. The Twitter API sometimes does not respond and no data comes back, so the run occasionally fails...
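
The occasional failures are simply accepted in the flow above. If you scripted the daily job yourself, one common way to tolerate a temporarily unresponsive API is to retry a few times before giving up. The sketch below shows that idea only; it is not a nehan feature, and `run_with_retry`, its parameters, and the default wait are hypothetical.

```python
import time


def run_with_retry(task, attempts=3, delay_seconds=60.0, sleep=time.sleep):
    """Call task() until it succeeds or the attempts run out.

    `sleep` is injectable so the waiting can be replaced in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # wait before the next try
    raise last_error


if __name__ == "__main__":
    responses = iter([RuntimeError("no response"), "tweets fetched"])

    def fetch_once():
        # Stand-in for the real API call: fails once, then succeeds.
        result = next(responses)
        if isinstance(result, Exception):
            raise result
        return result

    print(run_with_retry(fetch_once, attempts=3, delay_seconds=0))
```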

Acquire and analyze the accumulated data again

Collect the daily data accumulated in Amazon S3 in one batch and import it into nehan. This is how I captured the data I have been analyzing in this series.
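
As a rough picture of what the batch import does, the sketch below merges several daily CSV files into one table. It assumes the files have already been downloaded and are passed in as a mapping from file name to CSV text. The function name and the extra `source_file` column are illustrative assumptions, not part of nehan.

```python
import csv
import io


def combine_daily_csv(files):
    """Merge daily CSV files into one list of row dictionaries.

    `files` maps a file name (for example an S3 object key) to its CSV text.
    Files are read in sorted name order, so date-stamped names come out
    oldest first, and each row records which file it came from.
    """
    combined = []
    for name in sorted(files):
        reader = csv.DictReader(io.StringIO(files[name]))
        for row in reader:
            combined.append({**row, "source_file": name})
    return combined


if __name__ == "__main__":
    daily_files = {
        "tweets/tweets_20201019.csv": "id,text\n2,second day\n",
        "tweets/tweets_20201018.csv": "id,text\n1,first day\n",
    }
    for row in combine_daily_csv(daily_files):
        print(row)
```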

Summary

External data makes you want to collect it and combine it with your own data, but collecting it can be very troublesome. With nehan, you can go straight from collection to analysis. Of course, no programming is required. For analysts who are tired of collecting data and writing Python, why not live a comfortable analytical life with nehan?
