[PYTHON] How to collect Twitter data without programming

Greeting

Hello, this is sunfish. For my first posts, I wrote a series analyzing coronavirus-related tweets: "I'm tired of Python, so I analyzed the data with nehan (corona-related: where is that word now?)", "I'm tired of Python, so I analyzed the data with nehan (I want to go to live shows even with corona: Sequel)", and "I'm tired of Python, so I analyzed the data with nehan (I want to go to live shows even during the corona crisis: Part 1)".

This time, I would like to wrap up with **how to collect Twitter data in the first place**, using the analysis tool nehan, of course.

Amazon S3 is used as the storage for the collected data.

Apply for API usage

First, you have to apply for access to the Twitter API. Plenty of guides turn up in a Google search, so I will skip the details. It is only an application process, but it is a bit of a hassle: there is a lot to fill in, and it has to be written in English.

Hit the Twitter API

nehan has many connectors for importing external data. Since it uses CData drivers, it can also import data from web services. If you select Twitter and enter the API credentials you obtained, you can get tweet data with an SQL query.

Accumulate the acquired tweet data

Apply a little processing to the acquired data and store it in Amazon S3. Add one column holding the processing time so that you can tell when the data was acquired. This is where the variable feature comes in handy: the execution time and execution date are defined dynamically. Finally, export to S3 and the accumulation is complete. I put a variable in the export file name so that the processing date is visible.
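For readers who would like to see what this step amounts to in code, here is a minimal Python sketch of the same idea. It is not how nehan implements it. The column name `processed_at`, the `tweets_YYYYMMDD.csv` naming pattern, and all function names are assumptions made for illustration, and the actual upload to S3 is left out.

```python
import csv
import io
from datetime import datetime, timezone


def add_processing_time(rows, run_at):
    """Return copies of the rows with an extra 'processed_at' column (hypothetical name)."""
    stamp = run_at.strftime("%Y-%m-%d %H:%M:%S")
    return [{**row, "processed_at": stamp} for row in rows]


def dated_object_key(prefix, run_at):
    """Build a file name that embeds the run date (hypothetical naming pattern)."""
    return f"{prefix}/tweets_{run_at.strftime('%Y%m%d')}.csv"


def to_csv_text(rows):
    """Serialize the rows to CSV text, ready to be exported as one file."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    run_at = datetime.now(timezone.utc)  # plays the role of the execution-time variable
    sample = [{"id": "1", "text": "hello"}, {"id": "2", "text": "world"}]
    print(dated_object_key("tweets", run_at))
    print(to_csv_text(add_processing_time(sample, run_at)))
```

Because the date is part of the file name, each daily run writes a new file instead of overwriting the previous one.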

Process and accumulate data every day

Since running the above process by hand every day is not realistic, set up automatic updates. The flow that refreshes the tweet data and stores it in S3 is scheduled to run automatically at 0:00 every day. Occasionally the Twitter API does not respond and no data comes back, so the run sometimes fails...

Acquire and analyze the accumulated data again

Collect the daily data accumulated in Amazon S3 in one batch and import it into nehan. The data I analyzed in the earlier posts was captured in this way.
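As a rough illustration of what collecting the daily files in one batch means, the following Python sketch merges several daily CSV exports into a single list of rows. In nehan this is done without code. The function name, the `source_file` column, and the file names are assumptions, and the files are passed in as text rather than read from S3.

```python
import csv
import io


def combine_daily_files(files):
    """Merge daily CSV exports into one list of rows.

    `files` maps a file name to its CSV text. Each row keeps the name of
    the file it came from in a 'source_file' column (hypothetical name).
    """
    combined = []
    # File names embed the date as YYYYMMDD, so sorting orders them by day.
    for name in sorted(files):
        reader = csv.DictReader(io.StringIO(files[name]))
        for row in reader:
            combined.append({**row, "source_file": name})
    return combined


if __name__ == "__main__":
    daily = {
        "tweets_20201020.csv": "id,text\n2,second day\n",
        "tweets_20201019.csv": "id,text\n1,first day\n",
    }
    for row in combine_daily_files(daily):
        print(row)
```

Keeping the source file name on each row makes it easy to trace a tweet back to the day it was collected.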

Summary

It is tempting to collect external data and combine it with your own data, but collecting it can be very troublesome. With nehan, you can go straight from collection to analysis. And of course, no programming is required. For analysts who are tired of collecting data and writing Python, why not live a comfortable analytical life with nehan?

Click here for an introduction to the analysis tool nehan (https://nehan.io/product/).