[PYTHON] Experiment to collect tweets for a long time (immediately before execution)

Progress so far

- [x] Program ready
- [x] Required item implementation completed
- [x] Implementation of additional items completed
- [x] Infrastructure ready
- [ ] Preparation for final execution
- [ ] Is there a problem with the library?
- [ ] Just in case it stops
- [ ] Acquisition started!

Before the final execution preparation

There has been a lot of unsettling news lately.

- The talks to sell Twitter have fallen through, and things are heading in an unpredictable direction.
- Tweepy, which I started using because "everyone is using it," announced that its maintenance was ending. (It looks like someone took it over after that.)

The latter makes me uneasy in case something goes wrong. **I don't think Twitter can keep up until the end of the year**, so maybe I should start thinking about what comes next, but I'm not good at switching to a new place.

Dealing with library bugs

As pointed out in "Program Preparation (2)", there is a fix I have to apply in the execution environment as well: the Tweepy Stream bug I mentioned there.

# python
Python 3.5.2 (default, Sep 24 2016, 05:23:05)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tweepy
>>> tweepy.__file__
'/usr/local/lib/python3.5/site-packages/tweepy/__init__.py'
>>>

This gives the Tweepy installation path. After that, I fixed the file (streaming.py) like this and overwrote the installed copy via the shared folder. Mission complete.
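The same lookup can be done without an interactive session. The following is a minimal sketch, not part of the original procedure: `find_package_file` is a hypothetical helper name, and it only assumes that the file to patch sits directly inside the package directory, as streaming.py does in the path shown above.

```python
import importlib.util
from pathlib import Path


def find_package_file(package_name, filename):
    """Return the path of `filename` inside an installed package, or None.

    Hypothetical helper for illustration; it looks only in the package's
    top-level directory (the directory that holds __init__.py).
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None  # package is not installed, or has no file on disk
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None


# Example call (prints None when tweepy is not installed):
# print(find_package_file("tweepy", "streaming.py"))
```

Printing the path before overwriting makes it easier to confirm that the patched file goes into the Python installation the collector actually runs under.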

Bug fix, again.

**However, word that this fix was not enough** came in just before the real run. While doing a bug check after the correction above, I happened to notice an exception being raised, so I checked the issues on GitHub and found it there.

I made a correction like this and copied the file over again. This time the mission really is complete.

Dealing with the possibility that it still crashes

The exception above is caught inside the program, which then reconnects to Twitter immediately. But since I don't know what else might happen, I have to think about how to deal with **the program crashing without warning**. Fortunately, a search turned up this. I can only bow down in gratitude. Once it is running, killing the process causes it to be restarted automatically. How convenient.
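To show the idea behind restart-by-process-monitoring, here is a minimal sketch. It is not the tool found by the search above; `supervise` and its parameters are hypothetical names, and it only assumes that the collector can be launched as an ordinary command.

```python
import subprocess
import sys
import time


def supervise(command, max_launches=None, delay_seconds=5.0):
    """Run `command` repeatedly, launching it again each time it exits.

    max_launches=None means "keep going forever"; a number is mainly
    useful for testing. Returns the list of exit codes that were seen.
    """
    exit_codes = []
    while max_launches is None or len(exit_codes) < max_launches:
        exit_code = subprocess.call(command)  # blocks until the child exits
        exit_codes.append(exit_code)
        print("process exited with code {}".format(exit_code), file=sys.stderr)
        time.sleep(delay_seconds)  # short pause so a crash loop does not spin
    return exit_codes
```

A call such as `supervise([sys.executable, "collect_tweets.py"])`, where `collect_tweets.py` is a placeholder script name, would keep relaunching the collector no matter how it died, including when the process is killed by hand.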

Well, even with all this, it will go down when it goes down. Still, while I won't claim tenfold or twentyfold protection, with three layers in place ("error notification for an easy first response," "ignore exceptions and reconnect automatically," and "restart via process monitoring") three months should be manageable. **Unless I carelessly update the NAS or change the router settings.**
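The first two layers can be sketched in a library-independent way. This is only an illustration under assumptions: `stream_once` stands in for whatever function opens the connection and blocks while receiving tweets, `notify` stands in for the error notification, and the doubling wait between attempts is my own addition, not something the actual program is known to do.

```python
import time


def run_with_reconnect(stream_once, notify, max_attempts=None,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call stream_once() again and again, reporting and surviving errors.

    stream_once: hypothetical callable that connects and blocks while streaming.
    notify: hypothetical callable that receives an error message string.
    Returns the number of connection attempts made.
    """
    attempts = 0
    delay = base_delay
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            stream_once()
            delay = base_delay  # clean return: reset the wait time
        except Exception as exc:
            # Layer 1: tell a human. Layer 2: swallow the error and retry.
            notify("stream failed (attempt {}): {!r}".format(attempts, exc))
            sleep(delay)
            delay = min(delay * 2, max_delay)  # wait longer after repeated failures
    return attempts
```

Anything this loop cannot catch, such as the interpreter itself being killed, is left to the third layer, the process monitor.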

Start execution!

For now, everything is ready. Probably. Work flared up along the way and delayed me, but I managed to run some tests just beforehand, so let's call it fine.

So, on October 27, two days before the winning announcement (code word), I dropped the test data and started operation. For now, I was able to collect tweets without any problems even during the rush around the winning announcement (code word). Success... I think? For the time being, I'll watch how things go for about a week.

Plans from here

Next, I would like to be able to count the number of tweets by time. If possible, I want to do this without adversely affecting the execution environment, but how should I go about it?
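As a starting point for the time-based count, here is a minimal sketch. It assumes the `created_at` strings of the collected tweets are available in the format used by Twitter API v1.1 (for example `Thu Oct 27 12:01:00 +0000 2016`); how the tweets are actually stored is not covered here, and `count_per_hour` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime

# created_at layout of Twitter API v1.1, e.g. "Thu Oct 27 12:01:00 +0000 2016"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def count_per_hour(created_at_values):
    """Count tweets per hour, keyed by "YYYY-MM-DD HH:00" in the given offset."""
    counts = Counter()
    for value in created_at_values:
        stamp = datetime.strptime(value, TWITTER_TIME_FORMAT)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Running something like this on a copy of the data, on a different machine from the collector, would be one way to avoid putting load on the execution environment.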

(To be continued.)
