- [x] The required specifications (?) have been decided for the time being.
Once I've decided what to do, my style is to start **by googling**. In short, "**someone has already done 80% of what I want to do**". On Qiita there are similar projects too (in fact, people doing more advanced things), and I have no choice but to use them as references. Unfortunately, none of them fits exactly. As far as I could find on Stack Overflow, nobody had released a program that properly meets my required specifications. It would have been easier if someone had... tch.
Even to build something that meets the requirements for ~~script kiddie~~ personal use, I'll have to put this part together myself.
There are roughly two ways to get data from Twitter: using the **REST API** and using the **Streaming API**. With **REST**, you send a request and get the result back. With the **Streaming API**, you send a command once and the results keep arriving endlessly. (That's my understanding. Please check the literature for details.)
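To make the "keeps arriving endlessly" part concrete, here is a minimal sketch of the consuming side of a stream. It assumes the v1.1 streaming format, where each message is one JSON object per line (delimited by `\r\n`) and blank lines are sent as keep-alives. The `iter_stream_messages` helper is hypothetical. In practice a library such as Tweepy does this parsing for you.

```python
import io
import json


def iter_stream_messages(lines):
    """Yield one decoded JSON object per non-blank line of a stream.

    `lines` can be any iterable of bytes lines: a socket file object, an HTTP
    response read line by line, or (below) an in-memory stand-in. A REST call
    returns one finished result; this loop instead keeps yielding for as long
    as the connection keeps delivering lines.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: a keep-alive sent while no new messages are available.
            continue
        yield json.loads(line.decode("utf-8"))


# In-memory stand-in for the network connection (two messages, one keep-alive).
fake_connection = io.BytesIO(
    b'{"text": "hello"}\r\n'
    b'\r\n'
    b'{"text": "\\u3053\\u3093\\u306b\\u3061\\u306f"}\r\n'
)
texts = [message["text"] for message in iter_stream_messages(fake_connection)]
```

Because it is a generator, the caller handles each message as soon as its line arrives instead of waiting for a complete response.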
The reason I chose the Public Stream in the first place is a very lazy one. If you just bring in a suitable library and read from it, the Stream API seems less troublesome than REST. All I have to do is keep throwing whatever arrives into the DB.
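The idea of simply throwing each received item into a DB can be sketched with Python's built-in `sqlite3` module. Here it stands in for whatever database ends up being used. The table layout and the `open_db` / `save_tweet` helpers are hypothetical. The sketch does one INSERT per received tweet and keeps the raw JSON, so nothing is lost if the schema changes later.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database with one table for raw tweets."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"   # tweet id; duplicates are ignored below
        " created_at TEXT,"
        " text TEXT,"
        " raw_json TEXT NOT NULL)"
    )
    return conn


def save_tweet(conn, message):
    """Insert one decoded stream message if it looks like a tweet."""
    if "id" not in message or "text" not in message:
        return  # e.g. limit or delete notices, which are not tweets
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tweets (id, created_at, text, raw_json)"
            " VALUES (?, ?, ?, ?)",
            (
                message["id"],
                message.get("created_at"),
                message["text"],
                json.dumps(message, ensure_ascii=False),
            ),
        )


conn = open_db()
save_tweet(conn, {"id": 1, "created_at": "Mon Oct 03 12:00:00 +0000 2016", "text": "hello"})
save_tweet(conn, {"id": 1, "text": "hello"})  # same id again: ignored
count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

`INSERT OR IGNORE` on the tweet id keeps a reconnect that replays a few tweets from creating duplicate rows.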
There are also several Stream APIs:

- **"everything that flows through Twitter (contract required)"**
- **"tweets on my timeline"**
- **"a random 1% of the total"**
- **"results narrowed down from the whole by keywords, locale, etc."**

Here I use the last one, <a href="https://dev.twitter.com/streaming/reference/post/statuses/filter">**narrowed down by the specified search words**</a>. "Public Stream" seems to be the generic term for these, but I'm not sure about that. (I just looked it up: is it right to understand that the tweets you can get with filter are not limited to 1% but include all matching ones?)
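As I understand the documentation for `statuses/filter`, the `track` parameter is a comma-separated list of phrases. Commas act as OR, and the space-separated terms inside one phrase act as AND, matched case-insensitively and in any order. The sketch below is a rough local approximation of that rule using hypothetical helpers `parse_track` and `matches_track`. It is handy for sanity-checking a keyword list offline. The server's real matching differs in details (tokenization, punctuation, URLs), so treat this as an assumption, not a replica.

```python
def parse_track(track):
    """Split a track string into phrases, each a list of lowercase terms."""
    phrases = []
    for phrase in track.split(","):
        terms = phrase.lower().split()
        if terms:
            phrases.append(terms)
    return phrases


def matches_track(text, track):
    """True if ANY phrase (comma = OR) has ALL its terms (space = AND) in text."""
    words = set(text.lower().split())
    return any(all(term in words for term in terms) for terms in parse_track(track))
```

For example, `matches_track("The API of Twitter", "twitter api,#python")` is true because both terms of the first phrase appear, in a different order.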
Perhaps because I'm an old-timer, I worry about the Stream API. It keeps an HTTP connection open, but that connection is unreliable. Or rather, **I'm wary that it will be disconnected even when nothing is wrong**. The reference book I mentioned earlier (a pocket reference) also says you must plan for reconnection, because the connection drops whenever something happens. ...But neither the pages of well-known libraries nor the examples of people implementing it themselves describe reconnecting after a disconnect, at least as far as I could find. ...Does the API take care of that for you? If so, nothing could be easier...
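For reference, Twitter's (former) streaming documentation recommended roughly this reconnection policy:

- Network errors: back off linearly, adding 250 ms per attempt, up to 16 s.
- HTTP errors: back off exponentially, starting at 5 s and doubling, up to 320 s.
- HTTP 420 (rate limited): start at 1 minute and double each attempt.

Tweepy 3.x's `Stream` may already retry internally (it accepts options such as `retry_count`), so check the version in use. If you need to roll your own, here is a minimal sketch. The exception classes, `connect()` and the counter-reset rule are hypothetical.

```python
import time


class NetworkError(Exception):
    """Hypothetical: TCP/IP-level failure (reset, timeout, DNS...)."""


class HTTPError(Exception):
    """Hypothetical: the server answered with a non-200 status code."""

    def __init__(self, status):
        super().__init__("HTTP %d" % status)
        self.status = status


def category(err):
    if isinstance(err, HTTPError):
        return "rate_limit" if err.status == 420 else "http"
    return "network"


def backoff_delay(kind, attempt):
    """Seconds to wait before retry number `attempt` (1, 2, 3, ...)."""
    if kind == "network":
        return min(0.25 * attempt, 16.0)          # linear, capped at 16 s
    if kind == "rate_limit":
        return 60.0 * 2 ** (attempt - 1)          # 1 min, doubling
    return min(5.0 * 2 ** (attempt - 1), 320.0)   # 5 s, doubling, capped at 320 s


def run_with_reconnect(connect, sleep=time.sleep, clock=time.monotonic,
                       max_retries=10, healthy_after=60.0):
    """Call connect() again after every failure, waiting longer each time.

    connect() is assumed to block while tweets flow, to raise NetworkError or
    HTTPError when the connection drops, and to return normally only when the
    caller has decided to stop.
    """
    attempts = {"network": 0, "http": 0, "rate_limit": 0}
    while True:
        started = clock()
        try:
            return connect()
        except (NetworkError, HTTPError) as err:
            if clock() - started >= healthy_after:
                # The connection had been working for a while: start over.
                attempts = dict.fromkeys(attempts, 0)
            kind = category(err)
            attempts[kind] += 1
            if attempts[kind] > max_retries:
                raise  # give up and let the caller (or a supervisor) decide
            sleep(backoff_delay(kind, attempts[kind]))
```

Resetting the counters after a connection has stayed up for a while matters for a program meant to run for months. Otherwise, rare transient failures would slowly add up to `max_retries`.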
(Note for those in the know: I'm writing this in the order things happened in my head while coding and recalling it, so please bear with me.)
There's a lot to think about, but since I want to learn from the wonderful code of my predecessors, I decided to look at Twitter libraries that have plenty of Japanese-language material.
I searched for a database-side library from the same viewpoint as well.
The typical, rock-solid choice? Well, if an amateur like me is going to manage anything, the standard one is the way to go...
Having decided what to use, let's prepare a development environment and a test environment. I'm a die-hard Windows guy who started with VB and went C → VC++ → C#. Naturally, the environment I can develop in is Windows too, so ideally I'd develop on Windows right up until release (= deployment). Or rather, without an IDE (Integrated Development Environment) I'd die instantly. And if you ask me anything about Linux, all I can do is sit there frozen.
Fortunately, since Python is a scripting language, there isn't much dependence on the environment. Installing libraries is automated these days too, so it should be much less trouble than it used to be.
That should be fine for the infrastructure, but I still want an IDE. When I ~~cockily~~ googled within my familiar environment, I found Python Tools for Visual Studio. **What's more, it can call a Windows Python distribution such as Anaconda, so you can easily debug on the spot.** It's the only choice now.
So I decided to develop and test with this configuration. I do have a few concerns. Well, an actual-machine test is needed at the end in any case, and as long as I don't write anything odd, no problems should come up, so I'll leave them for now. (If this were a job, I'm afraid I'd have to nail all of that down...)
The program I'm making this time is bound by a fiendish rule: once it starts running, it has to keep running for 3 months no matter what, and then stop. So I'll focus the implementation on the essential functions and **handle everything else by other means**.
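The "run for three months, then stop" rule can be expressed as a loop with a fixed end time. Here is a minimal sketch. It assumes a hypothetical `collect_once()` that handles one unit of work (for example, one connect-and-reconnect cycle) and then returns. The clock is passed in so the logic can be tested without actually waiting three months. Whether "3 months" means 90 days or a calendar date is up to the caller, so the sketch just takes an end `datetime`.

```python
from datetime import datetime, timedelta


def run_until(end, collect_once, now=datetime.now):
    """Call collect_once() repeatedly until the clock reaches `end`.

    Returns the number of completed cycles.
    """
    cycles = 0
    while now() < end:
        collect_once()
        cycles += 1
    return cycles


# Usage with a fake clock that advances one day per cycle.
start = datetime(2000, 1, 1)
end = start + timedelta(days=90)
fake_time = [start]


def fake_now():
    return fake_time[0]


def fake_collect():
    fake_time[0] += timedelta(days=1)


cycles = run_until(end, fake_collect, now=fake_now)
```

The end time is only checked between cycles. A blocking stream therefore needs some way to return (a timeout or a periodic disconnect) for the deadline to take effect promptly.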
Well, it's something like this. I'll start with the highest-priority elements and gradually raise the level of completeness.
Vowing "**I have O'Reilly's Python tutorial**" and "**I'll google whatever I don't understand**", I started creating a demo program for the time being. Create a new project in Visual Studio, choose "Python Application", and you're ready to write Python code in your familiar editor. This is convenient. After typing in the tutorial code, run it with [F5]. You can try it out with the same steps as a C console application, so there's really no stress. The only shame is that I can't step through it.
Well, if it can't talk to Twitter, nothing else matters, so let's start there. First you need to install Tweepy. I assumed I'd have to type a "pip" command in the Anaconda Prompt, but this can also be done from within Visual Studio.
It's as easy as NuGet.
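Outside Visual Studio, one way to make sure Tweepy lands in the same Python that runs your scripts is to invoke pip through `sys.executable`. Here is a minimal sketch with hypothetical helpers `pip_install_command` and `ensure_installed`. The `run` parameter exists only so the install step can be swapped out when testing.

```python
import importlib.util
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package, module_name=None, run=subprocess.check_call):
    """Install `package` with pip unless its module is already importable.

    Returns True if an install was attempted, False if nothing was needed.
    """
    if importlib.util.find_spec(module_name or package) is not None:
        return False
    run(pip_install_command(package))
    return True
```

Calling `ensure_installed("tweepy")` installs Tweepy into whichever interpreter (for example, the Anaconda Python selected in Visual Studio) executes the call.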
For now, based on a few pages I found by googling and the official Tweepy tutorial, I wrote a program that gets tweets with the Streaming API.
tweetCheck.py

```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Written against tweepy 3.x (StreamListener was removed in tweepy 4.0).
import tweepy

# Fill in the credentials required for the Twitter API yourself.
CK = ''  # Consumer Key
CS = ''  # Consumer Secret
AT = ''  # Access Token
AS = ''  # Access Token Secret


class Listener(tweepy.StreamListener):
    def on_status(self, status):
        # Print as text for a Japanese Windows console (cp932); characters it
        # cannot show become '?'. Printing the encoded bytes directly would
        # show an unreadable b'...' literal in Python 3.
        print(status.text.encode('cp932', 'replace').decode('cp932'))
        return True

    def on_error(self, status_code):
        print('Error occurred: ' + str(status_code))
        return True  # keep the stream alive; returning False disconnects


# Main processing from here
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)  # set the access token
listener = Listener()  # instance of the Listener class
stream = tweepy.Stream(auth, listener)  # reception starts from here

# Uncomment exactly one of the following.
# stream.filter(track=['#xxxxxx'])  # filter by the specified search word
stream.sample()  # pick up a random ~1% of all tweets on Twitter
# stream.userstream()  # the user's own timeline (User Streams was retired by Twitter in 2018)
```
...Huh, 31 lines (including blank lines and comments)? What can this even do?? I ran it while thinking that. I can't read the output (because it's UTF-8), but it does receive tweets. Force-quit it with Ctrl+C.
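If the goal is just to read Japanese tweets in a Windows console, one option is to replace only the characters the console cannot show and keep everything else as text rather than bytes. This assumes a Japanese-locale console using code page 932. In Python 3, printing the result of `.encode(...)` directly shows a `b'...'` bytes literal, which is unreadable either way. `console_safe` below is a hypothetical helper.

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the console cannot encode replaced by '?'."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, "replace").decode(encoding)


print(console_safe("Python\u3067\u3059 \U0001F600"))
```

On a cp932 console the Japanese text prints as-is and the emoji, which cp932 lacks, becomes `?`.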
It looks like it came together quickly, but I actually got stuck in two places.
The former is fine once you've fixed it the first time. The latter doesn't matter, because nothing needs to be displayed once the program is running for real. If I can get this far in a few days with my first Python, I might make the end-of-October deadline after all.
Next time, I'll flesh out this source. (To be continued)