I tried to automate internal operations with Docker, Python and Twitter API + bonus

The person in charge of a Twitter-linked campaign in one of our in-house projects had to collect tweets by hand. I wanted to do something about that, so I built a tweet collection tool with Docker and Python.

Problems I ran into while building it

- I had a free Twitter Developer account, but it places many restrictions on which tweets can be retrieved. Still, an expensive premium contract could not be justified just for automation.
- So I wrote a program using the third-party library "GetOldTweets" below, which can fetch any number of tweets for any period without the API. https://github.com/Jefferson-Henrique/GetOldTweets-python
  => One day the program started throwing errors, and when I checked, the problem had already been reported in an Issue. It seems unlikely that the bug will be fixed for a while.

- While I was stuck, I found the following repository, which an Indian engineer had posted in a comment on the Issue. https://github.com/itsayushisaxena/Get_Old_Tweets-Python
  Judging from the source code, it needs a Twitter Standard API account, and by combining tweepy and snscrape it can fetch tweets for the period and count you want, as before.

What I made

- An environment where Python 3 runs in Docker
- Scraping with the Twitter API in Python
- A shell script that automates everything so that the person in charge can use it from the terminal without being aware of docker commands

Click here for source code https://github.com/hikkymouse1007/GetTweets_pub

This time I wanted a mechanism that runs on the PC of the person in charge of the project and hides the difficult operations.

So I used Docker and a shell script to build a single flow that starts the container, fetches the tweets, and creates the CSV file.

The directory structure is as follows.

.
├── Dockerfile
├── Makefile
├── README.md
├── command
│   └── twitter //Shell script
├── docker-compose.yml
└── src
    ├── csv_files //Output CSV here
    └── got_v2.py //Python source code

Dockerfile, docker-compose

For the recipe of a container that runs Python 3, I referred to the following article. https://qiita.com/reflet/items/4b3f91661a54ec70a7dc
Since tweepy did not support Python 3.9 at the time of writing, I pinned the image to Python 3.8.

The Dockerfile installs the Python runtime and the required libraries.

# Dockerfile
FROM python:3.8
USER root

RUN apt-get update
RUN apt-get -y install locales && \
    localedef -f UTF-8 -i ja_JP ja_JP.UTF-8
RUN apt-get -y install sudo
RUN sudo apt-get update && apt-get install -y cowsay fortunes
ENV PATH $PATH:/usr/games
RUN echo $PATH

ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TZ JST-9
ENV TERM xterm

RUN apt-get install -y vim less
RUN pip install --upgrade pip
RUN pip install --upgrade setuptools
RUN pip install requests requests_oauthlib
RUN pip install pandas
RUN pip install IPython
RUN pip install twitter
RUN pip install tweepy
RUN pip install snscrape

# docker-compose.yml
version: '3'
services:
  python3:
    restart: always
    build: .
    container_name: 'python3'
    working_dir: '/root/'
    tty: true
    volumes:
      - ./src:/root/src

got_v2.py

Source code that accesses the Twitter API and retrieves tweets. I borrowed the basic source code from this repository. https://github.com/itsayushisaxena/Get_Old_Tweets-Python

Enter the following values, which you receive when you register an account for the Twitter Standard API.

Constant name                            Type of key to enter
TWITTER_CLIENT_KEY                       API key
TWITTER_CLIENT_SECRET                    API secret key
TWITTER_CLIENT_ID_ACCESS_TOKEN           Access token
TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET    Access token secret
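
If you would rather not hard-code the four values in the source file, one option is to read them from environment variables. The following is only a minimal sketch under that assumption: `load_credentials` and `REQUIRED_KEYS` are hypothetical names that do not exist in the repository, and the sketch reuses the constant names as environment variable names.

```python
import os

# Constant names from the table above. Using them as environment variable
# names is an assumption of this sketch, not something the repository does.
REQUIRED_KEYS = (
    "TWITTER_CLIENT_KEY",
    "TWITTER_CLIENT_SECRET",
    "TWITTER_CLIENT_ID_ACCESS_TOKEN",
    "TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

The returned dict can then be passed to the tweepy authentication calls in place of the hard-coded constants.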

The overall flow is simple: the shell script described later passes environment variables into the Docker container, Python reads values such as the hashtag from those environment variables, and tweepy and snscrape fetch the tweets. The fetched tweets are then written to a CSV file.

import tweepy
import csv
import os
import snscrape.modules.twitter as sntwitter
import sys
sys.dont_write_bytecode = True

#ENV_VALUES
tag = os.environ["TAG"]
since_date = os.environ["FROM"]
until_date =  os.environ["UNTIL"]
tweet_count = os.environ["NUM"]

#Provide your own credentials here.
TWITTER_CLIENT_KEY = '####################'
TWITTER_CLIENT_SECRET = '########################'
TWITTER_CLIENT_ID_ACCESS_TOKEN = '####################################'
TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET = '################################'

auth = tweepy.OAuthHandler(TWITTER_CLIENT_KEY, TWITTER_CLIENT_SECRET)
auth.set_access_token(TWITTER_CLIENT_ID_ACCESS_TOKEN, TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET)
api = tweepy.API(auth,wait_on_rate_limit=True)

#pip install snscrape
# Append to a CSV file named after the search conditions.
csv_path = '/root/src/csv_files/%s_from_%s_to_%s_%s_tweets.csv' % (tag, since_date, until_date, tweet_count)
maxTweets = int(tweet_count)  # the number of tweets you require
# The search operators must be separated from the hashtag by a space.
query = '%s since:%s until:%s' % (tag, since_date, until_date)
print(query)
with open(csv_path, 'a', newline='', encoding='utf-8') as csvFile:
    csvWriter = csv.writer(csvFile)
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= maxTweets:  # stop once maxTweets rows have been written
            break
        # If you need more information, add more tweet attributes here.
        csvWriter.writerow([tweet.date, tweet.username, tweet.content])
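
The script above trusts whatever arrives in the environment variables. As a hedged illustration of how the inputs could be checked before scraping, the sketch below builds the search string and caps the number of results. `build_query` and `take` are hypothetical helpers, not part of the repository, and only the standard library is used.

```python
from datetime import date
from itertools import islice


def build_query(tag, since, until):
    """Build a search string such as '#test since:2020-08-10 until:2020-08-20'."""
    # date.fromisoformat raises ValueError for text that is not a valid ISO date.
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    if start > end:
        raise ValueError("start date must not be after end date")
    # A space separates the hashtag from the since:/until: operators.
    return f"{tag} since:{start.isoformat()} until:{end.isoformat()}"


def take(items, limit):
    """Return an iterator over at most `limit` items of any iterable."""
    return islice(items, limit)
```

With helpers like these, the loop could iterate over `take(scraper.get_items(), maxTweets)` and drop the manual counter, where `scraper` stands for the `TwitterSearchScraper` instance.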

Shell script

Once the command named twitter is on your PATH, running it executes the whole process of fetching the tweets that match your search conditions. The parameters you type on standard input are passed to the Docker container as environment variables.

command/twitter

#!/bin/bash
echo "Enter the following data and press enter"
read -p "hashtag(eg. #test): " str1
read -p "Data acquisition start date(eg. 2020-08-10): " str2
read -p "Data acquisition end date(eg. 2020-08-20): " str3
read -p "Number of tweets acquired(eg. 100): " str4
TAG=$str1 FROM=$str2 UNTIL=$str3 NUM=$str4
echo "Entered data"
echo $TAG $FROM $UNTIL $NUM

ANIMALS=("cheese" \
         "cock" \
         "dragon-and-cow" \
        "ghostbusters" \
        "pony" \
        "stegosaurus" \
        "turtle" \
        "turkey" \
        "gnu"\
        )
ANIMAL=${ANIMALS[$(($RANDOM % ${#ANIMALS[*]}))]}

docker-compose -f ~/path/to/docker-compose.yml \
    run \
    --rm \
    -e TAG="$TAG" \
    -e FROM="$FROM" \
    -e UNTIL="$UNTIL" \
    -e NUM="$NUM" \
    -e ANIMAL="$ANIMAL" \
    python3 \
    /bin/bash -c "python /root/src/got_v2.py && cowsay -f $ANIMAL 'I collected tweets'"

FILENAME="${TAG}_from_${FROM}_to_${UNTIL}_${NUM}_tweets.csv"
echo "$FILENAME"
mkdir -p ~/Desktop/twitter_csv_files
cp "src/csv_files/$FILENAME" ~/Desktop/twitter_csv_files
open ~/Desktop/twitter_csv_files/"$FILENAME"
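
To make the hand-off from the shell script to the container easier to follow, here is a rough Python sketch of the same two steps: assembling the `docker-compose run` arguments with one `-e NAME=VALUE` pair per parameter, and rebuilding the CSV file name that the Python script writes. `build_compose_command` and `csv_filename` are hypothetical helpers for illustration only, and the cowsay step is left out.

```python
def build_compose_command(compose_file, params, service="python3"):
    """Return the docker-compose argument list for a one-off run."""
    command = ["docker-compose", "-f", compose_file, "run", "--rm"]
    for name, value in params.items():
        # Each -e NAME=VALUE pair becomes an environment variable in the container.
        command += ["-e", f"{name}={value}"]
    # Service name and script path as used in docker-compose.yml and command/twitter.
    command += [service, "python", "/root/src/got_v2.py"]
    return command


def csv_filename(params):
    """Rebuild the CSV name that got_v2.py writes for these parameters."""
    return "{TAG}_from_{FROM}_to_{UNTIL}_{NUM}_tweets.csv".format(**params)
```

Passing the list to `subprocess.run(command, check=True)` would start the container. Because the arguments are passed as a list rather than through a shell, values such as `#test` need no quoting.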

Makefile

make path creates the directory and sets up the twitter command in one shot. Run the make command in the root directory of this repository. It creates a directory called command directly under your home directory and places the script file for the twitter command there. make rm-path removes both again.

docker-path:
	@echo $(PWD)
path:
	@mkdir -p ~/command
	@cp ./command/twitter ~/command/twitter
	@chmod 755 ~/command/twitter
	@ln -si ~/command/twitter /usr/local/bin
rm-path:
	@rm -rf ~/command
	@rm -f /usr/local/bin/twitter

Running it in practice

Below is a recording of the program created this time.

[Recording: output1]

Bonus

The mysterious animal in the recording above comes from a program called cowsay, which is installed in the Docker image built this time. So that workers do not tire of the monotonous task, I had a cute animal, chosen at random, announce that the CSV file has been created. The shell script picks one of the animal names it lists at random and passes it as an environment variable when starting Docker, and the cowsay command runs at the end of the script.
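
For readers more at home in Python than in shell arrays, the random pick can be sketched as follows. `pick_animal` and `cowsay_command` are hypothetical names, and the list simply copies the ANIMALS array from command/twitter.

```python
import random

# Cowfile names copied from the ANIMALS array in command/twitter.
ANIMALS = [
    "cheese", "cock", "dragon-and-cow", "ghostbusters", "pony",
    "stegosaurus", "turtle", "turkey", "gnu",
]


def pick_animal(rng=random):
    """Choose one cowfile name uniformly at random."""
    return rng.choice(ANIMALS)


def cowsay_command(message, animal=None):
    """Return the argument list for `cowsay -f <animal> <message>`."""
    animal = pick_animal() if animal is None else animal
    return ["cowsay", "-f", animal, message]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is handy when trying out a particular animal.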

Example

[Screenshot 2020-12-15 22.20.56] [Screenshot 2020-12-15 22.21.04]

There are many other animals besides the ones listed here, so if you are interested, please check them out and add them.

Reference article

- Shell script: https://qiita.com/Lambda34/items/7d24ebe6f7bde5bedddc
