Try tweeting arXiv's RSS feed to Twitter from a Raspberry Pi with Python

Introduction

arXiv is a site operated by the Cornell University Library where papers in various fields are posted and can be read as PDFs free of charge.

I thought that if I could analyze the information I want to see and post my thoughts on Twitter, I could save myself the trouble of searching. As a first step, I decided to tweet the arXiv feed to Twitter.

The target this time was cs.CV, which is the category I am interested in.

I chose a Raspberry Pi because it runs all the time, but any device that can run Python will work.

How arXiv's RSS feed works

You can find out how it works by reading the following two help pages.

The important point is that, as stated in section 3.3.1.1 of the arXiv API User's Manual, the feed is updated only once a day. Since the information does not change no matter how often you access it, the program should be designed with the frequency of API calls and a caching mechanism in mind.

Because the arXiv submission process works on a 24 hour submission cycle, new articles are only available to the API on the midnight after the articles were processed. The tag thus reflects the midnight of the day that you are calling the API. This is very important - search results do not change until new articles are added. Therefore there is no need to call the API more than once in a day for the same query. Please cache your results. This primarily applies to production systems, and of course you are free to play around with the API while you are developing your program!
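One way to honor this in a script that may run more often than once a day is to remember when the feed was last fetched and skip the request while the UTC date is unchanged. A minimal sketch, assuming the manual's "midnight" means midnight UTC and that the time of the last fetch is stored somewhere; should_fetch is a hypothetical helper, not part of any arXiv or feedparser API:

```python
from datetime import datetime, timezone
from typing import Optional


def should_fetch(last_fetch: Optional[datetime], now: datetime) -> bool:
    """Return True if the feed should be requested again.

    Hypothetical helper: assumes the feed changes at most once per UTC day,
    so a second request on the same UTC date can reuse the cached result.
    Both datetimes must be timezone-aware.
    """
    if last_fetch is None:
        # Nothing cached yet, so fetch
        return True
    # Compare calendar dates in UTC, not in local time
    last_day = last_fetch.astimezone(timezone.utc).date()
    today = now.astimezone(timezone.utc).date()
    return last_day < today


if __name__ == "__main__":
    now = datetime(2017, 8, 16, 6, 43, 57, tzinfo=timezone.utc)
    print(should_fetch(None, now))
    print(should_fetch(datetime(2017, 8, 16, 1, 0, tzinfo=timezone.utc), now))
```

A date check is only an approximation: the feed in this article is stamped 00:30 GMT rather than exactly midnight, so a fetch made in that half hour could still see the previous day's items. Comparing the feed's own update stamp, as the program does with pubID, avoids that edge case.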

You can obtain the XML of the feed for another category by replacing the category name in the URL below.

http://export.arxiv.org/rss/cs.CV/rss.xml

A list of categories can be found here: https://arxiv.org/help/api/user-manual#subject_classifications
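Since only the category name changes, the URL can be built from a category string. A small sketch; feed_url is a hypothetical helper name, and the template is simply the cs.CV URL above with the category made variable:

```python
# Template derived from the cs.CV feed URL; only the category part varies
FEED_URL_TEMPLATE = "http://export.arxiv.org/rss/{category}/rss.xml"


def feed_url(category: str) -> str:
    """Return the RSS feed URL for an arXiv category such as 'cs.CV'."""
    return FEED_URL_TEMPLATE.format(category=category)


if __name__ == "__main__":
    for cat in ("cs.CV", "cs.RO"):
        print(feed_url(cat))
```

Building the URL this way makes it easy to reuse one program for several categories instead of copying it per category.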

Python program

References

The program was created with reference to the following information.

Key points

Library

The program imports the following libraries: twython (posting to Twitter), feedparser (parsing the RSS feed), and sys (exiting the program).

Authentication key information

First, as usual, the Twitter authentication keys are collected in auth.py.

auth.py


consumer_key        = 'ABCDEFGHIJKLKMNOPQRSTUVWXYZ'
consumer_secret     = '1234567890ABCDEFGHIJKLMNOPQRSTUVXYZ'
access_token        = 'ZYXWVUTSRQPONMLKJIHFEDCBA'
access_token_secret = '0987654321ZYXWVUTSRQPONMLKJIHFEDCBA'
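Hard-coding the keys in auth.py works, but then the file must be kept out of anything you share. As an alternative sketch (not what this program does), the same four values could be read from environment variables. The variable names and the load_twitter_keys helper are hypothetical:

```python
import os
from typing import Dict, Mapping


def load_twitter_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the four Twitter credentials from environment variables.

    The variable names are hypothetical; use whatever names you export.
    Raises KeyError if any of them is missing.
    """
    names = {
        "consumer_key": "TWITTER_CONSUMER_KEY",
        "consumer_secret": "TWITTER_CONSUMER_SECRET",
        "access_token": "TWITTER_ACCESS_TOKEN",
        "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
    }
    return {key: env[var] for key, var in names.items()}
```

Note that cron does not read your interactive shell profile, so for a cron job the variables would have to be defined in the crontab itself or in a wrapper script.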

Fetching the feed

Set RSS_URL to the XML URL of the feed you want to fetch. The update log (the feed's updated date and time) is kept in the file specified by PUBDATE_LOG.

I wanted the program itself to check for the file specified by PUBDATE_LOG, but I haven't implemented that yet, so create an empty file in advance:

bash


$ touch cs.CV.log
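Alternatively, the program could create the file itself when it is missing. A minimal sketch, assuming the parent directory already exists; read_last_pub_id is a hypothetical helper that stands in for the manual touch step:

```python
from pathlib import Path


def read_last_pub_id(log_path: str) -> str:
    """Return the pubID recorded on the previous run, or "" if there is none.

    Creates an empty log file when it does not exist yet, which has the
    same effect as running `touch` on it beforehand.
    """
    path = Path(log_path)
    path.touch(exist_ok=True)  # create the file if missing, like `touch`
    with path.open("r") as rf:
        return rf.readline().rstrip("\n")
```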

"your LOG dir" is the directory where this program is located. If you run the program automatically with cron, write it as an absolute path.

python


RSS_URL = "http://export.arxiv.org/rss/cs.CV/rss.xml"
PUBDATE_LOG = "/your LOG dir/cs.CV.log"
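To avoid editing the path by hand, the absolute log path could be derived from a base directory. A sketch; log_path_for is a hypothetical helper, and naming the log after the category follows the cs.CV.log convention used here:

```python
import os


def log_path_for(category: str, base_dir: str) -> str:
    """Return an absolute path to '<category>.log' inside base_dir.

    cron does not necessarily start in the program's directory, so a
    relative name like 'cs.CV.log' could resolve somewhere else.
    """
    return os.path.join(os.path.abspath(base_dir), f"{category}.log")
```

Inside the program, base_dir could be os.path.dirname(os.path.abspath(__file__)), that is, the directory that contains the script.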

feedparser.parse() stores the feed contents in news_dic as a dictionary-like object, and the necessary information is then posted to Twitter with twython. The contents of the arXiv feed at the time of writing are shown in the comments of the program below.

python


news_dic = feedparser.parse(RSS_URL)

"""
news_dic.* : 
updated_parsed
etag
encoding
version
updated
headers
entries
namespaces
bozo
href
status
feed

print(news_dic.updated_parsed)  #time.struct_time(tm_year=2017, tm_mon=8, tm_mday=16, tm_hour=0, tm_min=30, tm_sec=0, tm_wday=2, tm_yday=228, tm_isdst=0)
print(news_dic.etag          )  #"Wed, 16 Aug 2017 00:30:00 GMT", "1502843400"
print(news_dic.encoding      )  #us-ascii
print(news_dic.version       )  #rss10
print(news_dic.updated       )  #Wed, 16 Aug 2017 00:30:00 GMT
print(news_dic.headers       )  #{'Expires': 'Thu, 17 Aug 2017 00:00:00 GMT', 'Connection': 'close', 'ETag': '"Wed, 16 Aug 2017 00:30:00 GMT", "1502843400"', 'Server': 'Apache', 'Vary': 'Accept-Encoding,User-Agent', 'Content-Type': 'text/xml', 'Content-Length': '15724', 'Date': 'Wed, 16 Aug 2017 06:43:57 GMT', 'Last-Modified': 'Wed, 16 Aug 2017 00:30:00 GMT', 'Content-Encoding': 'gzip'}
print(news_dic.entries       )  #CONTENTS OF RSS FEED!!
print(news_dic.namespaces    )  #{'': 'http://purl.org/rss/1.0/', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'content': 'http://purl.org/rss/1.0/modules/content/', 'sy': 'http://purl.org/rss/1.0/modules/syndication/', 'dc': 'http://purl.org/dc/elements/1.1/', 'admin': 'http://webns.net/mvcb/', 'taxo': 'http://purl.org/rss/1.0/modules/taxonomy/'}
print(news_dic.bozo          )  #0
print(news_dic.href          )  #http://export.arxiv.org/rss/cs.CV/rss.xml
print(news_dic.status        )  #200
print(news_dic.feed          )  
"""

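feedparser hides the XML, but it helps to see what each entry corresponds to. The version (rss10) and namespaces above show that the feed is RSS 1.0 (RDF), where each paper is an item element with title and link children. A standard-library sketch (xml.etree.ElementTree, not feedparser) that extracts the same title/link pairs; the sample document is made up for illustration, not taken from the real feed:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespaces as reported by news_dic.namespaces for the arXiv feed
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Made-up RSS 1.0 document with the same structure as the arXiv feed
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://arxiv.org/"><title>placeholder channel</title></channel>
  <item rdf:about="http://arxiv.org/abs/0000.00001">
    <title>Placeholder title A</title>
    <link>http://arxiv.org/abs/0000.00001</link>
  </item>
  <item rdf:about="http://arxiv.org/abs/0000.00002">
    <title>Placeholder title B</title>
    <link>http://arxiv.org/abs/0000.00002</link>
  </item>
</rdf:RDF>"""


def items_from_rss10(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every item in an RSS 1.0 document."""
    root = ET.fromstring(xml_text)
    pairs = []
    # In RSS 1.0, items are direct children of rdf:RDF, not of channel
    for item in root.findall("rss:item", NS):
        title = item.findtext("rss:title", default="", namespaces=NS).strip()
        link = item.findtext("rss:link", default="", namespaces=NS).strip()
        pairs.append((title, link))
    return pairs


if __name__ == "__main__":
    for title, link in items_from_rss10(SAMPLE):
        print(title, link)
```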
Compare pubID (the feed's updated value) with lastPubID (the value recorded in PUBDATE_LOG on the previous run). If they are equal, the feed has not been updated, so the program exits. Otherwise, the new pubID overwrites the file pointed to by PUBDATE_LOG.

python


pubID = news_dic.updated

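# Read the pubID recorded on the previous run
with open(PUBDATE_LOG, "r") as rf:
    lastPubID = rf.readline().rstrip("\n")

# Exit if the feed has not been updated; otherwise record the new pubID
if pubID == lastPubID:
    print("The feed has not been updated.")
    sys.exit()
else:
    with open(PUBDATE_LOG, "w") as f:
        f.write(pubID + "\n")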

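The same check can be wrapped in a function that reports whether the feed changed, leaving the decision to exit to the caller. A sketch; is_updated is a hypothetical name, and unlike the code above it treats a missing log file as "never run before" instead of raising an error:

```python
from pathlib import Path


def is_updated(pub_id: str, log_path: str) -> bool:
    """Return True (and record pub_id) if it differs from the logged value.

    An unchanged pubID means the feed has not been updated since the
    last run, so nothing should be tweeted.
    """
    path = Path(log_path)
    last_pub_id = path.read_text().split("\n", 1)[0] if path.exists() else ""
    if pub_id == last_pub_id:
        return False
    path.write_text(pub_id + "\n")
    return True
```

Like the original, this records the new pubID before any tweet is sent, so if posting fails partway, those entries are not retried on the next run. Writing the log only after the loop finishes would trade that for possible duplicate tweets.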
Post to Twitter

There are the following items in news_dic.entries.

The information to post is the title and link of each entry in news_dic.entries. Since a title can be long, it is truncated so that the whole tweet, including the URL, stays within 140 characters.

python


for i in news_dic.entries:
    # Truncate long titles so the title, separator, and link fit in one tweet
    if len(i.title) > 100:
        message = i.title[0:100] + "......\n" + i.link
    else:
        message = i.title + "\n" + i.link
    #print(len(message))
    #print(message)

    try:
        twitter.update_status(status=message)
    except TwythonError as e:
        print(e)

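The code above fixes the title budget at 100 characters. A variant sketch derives the budget from the limit instead, assuming Twitter counts every URL as 23 characters after t.co shortening (check the current counting rules before relying on this); the constant names and build_message are hypothetical:

```python
TWEET_LIMIT = 140  # character limit at the time of writing
URL_LENGTH = 23    # assumption: each link counts as 23 characters (t.co)
ELLIPSIS = "......"


def build_message(title: str, link: str) -> str:
    """Join title and link with a newline, truncating the title to fit."""
    # Characters left for the title after the newline and the shortened link
    room = TWEET_LIMIT - URL_LENGTH - 1
    if len(title) > room:
        title = title[: room - len(ELLIPSIS)] + ELLIPSIS
    return title + "\n" + link
```

With these numbers the title gets up to 116 characters, so 116 + 1 + 23 = 140.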
Result

The final program, after some trial and error, is as follows.

twitter_feed_arxiv_cs.CV.py


# coding: utf-8
from twython import Twython, TwythonError
import feedparser
import sys
 
from auth import (
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret
)

twitter = Twython(
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret
)

RSS_URL = "http://export.arxiv.org/rss/cs.CV/rss.xml"
PUBDATE_LOG = "/<your LOG dir>/cs.CV.log"
# Create the log file in advance with `touch cs.CV.log`.
# When running from cron, PUBDATE_LOG must be an absolute path.

news_dic = feedparser.parse(RSS_URL)

"""
news_dic.* : 
updated_parsed
etag
encoding
version
updated
headers
entries
namespaces
bozo
href
status
feed

print(news_dic.updated_parsed)  #time.struct_time(tm_year=2017, tm_mon=8, tm_mday=16, tm_hour=0, tm_min=30, tm_sec=0, tm_wday=2, tm_yday=228, tm_isdst=0)
print(news_dic.etag          )  #"Wed, 16 Aug 2017 00:30:00 GMT", "1502843400"
print(news_dic.encoding      )  #us-ascii
print(news_dic.version       )  #rss10
print(news_dic.updated       )  #Wed, 16 Aug 2017 00:30:00 GMT
print(news_dic.headers       )  #{'Expires': 'Thu, 17 Aug 2017 00:00:00 GMT', 'Connection': 'close', 'ETag': '"Wed, 16 Aug 2017 00:30:00 GMT", "1502843400"', 'Server': 'Apache', 'Vary': 'Accept-Encoding,User-Agent', 'Content-Type': 'text/xml', 'Content-Length': '15724', 'Date': 'Wed, 16 Aug 2017 06:43:57 GMT', 'Last-Modified': 'Wed, 16 Aug 2017 00:30:00 GMT', 'Content-Encoding': 'gzip'}
print(news_dic.entries       )  #CONTENTS OF RSS FEED!!
print(news_dic.namespaces    )  #{'': 'http://purl.org/rss/1.0/', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'content': 'http://purl.org/rss/1.0/modules/content/', 'sy': 'http://purl.org/rss/1.0/modules/syndication/', 'dc': 'http://purl.org/dc/elements/1.1/', 'admin': 'http://webns.net/mvcb/', 'taxo': 'http://purl.org/rss/1.0/modules/taxonomy/'}
print(news_dic.bozo          )  #0
print(news_dic.href          )  #http://export.arxiv.org/rss/cs.CV/rss.xml
print(news_dic.status        )  #200
print(news_dic.feed          )  
"""

pubID = news_dic.updated

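# Read the pubID recorded on the previous run
with open(PUBDATE_LOG, "r") as rf:
    lastPubID = rf.readline().rstrip("\n")

# Exit if the feed has not been updated; otherwise record the new pubID
if pubID == lastPubID:
    print("The feed has not been updated.")
    sys.exit()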
else:
    with open(PUBDATE_LOG, "w") as f:
        f.write(pubID + "\n")

for i in news_dic.entries:
    # Truncate long titles so the title, separator, and link fit in one tweet
    if len(i.title) > 100:
        message = i.title[0:100] + "......\n" + i.link
    else:
        message = i.title + "\n" + i.link
    #print(len(message))
    #print(message)

    try:
        twitter.update_status(status=message)
    except TwythonError as e:
        print(e)

Tweet

I ran the following command and confirmed that the entries were posted to my Twitter account.

bash


$ python3 twitter_feed_arxiv_cs.CV.py

I also created a separate program and log file for cs.RO, and that worked as well.

Tweet automation

You can tweet once a day with cron. The feed seems to be updated at 00:30:00 GMT (09:30 JST), so set cron to fetch the feed every day at 10:00 JST.

bash


$ crontab -e

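When the editor opens, add the following line to fetch the feed every day at 10:00 (cron uses the system's local time, so this assumes the Raspberry Pi is set to JST). "your LOG dir" is the directory where this program is located.

text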


00 10 * * * python3 /your LOG dir/twitter_feed_arxiv_cs.CV.py >/dev/null 2>&1
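The choice of 10:00 JST can be checked by converting the feed's updated string (an RFC 2822 date, as printed by news_dic.updated) to Japan time with the standard library. A sketch; updated_in_jst is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

JST = timezone(timedelta(hours=9))  # Japan Standard Time, no daylight saving


def updated_in_jst(updated: str) -> datetime:
    """Convert an RFC 2822 date such as news_dic.updated to JST."""
    return parsedate_to_datetime(updated).astimezone(JST)


if __name__ == "__main__":
    print(updated_in_jst("Wed, 16 Aug 2017 00:30:00 GMT"))
    # 2017-08-16 09:30:00+09:00
```

The update lands at 09:30 JST, half an hour before the 10:00 cron run, so the daily job picks up that day's feed.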

Conclusion

For now, the program simply tweets every entry. However, cs.CV and cs.RO each seem to receive more than 50 submissions a day, so to find articles of interest efficiently, the submissions need to be narrowed down further.

This could probably be done by analyzing the text of the title and description, which might make a good application of machine learning.
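Before reaching for machine learning, a plain keyword filter over the title and description is a simple baseline. A sketch, assuming each entry has been reduced to a (title, description, link) tuple; with feedparser the description is available as entry.summary. The keyword list and function names are hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical interest list; replace with your own keywords
KEYWORDS = ["segmentation", "depth estimation", "SLAM"]

Entry = Tuple[str, str, str]  # (title, description, link)


def matches_interest(title: str, description: str, keywords: Iterable[str]) -> bool:
    """True if any keyword appears as a whole word, ignoring case."""
    text = f"{title} {description}"
    return any(
        re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE)
        for kw in keywords
    )


def filter_entries(entries: Iterable[Entry], keywords: Iterable[str]) -> List[Entry]:
    """Keep only the entries whose title or description matches a keyword."""
    keywords = list(keywords)  # allow reuse across entries
    return [e for e in entries if matches_interest(e[0], e[1], keywords)]
```

Only the filtered entries would then be passed to the posting loop, which would also cut the number of tweets per day.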
