[Basics of data science] Collecting data from RSS with Python

This post is a continuation of:

- [Python beginners tried data science] Data acquisition from API [day1]

The series continues for now. I plan to write about analyzing the collected data in a future post.

This time, we will get data from Yahoo! News via RSS and output news headlines and links.

$ python yahoo_news_rss.py

produces the following output:

Yahoo!News topics-Top
http://dailynews.yahoo.co.jp/fc/world/nuclear_weapons/?id=6160968
NPT broke document could not be adopted and closed
http://dailynews.yahoo.co.jp/fc/domestic/injury_case/?id=6160975
Family cut off, 4 dead and injured man arrested
http://dailynews.yahoo.co.jp/fc/economy/coffee_chain/?id=6160962
Over 1,000 people at Starbucks Tottori Open
http://dailynews.yahoo.co.jp/fc/economy/apparel/?id=6160961
Burberry is doing well, ending the contract in June
http://dailynews.yahoo.co.jp/fc/science/life_on_earth/?id=6160969
Is the concept of "zero" recognized by monkeys?
http://dailynews.yahoo.co.jp/fc/sports/prowrestling/?id=6160973
First Tiger Mask Emergency Surgery
http://dailynews.yahoo.co.jp/fc/entertainment/broad_casting/?id=6160945
Repulsion to live broadcast Yuko Ando confused
http://dailynews.yahoo.co.jp/fc/domestic/obituary/?id=6160936
Maruyama, an idol of fighting illness, dies

Using RSS is as simple as it looks. Here is how it works.

What is RSS?

RSS is a general term for several document formats for easily compiling and distributing updated information on various websites such as news and blogs.

That definition is from Wikipedia. In other words, scraping the actual pages is a hassle, whereas RSS delivers the same updates already packaged in a form that is easy to consume.
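To make the format concrete, here is a minimal sketch of what an RSS 2.0 document looks like and how its pieces map to a feed title and a list of items. The sample feed and its example.com URLs are made up for illustration, and the standard library's xml.etree.ElementTree is used only to show the structure; it is not what the rest of this post uses.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document: one <channel> holding feed metadata and <item> entries.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <link>http://example.com/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/articles/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/articles/2</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_RSS)
channel = root.find("channel")

# Feed-level metadata lives directly under <channel>.
feed_title = channel.findtext("title")

# Each <item> is one article with its own title and link.
items = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.findall("item")
]

print(feed_title)
for title, link in items:
    print(link)
    print(title)
```

A real feed carries more fields per item (publication date, description and so on), but the channel-plus-items shape is the part that matters here.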

Prepare the modules.

Install feedparser

Use feedparser to parse RSS feeds.

For installing feedparser, [this article](http://otiai10.hatenablog.com/entry/2012/05/04/180950) was a helpful reference.

How feedparser works

As the documentation shows, the following two keys are the ones to get a firm grasp of.

entries

A list of dictionaries. Each dictionary contains data from a different entry. Entries are listed in the order in which they appear in the original feed.

feed

A dictionary of data about the feed.

For the keys inside each of these, refer to the list in the documentation.

Get and display RSS

Yahoo! provides RSS feeds, listed [here](http://headlines.yahoo.co.jp/rss/list). The code below requests a feed URL and parses the XML that comes back into a dictionary.

yahoo_news_rss.py


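import feedparser

# Yahoo! News topics feed
RSS_URL = "http://rss.dailynews.yahoo.co.jp/fc/rss.xml"

# Fetch the feed and parse the returned XML into a dictionary-like object
yahoo_news_dic = feedparser.parse(RSS_URL)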

The news is now in a dictionary. See the feedparser documentation for its structure.

The code below prints the title of the feed as a whole, followed by the link and title of each article.

yahoo_news_rss.py


# Title of the feed as a whole
print(yahoo_news_dic.feed.title)

# Link and title of each article
for entry in yahoo_news_dic.entries:
    title = entry.title
    link = entry.link
    print(link)
    print(title)

And, just as shown at the beginning, the output is:

Yahoo!News topics-Top
http://dailynews.yahoo.co.jp/fc/world/nuclear_weapons/?id=6160968
NPT broke document could not be adopted and closed
http://dailynews.yahoo.co.jp/fc/domestic/injury_case/?id=6160975
Family cut off, 4 dead and injured man arrested
http://dailynews.yahoo.co.jp/fc/economy/coffee_chain/?id=6160962
Over 1,000 people at Starbucks Tottori Open
http://dailynews.yahoo.co.jp/fc/economy/apparel/?id=6160961
Burberry is doing well, ending the contract in June
http://dailynews.yahoo.co.jp/fc/science/life_on_earth/?id=6160969
Is the concept of "zero" recognized by monkeys?
http://dailynews.yahoo.co.jp/fc/sports/prowrestling/?id=6160973
First Tiger Mask Emergency Surgery
http://dailynews.yahoo.co.jp/fc/entertainment/broad_casting/?id=6160945
Repulsion to live broadcast Yuko Ando confused
http://dailynews.yahoo.co.jp/fc/domestic/obituary/?id=6160936
Maruyama, an idol of fighting illness, dies

That is roughly how the output looks.
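Real feeds do not always fill in every field, and reading entry.title directly fails when the key is missing. Here is a minimal sketch of a more defensive loop. The helper name headline_rows is hypothetical, not part of feedparser, and plain dictionaries with made-up data stand in for feedparser entries so the example runs without a network connection.

```python
def headline_rows(entries):
    """Return (title, link) tuples, skipping entries that lack either field."""
    rows = []
    for entry in entries:
        # .get() returns None instead of raising when a key is absent.
        title = entry.get("title")
        link = entry.get("link")
        if title and link:
            rows.append((title, link))
    return rows


# Plain dicts stand in for feedparser entries here (made-up sample data).
sample_entries = [
    {"title": "First headline", "link": "http://example.com/articles/1"},
    {"title": "No link on this one"},
]

rows = headline_rows(sample_entries)
for title, link in rows:
    print(link)
    print(title)
```

In the script from this post, headline_rows(yahoo_news_dic.entries) should behave the same way, since feedparser entries are dictionaries and support .get().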

The source code shown above is also available here.
