[PYTHON] One-liner web scraping with tse

Introduction

This is the day 23 entry for the Crawler / Web Scraping Advent Calendar 2015. It also doubles as an entry for the Perfume Advent Calendar 2015 on Adventar m(_ _)m

I would like to introduce one-liner web scraping using "tse", a text formatting tool that lets you write the processing in Python.

What is tse?

For details, the slides Mr. Ishimoto presented at this year's PyCon JP 2015 are a good place to start: "tse - Text Formatting Utility with Python"

I also wrote an introductory article about tse the other day, so please refer to that as well: "tse - Introduction to Text Stream Editor in Python"

Getting the latest Perfume news

To get ourselves ready to enjoy Perfume's performance at the Kouhaku Uta Gassen next week, let's take a quick look at the latest Perfume news on Natalie.

Trying it out

curl http://natalie.mu/music/news/list/artist_id/217 | tse -ms 'bs4' -b 'for dt in BeautifulSoup(sys.stdin.read(), "lxml").find_all("dt", class_="NA_title", limit=30):{{print(dt.string)}}'

Output


V6 in red and white&Perfume co-stars with Mickey and Arashi tags with "Star Wars"
Arashi, AKB, Babymetal, X JAPAN and others "M Ste Super Live" song announcement
"Kohaku Uta Gassen" Participating Singer Song Announcement, Sachiko Kobayashi Shows "Senbonzakura"
CDTV New Year's special program aiko, Areki, KANA-BOON、T.M.R., Perfume, Gen Hoshino, etc.
Perfume lifts teaser ban on Anniversary Budokan live video collection
Screening of Perfume documentary, BiS Cano and others at music & subculture movie events
48 & 46, Stardust, Hello! Project mixed medley! Live-specialized "FNS Kayosai" 16th OA
Enon Kawatani's birthday is celebrated by gorgeous people, indigo la End's biggest one-man success in history
42 gorgeous groups such as Mr. Children, L'Arc, YUKI, Babymetal, Arashi, X JAPAN at the end of the year SP of "Music Station"
Perfume x Competitive Karuta, New Song Lifted in Trailer of Movie "Chihayafuru"
"Fake three daughters" have also come! Perfume, pornography, Ryunosuke Kamiki and other gorgeous co-stars in the 23rd year "AAA"
Original author & Suzu Hirose inspiring, Perfume's theme song for the movie "Chihayafuru"
There are tears, laughter, and impersonation! The 66th Red and White Contestant Conference of Delight
BUMP, Nogizaka, Gen Hoshino, Guess, μ in red and white's et al. 10 groups first appearance! X JAPAN is the first in 18 years
Perfume becomes the character of Mercedes-Benz, designed by Yoshiyuki Sadamoto
Perfume Budokan live is visualized, 10DAYS digest & sugoroku all performances in the first edition
Perfume, KANA for chat project "Konason Festival"-BOON, Theater Brook
29 groups including aiko, Ikimono, Kyary, Sekaowa, etc.
Arashi, Perfume, Sandaime J Soul Brothers, Momokuro, Gen Hoshino, etc. at "FNS Kayosai"
"MTV VMAJ" winners include Gen Hoshino, Sandaime JSB, Amuro and others
Next week at "Music Station", Ikimono, NMB, Sakurako Ohara, Kanjani, bknb, Perfume
"Perfume amazing comedian" again! Okite Porsche, Asagaya Shimai sister Watanabe on stage
Perfume, 5 albums from the Tokuma era are converted to analog
Shrimp in the next "Kai", TOKIO, Perfume, KANA-BOON, Good Morning America
Perfume, Minister of Internal Affairs and Communications Award/Winner of the ACC Grand Prix "I'm glad that the team is evaluated"
Perfume's Yahoo!Search Kisekae "STAR TRAIN" Ver.Appearance
Perfume, Ikimonogakari, and TOKIO will appear on the next "CDTV"
"Big fan for several years" Ken Shimura nominated Perfume as a conversation partner, on air on E-tele
"Three people together all the time" Perfume reveals future dreams at a film festival
Perfume, Paruru, ELT, Mamoru Miyano, GACKT and others appear on the red carpet at the Tokyo International Film Festival

Featured article

"V6 & Perfume co-star with Mickey and others in red and white, Arashi tags with 'Star Wars'"

This news is amazing! A collaboration between Perfume and Disney!! The excitement of the 2013/08/14 NHK General TV program "The magical melody that connects everyone: Disney & Ghibli's masterpieces" is back!!! https://www.youtube.com/watch?v=qdPDeG_XxaU

Commentary

The parts of the tse command used above:
-ms: imports a module in the form "from <module name> import *"
-b: the action to run immediately after startup
{{}}: marks a block to be indented

Thanks to tse's -ms option, we were able to run BeautifulSoup in a one-liner!
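
To make the pieces easier to see, here is a rough sketch of the same extraction written as an ordinary Python script. It is not what tse runs internally, just a hand-written equivalent. The function name headline_titles and the SAMPLE_HTML string are made up for illustration; only the "dt" tag, the "NA_title" class, and limit=30 come from the one-liner. The sketch also defaults to bs4's built-in "html.parser" instead of "lxml" so that it needs nothing beyond bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page that curl downloads. Only the
# <dt class="NA_title"> structure is taken from the one-liner.
SAMPLE_HTML = """
<dl>
  <dt class="NA_title">First headline</dt>
  <dd>summary text</dd>
  <dt class="NA_title">Second headline</dt>
  <dt class="other">Not a headline</dt>
</dl>
"""


def headline_titles(html, parser="html.parser", limit=30):
    """Return the text of up to `limit` <dt class="NA_title"> elements."""
    soup = BeautifulSoup(html, parser)
    # Same query as the one-liner: <dt> tags whose class is NA_title.
    found = soup.find_all("dt", class_="NA_title", limit=limit)
    return [dt.string for dt in found]


# The {{ ... }} part of the one-liner corresponds to this indented loop body.
for title in headline_titles(SAMPLE_HTML):
    print(title)
```

To use this against the real page, SAMPLE_HTML could be replaced with sys.stdin.read() (after importing sys) and the script fed from curl, passing parser="lxml" if lxml is installed.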

In conclusion

That wraps up this introduction to one-liner web scraping with tse.

tse is very convenient because it lets you bring the Python you are already used to into a one-liner! Please try it for things other than Perfume, too.

By the way, the Crawler / Web Scraping Advent Calendar 2015 is almost over! Let's be careful not to be called out by @jz5 on the last day as someone who signed up for the Advent Calendar but never posted.
