[Python] I want to get train operation information from Yahoo! Transit

Background

I was building a Twitter bot, and it needed train operation information no matter what. After thinking it over, I wondered whether I could pull operation information for **only specific lines** for free from somewhere. I concluded that scraping Yahoo! Transit was the best option.

APIs and sites that provide operation status

After some research, these are the sources that currently provide line operation information:

  1. Railway delay information JSON (returns, in JSON format, an aggregate of the lines that currently have delays and similar problems)
  2. Ekispert API (looks excellent, but it is paid)
  3. Yahoo! Transit operation status (the most familiar one, but there is no API!)

So if you want to do this for free, the choice is between the railway delay information JSON and the [Yahoo! Transit operation status](https://transit.yahoo.co.jp/traininfo/top) page. The former returns delay information for lines nationwide, but its data is only updated every 10 minutes... Even a 10-minute lag is fatal during the morning rush hour.

As for Yahoo! Transit, **its data actually comes from Ekispert**... lol (surprisingly few people know this). It has operation information for each individual line, and it is accurate and updated quickly, in real time! So: ***if there's no API, scrape it!***

Scraping Yahoo! Transit operation information

As someone who knows absolutely nothing about scraping, I decided to give it a try. Apparently **Beautiful Soup** is the tool to use... Let's have a taste of this wonderful soup!

What is Beautiful Soup?

Beautiful Soup is a Python library for pulling data out of HTML and XML files. Apparently, it uses a parser so that you can **search the fetched HTML for specific elements**. I don't know the details, so let's just try it!
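To get a feel for it, here is a minimal sketch on a small made-up HTML string (not the real Yahoo! Transit page). It parses the string with Python's built-in html.parser, then uses find() to grab elements and get_text() to read their text.

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML document used only for illustration
html = """
<html>
  <body>
    <h1>Train info</h1>
    <p class="status">All lines are running normally.</p>
  </body>
</html>
"""

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches
heading = soup.find("h1")
status = soup.find("p", class_="status")  # class_ because "class" is a Python keyword

# get_text(strip=True) returns the tag's text with surrounding whitespace removed
print(heading.get_text(strip=True))
print(status.get_text(strip=True))
```

Note the trailing underscore in class_: it is how Beautiful Soup lets you filter on the HTML class attribute, and it is the same trick used on the real page below.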

Let's scrape!

The page I want to scrape this time is the Yahoo! Transit operation status page for the Tokaido Line. First, let's take a look at the page's HTML. Looking at the HTML during a delay, there is a dd tag with the trouble class:

<div id="mdServiceStatus">
  <dl>
  <dt>
  <span class="icnAlertLarge">[!]</span>Train delay</dt>
  <dd class="trouble">
   <p>Due to an inspection on the Utsunomiya Line, some down trains (for Atami) are delayed.<span>(Posted at 09:25 on December 14)</span></p>
 </dd>
 </dl>
</div><!--/#mdServiceStatus-->
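For a bot, you will probably want the text of the notice as well, not just the fact that there is a delay. Here is a minimal sketch, assuming the page keeps the structure shown above (a p inside dd.trouble, with the posting time in a nested span). It runs on a trimmed copy of the delay snippet and returns the message and the posted-at time separately. The function name extract_trouble_message is made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the delay snippet above (assumed structure: dd.trouble > p > span)
TROUBLE_HTML = """
<dl>
  <dd class="trouble">
   <p>Due to an inspection on the Utsunomiya Line, some down trains (for Atami) are delayed.<span>(Posted at 09:25 on December 14)</span></p>
  </dd>
</dl>
"""


def extract_trouble_message(html):
    """Return (message, posted_at) from a dd.trouble block, or None if absent.

    Hypothetical helper: it relies on the markup observed above and will need
    updating if Yahoo! Transit changes its HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    dd = soup.find("dd", class_="trouble")
    if dd is None:
        return None
    p = dd.find("p")
    if p is None:
        return None
    span = p.find("span")
    # extract() detaches the span from the tree, so p's text no longer includes it
    posted_at = span.extract().get_text(strip=True) if span else ""
    message = p.get_text(strip=True)
    return message, posted_at


print(extract_trouble_message(TROUBLE_HTML))
```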

On the other hand, if you look at the HTML during normal operation, there is a dd tag with the normal class:

<div class="elmServiceStatus">
    <dl>
    <dt><span class="icnNormalLarge">[○]</span>Normal operation</dt>
    <dd class="normal">
        <p>Currently, there is no information regarding accidents and delays.</p>
    </dd>
    </dl>
</div>

In other words:

  1. During a delay: **a dd tag with the trouble class**
  2. During normal operation: **a dd tag with the normal class**

**By checking this dd tag, it looks like we can tell whether the line is delayed or operating normally!** Let's try it right away!
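The decision rule can be tried offline first. This minimal sketch runs on trimmed copies of the two snippets rather than the live page, and checks for both classes so that a page containing neither (for example, after a markup change on Yahoo!'s side) comes back as "unknown" instead of being silently treated as normal. The name classify_status is hypothetical.

```python
from bs4 import BeautifulSoup


def classify_status(html):
    """Classify a line status page as 'delayed', 'normal', or 'unknown'.

    Hypothetical helper based on the two dd classes observed above.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.find("dd", class_="trouble"):
        return "delayed"
    if soup.find("dd", class_="normal"):
        return "normal"
    # Neither marker found: the page layout may have changed
    return "unknown"


# Trimmed copies of the two snippets above
delayed_html = '<dl><dd class="trouble"><p>Some trains are delayed.</p></dd></dl>'
normal_html = (
    '<dl><dd class="normal"><p>Currently, there is no information '
    "regarding accidents and delays.</p></dd></dl>"
)

print(classify_status(delayed_html))    # delayed
print(classify_status(normal_html))     # normal
print(classify_status("<p>empty</p>"))  # unknown
```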

Installation

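First, install Beautiful Soup with pip. The code also uses requests to download the page, so install it as well if you don't have it.

$ pip install beautifulsoup4 requests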

Code

import requests
from bs4 import BeautifulSoup

# URL of the Tokaido Line operation status page
TOKAIDO_LINE_URL = 'https://transit.yahoo.co.jp/traininfo/detail/27/0/'

# Fetch the web page with Requests (time out after 10 s, raise on HTTP errors)
response = requests.get(TOKAIDO_LINE_URL, timeout=10)
response.raise_for_status()

# Parse the web page with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Use .find() to look for a dd tag with the trouble class
if soup.find('dd', class_='trouble'):
    message = 'Tokaido Line is delayed'
else:
    message = 'The Tokaido Line is in normal operation'

print(message)
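Since the goal was operation information for only specific lines, the same check can be wrapped in a function that loops over several pages. This is a sketch under assumptions: only the Tokaido Line URL comes from this article (you would add other lines' detail-page URLs yourself), and the names LINES, fetch_html and check_lines are made up. The download function is passed in as a parameter so the parsing logic can be tried without network access.

```python
from bs4 import BeautifulSoup

# Line name -> Yahoo! Transit status page (add other lines' URLs yourself)
LINES = {
    "Tokaido Line": "https://transit.yahoo.co.jp/traininfo/detail/27/0/",
}


def fetch_html(url):
    """Download a page with requests, raising on timeouts and HTTP errors."""
    import requests  # imported here so the parser can be tested without requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def check_lines(lines, fetch=fetch_html):
    """Return {line name: True if a dd.trouble tag is present} for each line."""
    results = {}
    for name, url in lines.items():
        soup = BeautifulSoup(fetch(url), "html.parser")
        results[name] = soup.find("dd", class_="trouble") is not None
    return results
```

Calling check_lines(LINES) fetches each page once and returns a dict mapping each line name to True (delayed) or False (no trouble tag found), which a bot can then turn into messages.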

Execution result

Tokaido Line is delayed

Impressions

Wasn't that really easy?? Being able to pull information from a web page is a dream come true! Postscript: just between us, when I showed this to some people at Yahoo, they told me it's "a gray zone."
