[PYTHON] [Memo] How to use BeautifulSoup4 (2) Display the article headline with Requests

The HTML of websites on the Internet contains a great deal of information, and it is difficult to retrieve and analyze it by hand. Therefore, we use a library called Requests to fetch the HTML.

This time, we will learn how to use Requests by fetching the headlines of articles in the domestic news section of MSN Japan.

In [1] Import BeautifulSoup, Requests and re

In[1]


from bs4 import BeautifulSoup
import requests
import re

In [2] Store the response (which includes the html) in the variable urlshutoku

In[2]


urlshutoku = requests.get("https://www.msn.com/ja-jp")
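As a hedged sketch of the same request, the helper below adds a timeout and a status check before returning the HTML. The function name fetch_html and the 10-second timeout are assumptions made for illustration and are not part of the original memo.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML text of url, raising an error if the request fails."""
    # timeout keeps the call from waiting forever if the server does not answer
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text


# Usage (needs network access):
# html = fetch_html("https://www.msn.com/ja-jp")
```

Checking the status first means a failed request shows up as an error instead of as an error page that gets parsed like a normal one.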

In [3] Try to display the entire page

In[3]


urlshutoku.text

The output of In [3] is mostly information we do not need, so this time we will display only the headlines. To do that, we first have to find out how the headlines are marked up in the html. That's where Google Chrome's developer tools come in.

First, right-click a headline and click Inspect (I). The developer tools screen is then displayed. (Screenshot: 2020-10-03_220938.png)

Only the html source shown on the left side of that screen is needed for scraping. Make sure that the element for the headline you inspected is highlighted in blue at the top. Next, look at the href attribute that holds the url of the article headline. The other headlines have the same kind of href, so it seems to be a usable clue.

In [4] Parse with BeautifulSoup and html.parser

In[4]


soup = BeautifulSoup(urlshutoku.text,"html.parser")
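To see what the parsed object offers without touching the network, here is a minimal sketch that parses a small hand-written page. The sample_html string is a hypothetical stand-in for urlshutoku.text.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for urlshutoku.text
sample_html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# html.parser is the parser that ships with Python's standard library
soup = BeautifulSoup(sample_html, "html.parser")

page_title = soup.title.string   # text inside <title>
first_paragraph = soup.p.string  # text inside the first <p>
print(page_title)
print(first_paragraph)
```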

In [5] Extract domestic headlines using find_all

In[5]


midashi = soup.find_all(href=re.compile("/ja-jp/news/national"))
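The filter can be tried on a small sample first. In the sketch below the three links are made up for illustration. Only the /ja-jp/news/national pattern comes from the memo. When find_all receives a compiled regular expression for an attribute, it keeps the tags whose attribute value contains a match.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real MSN page is structured differently
sample_html = """
<ul>
  <li><a href="/ja-jp/news/national/story-1">Domestic headline 1</a></li>
  <li><a href="/ja-jp/news/world/story-2">World headline</a></li>
  <li><a href="/ja-jp/news/national/story-3">Domestic headline 2</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Keep only tags whose href contains the domestic news path
midashi = soup.find_all(href=re.compile("/ja-jp/news/national"))

hrefs = [tag["href"] for tag in midashi]
print(hrefs)
```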

If you type midashi in the Jupyter notebook, the headline information is displayed, but the whole tag, including the url, is shown as well. That is hard to read, so we display only the text.

In [6] Display only the text using a for statement and string

In[6]


for ichiran in midashi:
    print(ichiran.string)

Now only the heading is displayed.
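One caveat: string returns None when a tag has more than one child, for example when the headline text is wrapped in nested tags. The sketch below uses hypothetical sample markup to compare string with get_text, which joins all of the text inside a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second link wraps its text in nested tags
sample_html = (
    '<a href="/ja-jp/news/national/a">Plain headline</a>'
    '<a href="/ja-jp/news/national/b"><span>Nested</span> <b>headline</b></a>'
)

soup = BeautifulSoup(sample_html, "html.parser")
links = soup.find_all("a")

# .string only works when the tag has exactly one child
strings = [link.string for link in links]

# get_text() collects every piece of text inside the tag
texts = [link.get_text(" ", strip=True) for link in links]

print(strings)
print(texts)
```

If the loop in In [6] prints None for some headlines, switching to get_text may help.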
