[Python] How to get a list of links from a Wikipedia page

This is based on [Web Scraping with Python](https://www.amazon.co.jp/Python%E3%81%AB%E3%82%88%E3%82%8BWeb%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0-Ryan-Mitchell/dp/4873117615). One of its examples gets the links contained in an article from a Wikipedia page. The sample in the book appears to target an English page, so I adapted it slightly for Japanese Wikipedia, where the link paths are percent-encoded and need to be decoded before printing.

Execution environment

OS: OS X El Capitan (10.11.5), Python: 3.5.1

# coding: utf-8

import re
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import unquote

url = "https://ja.wikipedia.org/wiki/%E3%83%86%E3%82%A4%E3%83%AB%E3%82%BA_%E3%82%AA%E3%83%96_%E3%82%A4%E3%83%8E%E3%82%BB%E3%83%B3%E3%82%B9"

html = urlopen(url)
bsObj = BeautifulSoup(html,'html.parser')

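# Match article links only: paths that start with /wiki/ and contain no colon
# (a colon marks namespace pages such as Category:).
pattern = re.compile(r"^(/wiki/)((?!:).)*$")

# The href filter already guarantees that every matched tag has an href attribute.
for link in bsObj.find('div', {'id': 'bodyContent'}).find_all('a', href=pattern):
    # Decode the percent-encoded path so Japanese titles print as readable text.
    print(unquote(link.attrs['href']))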
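
To see what the pattern keeps and what `unquote` does without fetching a page, here is a minimal sketch that uses only the standard library. The `sample_hrefs` list and the `article_paths` helper are made up for illustration and are not part of the book's sample; the hrefs only imitate the kinds of links found on a Wikipedia page.

```python
import re
from urllib.parse import unquote

# Same pattern as in the script above.
ARTICLE_LINK = re.compile(r"^(/wiki/)((?!:).)*$")

# Hypothetical hrefs, invented for this example.
sample_hrefs = [
    "/wiki/%E6%97%A5%E6%9C%AC",           # article link, percent-encoded Japanese title
    "/wiki/Category:%E6%97%A5%E6%9C%AC",  # namespace page (contains a colon)
    "/wiki/Python",                       # article link, ASCII only
    "#cite_note-1",                       # in-page anchor
    "https://example.com/wiki/Python",    # external URL, does not start with /wiki/
]


def article_paths(hrefs):
    """Return the decoded paths of hrefs that look like article links."""
    return [unquote(h) for h in hrefs if ARTICLE_LINK.search(h)]


decoded = article_paths(sample_hrefs)
for path in decoded:
    print(path)
```

Only the first and third entries pass the filter. The first one is printed with its Japanese title decoded, which is the adjustment needed for Japanese Wikipedia.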
