[PYTHON] You will be an engineer in 100 days ――Day 73 ――Programming ――About scraping 4

Click here for the articles up to yesterday

You will become an engineer in 100 days-Day 70-Programming-About scraping

You will become an engineer in 100 days --Day 66 --Programming --About natural language processing

You will become an engineer in 100 days --Day 63 --Programming --Probability 1

You will become an engineer in 100 days-Day 59-Programming-Algorithms

You will become an engineer in 100 days --- Day 53 --Git --About Git

You will become an engineer in 100 days --Day 42 --Cloud --About cloud services

You will become an engineer in 100 days --Day 36 --Database --About the database

You will be an engineer in 100 days-Day 24-Python-Basics of Python language 1

You will become an engineer in 100 days --Day 18 --Javascript --JavaScript basics 1

You will become an engineer in 100 days --Day 14 --CSS --CSS Basics 1

You will become an engineer in 100 days --Day 6 --HTML --HTML basics 1

This article continues the scraping series.

In the previous articles we covered sending the request and parsing the response. This time we look at how to save the acquired data.

Save the acquired information

Scraping often doesn't end with a single URL. You can keep the acquired information in a list and then output it to a file or database as appropriate.

import requests
from bs4 import BeautifulSoup

url = 'Access URL'
res = requests.get(url)
soup = BeautifulSoup(res.content, "html.parser")

#Prepare an empty list
result_list = []

#Get all a tags
a_tags = soup.find_all('a')
for a in a_tags[0:10]:
    #Store the href of the a tag in the list
    result_list.append(a.get('href'))

print(result_list)

['http://www.otupy.com', '/otu/', '/business/', '/global/', '/news/', '/python/', '/visits/', '/recruit/', '/vision/']
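The result above mixes an absolute URL with relative paths such as /otu/. If the links are going to be requested later, it may help to normalize them first. The following is a minimal sketch using urllib.parse.urljoin from the standard library. The helper name to_absolute and the base URL are assumptions made for illustration, and entries that are None (a tags without an href) are skipped.

```python
from urllib.parse import urljoin


def to_absolute(base_url, hrefs):
    """Hypothetical helper: resolve each href against base_url, skipping missing ones."""
    absolute = []
    for href in hrefs:
        if href is None:
            continue
        absolute.append(urljoin(base_url, href))
    return absolute


# Sample values modeled on the output above; the base URL is an assumption
sample = ['http://www.otupy.com', '/otu/', None, '/news/']
absolute_links = to_absolute('http://www.otupy.com', sample)
print(absolute_links)
```

Links that are already absolute pass through unchanged, so the same helper can be applied to the whole list.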

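The following code writes the contents of the list to a file, one entry per line.

with open('File Path', 'w') as _w:
    for row in result_list:
        #a.get('href') returns None for a tags without an href, so skip those
        if row is not None:
            _w.write(row + '\n')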
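The text also mentions a database as a possible destination. As a rough sketch of that option, the example below stores the links in SQLite through the standard library's sqlite3 module. The table name links, the column name href, and the helper save_links are all assumptions chosen for this example, not something taken from the earlier articles.

```python
import sqlite3


def save_links(conn, links):
    """Hypothetical helper: store each non-empty link as one row and return the row count."""
    conn.execute('CREATE TABLE IF NOT EXISTS links (href TEXT)')
    rows = [(link,) for link in links if link is not None]
    conn.executemany('INSERT INTO links (href) VALUES (?)', rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the example self-contained;
# pass a file path to sqlite3.connect() to keep the data on disk
conn = sqlite3.connect(':memory:')
saved = save_links(conn, ['http://www.otupy.com', '/otu/', None])
stored = conn.execute('SELECT href FROM links ORDER BY rowid').fetchall()
print(saved)
print(stored)
```

Using ? placeholders lets sqlite3 handle quoting, which is safer than building the SQL string by hand from scraped values.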

How to download the file

Scraping is not limited to text. If the request target is a file, you can download the file itself.

You can download the file with the following code.

import requests
import os

url = 'File URL'

#Extract the file name from the URL
file_name = os.path.basename(url)
print(file_name)

#Streaming access to the target URL
res = requests.get(url, stream=True)
if res.status_code == 200:
    print('file download start {0}'.format(file_name))
    #Write the file in binary mode
    with open(file_name, 'wb') as file:
        #Write the file one chunk at a time, chunk_size bytes each
        for chunk in res.iter_content(chunk_size=1024):
            file.write(chunk)
    print('file download end   {0}'.format(file_name))

To save a file, first confirm that the request succeeded, then write the response to a file.

Write it a little at a time with res.iter_content(chunk_size=chunk size).
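Two details of the download code can be tried without network access. First, os.path.basename(url) keeps any query string that follows the file name, so it may be safer to take the base name of the URL path only. Second, the write loop works on any iterable of bytes. The sketch below shows both; the helper names file_name_from_url and save_chunks and the example URL are assumptions for illustration, and the list of bytes stands in for res.iter_content(chunk_size=1024).

```python
import os
import tempfile
from urllib.parse import urlsplit


def file_name_from_url(url):
    """Hypothetical helper: take the last path segment, ignoring any query string."""
    return os.path.basename(urlsplit(url).path)


def save_chunks(chunks, path):
    """Hypothetical helper: write an iterable of bytes chunks and return the total size."""
    total = 0
    with open(path, 'wb') as file:
        for chunk in chunks:
            file.write(chunk)
            total += len(chunk)
    return total


# Stand-in data; with requests this would be res.iter_content(chunk_size=1024)
name = file_name_from_url('https://example.com/files/report.pdf?version=2')
path = os.path.join(tempfile.mkdtemp(), name)
size = save_chunks([b'abc', b'de'], path)
print(name, size)
```

Because only one chunk is held in memory at a time, the same loop can handle files that are much larger than the available memory.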

URL encoding

Special characters such as Japanese cannot be used directly in a URL. If you want to put Japanese in a URL, for example when searching for a Japanese word, you need to convert the string to a specific code (a string of symbols and alphanumeric characters).

Converting Japanese into a character string that can be used in a URL is called `URL encoding`.

Conversely, a character string that has been URL-encoded and become unreadable can be converted back to a readable state. This is called `URL decoding`.

Python uses the `urllib` library for this.

**URL encoding** `urllib.parse.quote('target string')`

**Decode** `urllib.parse.unquote('target string')`

import urllib.parse

#URL encoding
st = 'Otsu py'
s_quote = urllib.parse.quote(st)
print(s_quote)

##Decode
d_quote = urllib.parse.unquote('%E4%B9%99py')
print(d_quote)

%E4%B9%99py Otsu py

Summary

This article covered some supplementary knowledge about scraping. Since there isn't much of it, I think you can try it right away.

Let's also review the material covered up to yesterday.

27 days until you become an engineer

Author information

Otsu py's HP: http://www.otupy.net/

Youtube: https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw

Twitter: https://twitter.com/otupython
