[Python] Extract only numbers from lists and character strings

I am a beginner who has just started web scraping with Python. This post is a note on how to extract only the numbers from scraped data.

Purpose

Extract only the numbers from scraped data.

Implementation

Extract only numbers using regular expressions

Use the re module. It is part of the Python standard library, so nothing needs to be installed with pip. (The regex package on PyPI is a separate third-party module and is not needed here.)

To use it, import the re module in sample.py.

sample.py


import re

Now you are ready to use the re module.

Extract only numbers from strings

sample.py


import re

dt = "regnkrnfskrngkaer gksgnkq laergnnktgar erfkl 400 registrants aelrkna erglknnarg"
# \D matches any non-digit character; replacing each match with "" leaves only the digits
num = re.sub(r"\D", "", dt)
print(num)

# 400
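Note that re.sub with \D joins every digit in the string into one result. As a minimal sketch (the sample string here is hypothetical, not from the scraped data), re.findall can be used instead when the string may contain more than one number and each should be kept separate:

```python
import re

# Hypothetical sample string containing two separate numbers
text = "abc 12 def 400 registrants"

# re.sub removes every non-digit, so the two numbers are merged together
merged = re.sub(r"\D", "", text)

# re.findall returns each run of consecutive digits as its own item
numbers = [int(n) for n in re.findall(r"\d+", text)]

print(merged)   # 12400
print(numbers)  # [12, 400]
```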

Extract only numbers from a list

sample.py


import re

dt = ['regnkrnfskrngkaer', 'gksgnkq', 'laergnnktgar', 'erfkl', '400 registrants', 'aelrkna erglknnarg']
# Join the list into one string, then remove every non-digit character
dt_str = ",".join(dt)
dt_num = re.sub(r"\D", "", dt_str)
print(dt_num)

# 400
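Joining the list first is one way to do it. As an alternative sketch, the same substitution can be applied to each element separately, keeping only the elements that actually contained digits. The variable name digits_per_item is introduced here for illustration only.

```python
import re

dt = ['regnkrnfskrngkaer', 'gksgnkq', 'laergnnktgar', 'erfkl', '400 registrants', 'aelrkna erglknnarg']

# Strip the non-digits from each element individually
digits_per_item = [re.sub(r"\D", "", s) for s in dt]

# Drop the elements that had no digits at all (they became empty strings)
dt_num = [d for d in digits_per_item if d]

print(dt_num)  # ['400']
```

The result is a list, so the numbers stay separate even if several elements contain digits.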

Extract only specific numbers from a list of mixed strings and numbers

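Failure example

If every element contains digits, joining the list and removing the non-digits concatenates all of the digits into one string, so the number we want cannot be picked out.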

sample.py


import re

dt = ['regr111kaer', 'gg443nkq', 'laen56nktgar', 'er39fkl', '400 registrants', 'ael09narg']
dt_str = ",".join(dt)
dt_num = re.sub(r"\D", "", dt_str)
print(dt_num)

# 111443563940009
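To see which digits came from which element, a small sketch using re.findall on each element (rather than on the joined string) may help. This is only a way to inspect the data, assuming the same list as above:

```python
import re

dt = ['regr111kaer', 'gg443nkq', 'laen56nktgar', 'er39fkl', '400 registrants', 'ael09narg']

# For each element, collect the runs of digits it contains
found = [re.findall(r"\d+", s) for s in dt]

for item, digits in zip(dt, found):
    print(item, digits)
```

Each element yields its own list of digit strings, so "400" is no longer mixed with the digits from the other elements.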

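Success example

Take out only the element that contains the target number (index 4) first, then remove the non-digits from that element alone.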

sample.py


import re

dt = ['regr111kaer', 'gg443nkq', 'laen56nktgar', 'er39fkl', '400 registrants', 'ael09narg']
# pop(4) removes the element at index 4 ('400 registrants') from the list and returns it
dt_pop = dt.pop(4)
dt_num = re.sub(r"\D", "", dt_pop)
print(dt_num)

# 400
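The index 4 only works while the target stays in that position. If the order of scraped items can change, one possible approach is to look for the number by its surrounding text instead. The sketch below assumes the wanted number is always followed by the word "registrants"; that rule and the names pattern and registrants are assumptions for illustration.

```python
import re

dt = ['regr111kaer', 'gg443nkq', 'laen56nktgar', 'er39fkl', '400 registrants', 'ael09narg']

# Assumed rule: the wanted number is the one directly followed by "registrants"
pattern = re.compile(r"(\d+)\s*registrants")

registrants = None
for item in dt:
    match = pattern.search(item)
    if match:
        # group(1) is the run of digits captured by the parentheses
        registrants = int(match.group(1))
        break

print(registrants)  # 400
```

Unlike pop, this does not modify the original list, and registrants stays None when no element matches.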

Reference

https://qiita.com/sakamossan/items/161db7418ade037f6f3d
https://qiita.com/ikanamazu/items/ba2a32a1a5924f3bd8e9
