(Python) HTML reading and regular expression notes

When applying regular expressions to an HTML file in Python, I got stuck on encoding and a few other points, so I'm leaving these notes as a memo.

## I. Read the HTML file

Use the `codecs` library. Because it is part of the Python standard library (Appx. 1), it can be used with just an `import`, with no installation required.

```python
import codecs

# Open the HTML file as text, decoding its bytes as UTF-8
f = codecs.open("hoge.html", "r", encoding="utf-8")
```

### Caution
- Be sure to specify the `encoding` argument of the `codecs.open` function.
- On Windows, it seems that files encoded in anything other than Shift-JIS (CP932) cannot be read with the default settings. Someone had already run into this when I looked into it (Appx. 2).
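The effect of the `encoding` argument can be checked with a small round trip. This is a minimal sketch, not a definitive implementation: the helper name `read_html`, the temporary file, and the sample HTML content are all assumptions made for illustration and are not part of the original memo.

```python
import codecs
import os
import tempfile


def read_html(path, encoding="utf-8"):
    """Hypothetical helper: read a whole file as text using an explicit encoding."""
    with codecs.open(path, "r", encoding=encoding) as f:
        return f.read()


# Sample content (an assumption for this sketch) containing non-ASCII text
sample_html = "<html><body><p>こんにちは</p></body></html>"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "hoge.html")

    # Write the sample as UTF-8 so the encoding on disk is known
    with codecs.open(sample_path, "w", encoding="utf-8") as f:
        f.write(sample_html)

    # Reading with the matching encoding returns the original text
    decoded_ok = read_html(sample_path, encoding="utf-8")

    # Reading the same bytes as CP932 either fails or yields garbled text
    try:
        decoded_wrong = read_html(sample_path, encoding="cp932")
    except UnicodeDecodeError:
        decoded_wrong = None
```

Only the read with the matching encoding gives back the text that was written, which is why the argument should always be spelled out rather than left to the platform.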

## II. Apply regular expressions


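Let's use the `re` library. This is also part of the standard library (Appx. 1), so it can be used with just an `import`.

```python
# (Continuation of the previous chapter)
import re

# Read the whole file into a string (named `text` to avoid shadowing the built-in `str`)
text = f.read()
regex = '[abc]'  # matches any single character a, b, or c
sample = re.findall(regex, text)  # list of every match in the text
```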

### Caution
- Before passing the HTML file opened in the previous chapter to any `re` function, be sure to call the file object's `read()` method (`f.read()`) to get its contents as a string.
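The caution above can be reproduced without a real file. In this sketch, `io.StringIO` stands in for the file object returned by `codecs.open` (an assumption made to keep the example self-contained), and the sample HTML string is invented for illustration.

```python
import io
import re

# Stand-in for the opened HTML file (assumption: behaves like a text file object)
f = io.StringIO("<p>abc</p>")

# Passing the file object itself to an re function raises TypeError
try:
    re.findall("[abc]", f)
    accepted_file_object = True
except TypeError:
    accepted_file_object = False

# Reading the contents first gives re a plain string to work with
text = f.read()
matches = re.findall("[abc]", text)
```

Here `accepted_file_object` ends up `False`, while `matches` holds the three characters found in the string.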

### Typical regular expression functions

The following `re` functions, taken from Appx. 3, seem to be the most versatile. `regex` is assumed to contain a regular expression.

- Forward search: `re.search(regex, string)`
  - Checks whether the `regex` pattern occurs in `string` and, if it does, returns a match object for the first match (the matched text is available through its `group()` method).
  - If the search fails, `None` is returned, so it is easy to use in a conditional branch such as `if not re.search(regex, string):`.
- Search all: `re.findall(regex, string)`
  - Checks whether the `regex` pattern occurs in `string` and returns a list of all the matches.
  - To run another `re` function on the result, convert it to a string first with the `str` function, because `re` functions expect a string rather than a list (Appx. 4).
- Replacement: `re.sub(regex, replace, string, count=0)`
  - Checks whether the `regex` pattern occurs in `string` and replaces each match with `replace`.
  - If `count` is a natural number of 1 or more, only that many matches are replaced, counting from the beginning of the string. With the default value of 0, all matches are replaced.
  - I wondered how the global search flag (`/g`) used when writing regular expressions in JavaScript and elsewhere is expressed in Python. Apparently it corresponds to this `count=0` state (Appx. 5).
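The three functions listed above can be compared side by side on one short string. This is a minimal sketch; the sample string and the variable names are assumptions chosen for illustration.

```python
import re

# Sample input (an assumption for this sketch)
string = "<p>apple</p><p>banana</p>"
regex = "[abc]"

# re.search: match object for the first match, or None when nothing matches
match = re.search(regex, string)
first = match.group() if match else None
no_match = re.search("[xyz]", string)

# re.findall: list of every match, in order of appearance
all_matches = re.findall(regex, string)

# re.sub with the default count=0 replaces every match
replaced_all = re.sub(regex, "_", string)

# re.sub with count=2 replaces only the first two matches
replaced_two = re.sub(regex, "_", string, count=2)

print(first)         # a
print(all_matches)   # ['a', 'b', 'a', 'a', 'a']
print(replaced_all)  # <p>_pple</p><p>__n_n_</p>
print(replaced_two)  # <p>_pple</p><p>_anana</p>
```

Because `re.search` returns `None` on failure, `no_match` can be used directly in an `if not ...:` test.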

Regular expression syntax itself is covered in detail in Appx. 6.
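As one small illustration of that syntax applied to HTML, the sketch below uses capture groups and the non-greedy quantifier `*?`. The sample markup and the pattern are assumptions written for this example; they are not taken from the referenced list.

```python
import re

# Sample markup (an assumption for this sketch)
html = '<ul><li><a href="a.html">First</a></li><li><a href="b.html">Second</a></li></ul>'

# Two capture groups: the href value and the link text.
# `.*?` is non-greedy, so each group stops at the earliest possible point.
link_regex = r'<a href="(.*?)">(.*?)</a>'
links = re.findall(link_regex, html)

# The greedy form `.*` runs as far as it can, so both links collapse into one match
greedy_regex = r'<a href="(.*)">(.*)</a>'
greedy_links = re.findall(greedy_regex, html)

print(links)              # [('a.html', 'First'), ('b.html', 'Second')]
print(len(greedy_links))  # 1
```

When a pattern contains more than one group, `re.findall` returns a list of tuples, one tuple per match.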

 (that's all)

## Reference (Appendix / Appx.)
- 1. [Python standard library](https://docs.python.org/ja/3/library/index.html)
- 2. [A story that I had a hard time opening a file other than CP932 (Shift-JIS) encoded on Windows](https://qiita.com/Yuu94/items/9ffdfcb2c26d6b33792e)
- 3. [re --- Regular expression operation](https://docs.python.org/ja/3/library/re.html)
- 4. [re.sub erroring with “Expected string or bytes-like object”](https://stackoverflow.com/questions/43727583/re-sub-erroring-with-expected-string-or-bytes-like-object)
- 5. [Python RegExp global flag](https://stackoverflow.com/questions/11686516/python-regexp-global-flag)
- 6. [List of basic regular expressions](https://murashun.jp/blog/20190215-01.html)
  - I always use it to look up regular expressions.

