[PYTHON] Extract text elements by deleting the tags contained in a string

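def remove_select_tags(string, start_tag, end_tag):
    # str.find returns -1 when the opening delimiter is not present.
    start = string.find(start_tag)
    while start != -1:
        # Look for the closing delimiter after the opening one.
        end = string.find(end_tag, start + len(start_tag))
        if end == -1:
            break  # unmatched opening delimiter: stop instead of looping forever
        # Replace the whole tag with a space so neighbouring words stay separated.
        string = string[:start] + " " + string[end + len(end_tag):]
        start = string.find(start_tag)
    # Split on whitespace to return the remaining text elements as a list.
    return string.split()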

def test_case():
    target_string = '''<h1>Title</h1><p>This is a
                        <a href="mt-takao.top">link</a>.<p>'''
    assert remove_select_tags(target_string, '<', '>') == ['Title', 'This', 'is', 'a', 'link', '.']
    target_string = "[test]a-I-U-E-O[test][next]Kakikukeko[next]"
    assert remove_select_tags(target_string, '[', ']') == ['a-I-U-E-O', 'Kakikukeko']
    print('test ok')
test_case()

This function deletes every tag delimited by the specified start and end characters from a string and returns the remaining text elements. find returns -1 when start_tag is not found, so the loop runs only while a tag is still present. Inside the loop, the part of the string before the index found for start_tag is joined, with a space in between, to the part of the string after the index found for end_tag. The search for start_tag is then repeated on the newly built string. Finally, split() breaks the result on whitespace and returns the elements as a list.
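
To make the loop easier to follow, here is a minimal sketch that yields the intermediate string after each tag is replaced by a space. The helper name trace_tag_removal and the sample input are assumptions made for illustration and are not part of the original code. The sketch also assumes that an unmatched opening delimiter should simply end the loop.

```python
def trace_tag_removal(string, start_tag, end_tag):
    """Yield the string after each tag is replaced by a space (hypothetical helper)."""
    start = string.find(start_tag)
    while start != -1:
        # Search for the closing delimiter after the opening one.
        end = string.find(end_tag, start + len(start_tag))
        if end == -1:
            break  # assumption: stop when no closing delimiter is left
        # Join the text before the tag and the text after it with a space.
        string = string[:start] + " " + string[end + len(end_tag):]
        yield string
        start = string.find(start_tag)


steps = list(trace_tag_removal("[a]x[b]y", "[", "]"))
for step in steps:
    print(repr(step))
# Splitting the final string on whitespace gives the extracted elements.
print(steps[-1].split())
```

Each yielded string has one fewer tag than the previous one, and calling split() on the last string produces the same kind of list that remove_select_tags returns.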
