Loop over multiple lists at the same time with a for statement in Python

Use zip.

From a site's article list page, use BeautifulSoup4 to get the URL and title of each article. The goal is to build a list of strings that join each URL and title with >>>, such as ['url >>> title', 'url >>> title'].

First, consider a page where the title is the text of the a tag itself, as shown below.


import requests, bs4

res = requests.get('https://qiita.com/takuto_neko_like')
posts = bs4.BeautifulSoup(res.text, 'html.parser').select('.u-link-no-underline')
print(posts)

The variable posts is a list of <class 'bs4.element.Tag'> objects, and you can see the HTML of each a tag by accessing it individually, for example posts[0].

The title is written as the text of each a tag:


[<a class="u-link-no-underline" href="/takuto_neko_like/items/52c6c52385386544aa62">Where I was worried about heroku</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/c5791f267e0964e09d03">I made a tool to get new articles</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/93b3751984e5e3fd3670">fish is moving too slowly git trouble</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/62aeb4271614f6f0347f">Use Plotly graphs with Django</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/c9c80ff453d0c4fad239">【Python】super()Reasons to override using</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/14e92797fa2b23a64adb">[Python] What is inherited by multiple inheritance?</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/6cf9bade3d9515a724c0">【Python】@What are classmethods and decorators?</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/aed9dd5619d8457d4894">【Python】*args **What is kwrgs</a>, <a class="u-link-no-underline" href="/takuto_neko_like/items/bb8d0957347636b5bf4f">[Bootstrap] How to fix and display navbar even if scrolling, points to keep in mind and solutions</a>]

Each a tag is a <class 'bs4.element.Tag'> (Tag object), so converting it to a string with str() lets you use .find() to search for the tag name, attribute names, and so on. With the indexes that .find() returns, you can slice out just the URL part and the title part of the string.

Format url and title



    url_title_set = []

    for post in posts:
        tag = str(post)  # Convert the Tag object to a string
        # Extract the URL: skip the 6 characters of 'href="'
        index_first = tag.find('href=') + 6
        index_end = tag.find('">')
        url = tag[index_first:index_end]
        # Extract the title: the text between '">' and '</a'
        index_first = tag.find('">') + 2
        index_end = tag.find('</a')
        # Replace full-width spaces with ordinary spaces
        title = tag[index_first:index_end].replace('\u3000', ' ')

        url_title_set.append(f"{url}>>>{title}")

With that, it's done.
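
To make the index arithmetic concrete, here is a minimal sketch that applies the same .find() slicing to a single hard-coded string, copied from the first a tag in the output above, so it runs without a network request or bs4. Like the loop above, it assumes that href is the last attribute in the tag, so the first '">' marks the end of the URL.

```python
# One a tag from the output above, already converted to a string.
tag = ('<a class="u-link-no-underline" '
       'href="/takuto_neko_like/items/52c6c52385386544aa62">'
       'Where I was worried about heroku</a>')

# 'href="' is 6 characters long, so the URL starts 6 characters after 'href='.
url_start = tag.find('href=') + 6
# The first '">' closes the href value (assumes href is the last attribute).
url_end = tag.find('">')
url = tag[url_start:url_end]

# The title sits between that '">' and the closing '</a'.
title = tag[url_end + 2:tag.find('</a')]

print(f"{url}>>>{title}")
```

If the attribute order on a page were different, the first '">' would no longer mark the end of the URL, which is the main fragility of this string-based approach.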

However, many sites do not put the title in the text of the a tag. A common example is a layout where each article is shown as a card made up of an image and a title, and the link wraps the entire card.

Example



    <div class='card'><a href='#' class='link'>
        <div class='image'><img src='#'></div>
        <div class='title'>title</div>
        </a>
    </div>

In such a case, if you pass the class card to bs4's .select, you get the div tags that have the card class. From there, I want the href of the a tag and the text of the div with the title class.

In real pages more elements are nested, so searching the parent element's string with .find for a particular substring can be a bit annoying.

bs4 also has ways to access child elements, but they looked a bit cumbersome when I read the docs, so I fetched the links and the titles separately, as follows.


soup = bs4.BeautifulSoup(res.text, 'html.parser')  # Parse the page once
posts_links = soup.select('.link')
posts_titles = soup.select('.title')

As in the earlier "Format url and title" code, we access each Tag object by looping over a list with a for statement, but this time there are two lists. I want to loop over both lists at the same time, combine the url and title taken from each, and store the result in a new list.

When you want to loop over multiple lists at the same time in a for statement like this, use zip.


now_posts_link_title_set = []

for posts_link, posts_title in zip(posts_links, posts_titles):
    # Extract the URL (assumes href is the last attribute of the a tag)
    link_str = str(posts_link)
    index_first = link_str.find('href=') + 6
    index_end = link_str.find('">')
    posts_link_set = link_str[index_first:index_end]

    # Extract the title (assumes it is wrapped in an <h2> without attributes)
    title_str = str(posts_title)
    index_first = title_str.find('h2') + 3
    index_end = title_str.find('</h2')
    posts_title_set = title_str[index_first:index_end].replace('\u3000', ' ')  # Replace full-width spaces
    now_posts_link_title_set.append(f"{posts_link_set}>>>{posts_title_set}")
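
The pairing step can be tried on its own, without any scraping. The sketch below uses two hand-written lists, links and titles, as hypothetical stand-ins for the URLs and titles already extracted from a page, and builds the same url>>>title strings.

```python
# Hypothetical stand-ins for the URLs and titles extracted from a page.
links = ['/items/1', '/items/2', '/items/3']
titles = ['First post', 'Second\u3000post', 'Third post']

link_title_set = []
for link, title in zip(links, titles):
    # zip yields one (link, title) pair per position in the two lists.
    clean_title = title.replace('\u3000', ' ')  # full-width space -> normal space
    link_title_set.append(f"{link}>>>{clean_title}")

print(link_title_set)
```

Because zip pairs elements purely by position, this only gives correct results when the two lists line up one-to-one, for example when every card on the page has both a link and a title.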

zip also works with more than two lists.

for a, b, c, d in zip(a_list, b_list, c_list, d_list):
    print(a, b, c, d)
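
As a runnable illustration with three lists, here is a small sketch. The lists names, ages and cities are made-up sample data, not something from the scraping example.

```python
# Made-up sample data: three lists of the same length.
names = ['alice', 'bob', 'carol']
ages = [30, 25, 41]
cities = ['Tokyo', 'Osaka', 'Kyoto']

rows = []
for name, age, city in zip(names, ages, cities):
    # Each iteration unpacks one element from every list.
    rows.append(f'{name} ({age}) lives in {city}')

print(rows)
```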

If the lists have different numbers of elements, the extra elements of the longer list are ignored.

aa = [1,2,3,4,5]
bb = ['a', 'b', 'c']

for (a, b) in zip(aa, bb):
  print(f'{a} : {b}')

#result
1 : a
2 : b
3 : c
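
If dropping the extra elements is not what you want, the standard library offers itertools.zip_longest, which continues until the longest list is exhausted and fills the gaps with a placeholder. The sketch below reuses aa and bb from above; the '-' fill value is an arbitrary choice for illustration.

```python
from itertools import zip_longest

aa = [1, 2, 3, 4, 5]
bb = ['a', 'b', 'c']

# Missing positions in the shorter list are filled with fillvalue.
pairs = list(zip_longest(aa, bb, fillvalue='-'))
for a, b in pairs:
    print(f'{a} : {b}')
```

In Python 3.10 and later, zip(aa, bb, strict=True) is another option: it raises ValueError when the lengths differ, which helps catch a mismatch between scraped lists early.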

This is convenient
