When a link destination in HTML is written as a relative path, use `urllib.parse.urljoin()` to get the absolute URL.
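from urllib.parse import urljoin

# URL of the page that contains the relative links
base = "http://example.com/html/a.html"

def compurl(q):
    # Resolve the relative path q against base and print the absolute URL
    print(urljoin(base, q))

compurl("b.html")
compurl("sub/c.html")
compurl("../index.html")
compurl("../img/hoge.png")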
Execution result
http://example.com/html/b.html
http://example.com/html/sub/c.html
http://example.com/index.html
http://example.com/img/hoge.png
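As a further illustration, the sketch below shows how the result changes depending on whether the base URL ends with a slash, and what happens with root-relative and already-absolute links. The helper name `resolve_all` and the sample URLs are assumptions made up for this example; they do not come from the book.

```python
from urllib.parse import urljoin


def resolve_all(base, links):
    # Hypothetical helper: resolve every link in `links` against `base`.
    return [urljoin(base, link) for link in links]


# Sample values chosen only for illustration.
page = "http://example.com/html/a.html"
resolved = resolve_all(
    page,
    ["b.html", "/img/hoge.png", "http://example.org/x.html"],
)
for url in resolved:
    print(url)

# A base without a trailing slash treats its last segment as a file name,
# so that segment is replaced rather than extended.
with_slash = urljoin("http://example.com/html/", "b.html")
without_slash = urljoin("http://example.com/html", "b.html")
print(with_slash)
print(without_slash)
```

When scraping, it is probably safest to pass the URL of the page that was actually fetched as the base, since that is what the relative links were written against.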
`pass` statement
In Python, processing blocks are defined by indentation, so a block that has nothing to do would otherwise be empty, which Python does not allow. The `pass` statement is therefore used to state explicitly that nothing is to be done.
# Create (or truncate) the file without writing anything to it
with open('example.txt', 'w'):
    pass
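A few other places where `pass` commonly appears are sketched below: an exception class with no extra behavior, a function that is not implemented yet, and an `except` block that deliberately ignores an error. All of the names (`ScrapingError`, `not_implemented_yet`, `to_int_or_none`) are hypothetical and exist only for this example.

```python
class ScrapingError(Exception):
    # No extra behavior is needed, but the class body cannot be empty.
    pass


def not_implemented_yet():
    # Placeholder body: the function is valid and simply returns None.
    pass


def to_int_or_none(text):
    # Convert text to int; return None if it is not a valid integer.
    result = None
    try:
        result = int(text)
    except ValueError:
        pass  # Deliberately do nothing and keep result as None.
    return result


print(not_implemented_yet())
print(to_int_or_none("42"))
print(to_int_or_none("abc"))
```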
Conditional branching is written using `if`, `elif`, and `else`. In addition, when a single conditional expression chooses between two values, it can be written with the ternary operator.
if conditional expression 1:
    <Processing block 1>
    # Processing executed when conditional expression 1 is True
elif conditional expression 2:
    <Processing block 2>
    # Processing executed when conditional expression 1 is False and conditional expression 2 is True
else:
    <Processing block 3>
    # Processing executed when conditional expressions 1 and 2 are both False

# Conditional branching by ternary operator
# Value 1 when the conditional expression is True, value 2 when False
Value 1 if conditional expression else value 2
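The template above can be turned into something runnable along the following lines. This is only a minimal sketch; the function names `classify` and `parity` are made up for illustration.

```python
def classify(n):
    # Hypothetical helper: label the sign of n with if / elif / else.
    if n > 0:
        label = "positive"
    elif n < 0:
        label = "negative"
    else:
        # Reached only when both conditions above are False.
        label = "zero"
    return label


def parity(n):
    # Ternary operator: "even" when the condition is True, otherwise "odd".
    return "even" if n % 2 == 0 else "odd"


for value in (3, -2, 0):
    print(value, classify(value), parity(value))
```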
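An iterator holds a rule for generating a series of values rather than the values themselves. When a large amount of data is required, this is more memory efficient than storing every value. A `range` object works in a similar way (strictly speaking it is a lazy sequence rather than an iterator, but it also produces its values on demand). A description example using `range` is shown below.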
# The following three range() calls all produce the integers 0 through 9 in steps of 1.
range(0, 10, 1)
range(0, 10)
a = range(10)
print(a)
print(a[0])
Execution result
range(0, 10)
0
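To get a rough feel for the memory difference, the sketch below compares a `range` with a list holding the same values. The variable names and the size `n` are arbitrary choices for this example, and `sys.getsizeof` reports only the size of the container object itself, so the numbers should be read as an indication rather than an exact measurement.

```python
import sys

n = 1_000_000
lazy = range(n)          # keeps only start, stop, and step
eager = list(range(n))   # keeps a reference to every value

print(sys.getsizeof(lazy))
print(sys.getsizeof(eager))

# iter() turns the range into an iterator that yields one value at a time.
it = iter(range(3))
first = next(it)
second = next(it)
print(first, second)
```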
`range()` fills the role of the increment operator (`++`), which cannot be used in Python. Counting down, the equivalent of decrement (`--`), is done by passing the range to `reversed()`.
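As a small sketch of the point above, the following counts up with `range()` and counts down in two ways: with `reversed()` and with a negative step. The variable names are chosen only for this example.

```python
# Counting up: the loop variable advances by 1 without any ++ operator.
counting_up = list(range(5))

# Counting down with reversed().
counting_down = list(reversed(range(5)))

# The same countdown written with a negative step.
negative_step = list(range(4, -1, -1))

print(counting_up)
print(counting_down)
print(negative_step)
```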
`break` and `continue` statements
Writing these statements inside the processing block of a loop lets you control the running loop.
`break` statement
Interrupts the currently running loop and exits it.
b = 0
while True:
    b += 1
    if b > 5:
        break  # When b = 6, the loop is exited and the process ends.
    print(b)
Execution result
1
2
3
4
5
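`break` works the same way in a `for` loop. A minimal sketch, assuming a hypothetical helper named `first_over` that stops scanning as soon as it finds a match:

```python
def first_over(values, limit):
    # Hypothetical helper: return the first value greater than limit, or None.
    found = None
    for v in values:
        if v > limit:
            found = v
            break  # Stop scanning; the remaining values are never examined.
    return found


print(first_over([1, 4, 7, 9], 5))
print(first_over([1, 2], 5))
```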
`continue` statement
Suspends the processing block being executed and moves on to the evaluation of the loop's conditional expression.
c = 0
while True:
    c += 1
    if c < 6:
        continue  # The subsequent processing is skipped until c = 6.
    print(c)
    break
Execution result
6
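In a `for` loop, `continue` skips the rest of the current iteration and moves on to the next item. A minimal sketch, with a variable name (`odds`) chosen only for this example:

```python
odds = []
for n in range(1, 8):
    if n % 2 == 0:
        continue  # Skip even numbers and go straight to the next n.
    odds.append(n)

print(odds)
```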
I started having a hard time understanding the code in the scraping reference book I was using, so I decided to relearn the basic grammar of Python. I was surprised that the increment and decrement operators, which were very convenient in C and Java, cannot be used, but they can be handled well by using cumulative assignment (`+=`, `-=`) and iterators. I want to become comfortable using them.
I have linked the GitHub repository published for the book I referred to: Supplementary revision Python scraping & machine learning development technique