CSS parsing with cssutils in Python

A note on parsing the CSS written inline in a tag's style attribute when scraping a website with Python.

Library selection

Beautiful Soup doesn't seem to be able to parse CSS, so I looked for a library that does.

I searched PyPI and picked the top result, cssutils. Its documentation is thorough and development appears to be ongoing, so it looks like a good choice.

This time I tried it with Python 3.3.3. Installation takes a single pip command.

$ python -V
Python 3.3.3
$ pip install cssutils

Parsing CSS

Since we are parsing inline CSS, we use cssutils.parseStyle. cssutils offers several other parsing interfaces; I haven't tried them here, but it seems you can also parse CSS given a file name or a URL. An optional argument lets you specify the character encoding.

>>> from cssutils import parseStyle
>>> style = parseStyle('width: 300px; margin: 0 20px 0 10px;')
>>> type(style)
<class 'cssutils.css.cssstyledeclaration.CSSStyleDeclaration'>

Parsing inline CSS returns a cssutils.css.CSSStyleDeclaration object. The goal here is to get the values of its width and margin properties.

Get properties and values

Getting a property's value as a string is easy: access it as an attribute.

>>> style.width
'300px'
>>> style.margin
'0 20px 0 10px'

When you want to analyze a value in more detail, for example when it consists of multiple components or when you need to take units into account, use cssutils.css.Property and cssutils.css.PropertyValue objects.

>>> p = style.getProperty('margin')
>>> type(p)
<class 'cssutils.css.property.Property'>
>>> v = p.propertyValue
>>> type(v)
<class 'cssutils.css.value.PropertyValue'>

A cssutils.css.PropertyValue lets you work with each component of a multi-part value individually.

>>> v.length
4
>>> v[0]
cssutils.css.DimensionValue('0')
>>> v[1]
cssutils.css.DimensionValue('20px')

Each component can be retrieved with list-like indexing. Here each one is a cssutils.css.DimensionValue object, a class that handles units such as px and em.

>>> v[1].value
20
>>> v[1].dimension
'px'
>>> v[1].cssText
'20px'

There are other value classes, such as cssutils.css.ColorValue and cssutils.css.URIValue, and cssutils appears to create the appropriate one depending on the format of the value.

Summary

- You can parse CSS in Python by using cssutils.
- You can easily handle values that consist of multiple components or values with units.
