TL;DR
To tell whether feedparser.parse() succeeded, look at bozo in the return value.
Parsing succeeded only when bozo is 0; otherwise, bozo_exception holds the cause.
>>> import feedparser
>>>
>>> resp1 = feedparser.parse('http://qiita.com/tags/python/feed')
>>> type(resp1)
<class 'feedparser.FeedParserDict'>
>>>
>>> resp2 = feedparser.parse('http://qiita.com/tags/python1/feed')
>>> type(resp2)
<class 'feedparser.FeedParserDict'>
feedparser.parse() returns a feedparser.FeedParserDict no matter what URL is specified.
That's a little inconvenient, so I decided to take a look at the contents.
Variable name | URL | Description |
---|---|---|
resp1 | http://qiita.com/tags/python/feed | Ordinary RSS feed |
resp2 | http://qiita.com/tags/python | Not an RSS feed |
resp3 | http://qiita.com/tags/python1/feed | Status code other than 200 |
resp4 | http://qiitta.com/tags/python/feed | Domain that does not exist |
>>> resp1.keys()
dict_keys(['bozo', 'encoding', 'status', 'etag', 'href', 'entries', 'version', 'namespaces', 'feed', 'headers'])
>>>
>>> resp2.keys()
dict_keys(['bozo', 'encoding', 'status', 'bozo_exception', 'etag', 'href', 'entries', 'version', 'namespaces', 'feed', 'headers'])
>>>
>>> resp3.keys()
dict_keys(['bozo', 'encoding', 'bozo_exception', 'status', 'href', 'entries', 'version', 'namespaces', 'feed', 'headers'])
>>>
>>> resp4.keys()
dict_keys(['bozo', 'entries', 'feed', 'bozo_exception'])
>>>
Unexpectedly, each result holds a different set of keys. Only bozo, entries, and feed are common to all four.
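The comparison of the key lists can also be done mechanically. The following is a minimal sketch, assuming the four key lists printed above are copied by hand into plain sets; the names keys_by_response and common_keys are illustrative and not part of feedparser.

```python
# Key lists copied from the resp1..resp4 output shown above.
keys_by_response = {
    "resp1": {"bozo", "encoding", "status", "etag", "href", "entries",
              "version", "namespaces", "feed", "headers"},
    "resp2": {"bozo", "encoding", "status", "bozo_exception", "etag", "href",
              "entries", "version", "namespaces", "feed", "headers"},
    "resp3": {"bozo", "encoding", "bozo_exception", "status", "href",
              "entries", "version", "namespaces", "feed", "headers"},
    "resp4": {"bozo", "entries", "feed", "bozo_exception"},
}


def common_keys(key_sets):
    """Return the keys that appear in every one of the given sets."""
    return set.intersection(*key_sets)


print(sorted(common_keys(keys_by_response.values())))
# prints ['bozo', 'entries', 'feed']
```

With real results, something like common_keys(set(r.keys()) for r in (resp1, resp2, resp3, resp4)) should give the same answer.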
>>> resp1.bozo
0
>>> resp1.entries
[{'summary': '<p>Chapter 8 describes the graphical model. A graphical model is a method of graphically expressing relationships such as random variables and model parameters.
# (remaining output omitted)
}]
>>>
>>> resp2.bozo
1
>>> resp2.version
''
>>> resp2.entries
[]
>>>
>>> resp3.bozo
1
>>> resp3.version
''
>>> resp3.entries
[]
>>>
>>> resp4.bozo
1
>>> resp4.entries
[]
>>>
At this point, we can conclude that the RSS feed was parsed successfully only when bozo is 0.
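As a rough sketch of this check, the helper below (parsed_ok is a hypothetical name, not a feedparser function) accepts any dict-like result. Plain dicts stand in for the return value of feedparser.parse() here so that the snippet runs without network access.

```python
def parsed_ok(result):
    """Return True only when the result reports a falsy bozo (0 or False)."""
    # Assumption: a result with no bozo key at all is treated as a failure.
    return not result.get("bozo", 1)


# Stand-ins shaped like the results shown above (not real parse output).
success_like = {"bozo": 0, "entries": [{"summary": "..."}]}
failure_like = {"bozo": 1, "entries": []}

print(parsed_ok(success_like), parsed_ok(failure_like))
# prints True False
```

In real use you would pass the object returned by feedparser.parse(url) directly, since FeedParserDict supports get() like a normal dict.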
As far as the original goal is concerned, this is nearly all we need, but let's dig a little deeper.
When parsing succeeds you can simply work with entries, so from here on I will look at what information is available for handling parse failures.
Comparing resp1 and resp2, both have HTTP-related keys such as status and headers, probably because both requests succeeded.
On the other hand, there is one key that resp2 has and resp1 does not: bozo_exception.
>>> resp2.bozo_exception
SAXParseException('undefined entity',)
It contained a reasonably informative message. In most cases, looking at this should be enough.
resp3,resp4
>>> resp3.bozo_exception
NonXMLContentType('text/html; charset=utf-8 is not an XML media type',)
>>>
>>> resp4.bozo_exception
URLError(gaierror(8, 'nodename nor servname provided, or not known'),)
>>>
The other responses look like this. One drawback is that extracting just the message string from inside the exception takes some effort.
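One possible way to pull out only the message string is sketched below. The function name exception_message is hypothetical, and the branches rely on the standard library exception types seen above (SAXParseException and URLError); anything else, including feedparser's own NonXMLContentType, simply falls back to str().

```python
from urllib.error import URLError
from xml.sax import SAXParseException


def exception_message(exc):
    """Return a short human-readable message for a bozo_exception value."""
    if isinstance(exc, SAXParseException):
        # getMessage() returns the bare message without position details.
        return exc.getMessage()
    if isinstance(exc, URLError):
        reason = exc.reason
        # reason is often an OSError (e.g. socket.gaierror) carrying strerror.
        if isinstance(reason, OSError) and reason.strerror:
            return reason.strerror
        return str(reason)
    # Fallback for every other exception type.
    return str(exc)


# Stand-in resembling the resp4 exception shown above.
demo = URLError(OSError(8, "nodename nor servname provided, or not known"))
print(exception_message(demo))
# prints nodename nor servname provided, or not known
```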
Also, for resp3, feedparser attempts to parse the response even though it is a 404 Not Found, so if status is present, it is a good idea to check it as well.
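Putting the pieces together, error handling might look like the sketch below. describe_failure is a hypothetical helper, and treating any status of 400 or higher as an HTTP error is my own assumption; plain dicts again stand in for real parse results.

```python
def describe_failure(result):
    """Return None on success, otherwise a short description of the failure."""
    status = result.get("status")  # absent when the request never completed
    # Assumption: any status of 400 or above counts as an HTTP-level failure.
    if status is not None and status >= 400:
        return f"HTTP error: status {status}"
    if result.get("bozo"):
        # bozo_exception carries the reason parsing (or fetching) failed.
        return f"parse error: {result.get('bozo_exception')!r}"
    return None


# Stand-ins shaped like resp1, resp3 and resp4 above (not real parse output).
print(describe_failure({"bozo": 0, "status": 200}))
print(describe_failure({"bozo": 1, "status": 404,
                        "bozo_exception": Exception("not XML")}))
print(describe_failure({"bozo": 1,
                        "bozo_exception": OSError("unknown host")}))
```

Checking status first means a 404 is reported as an HTTP problem rather than as the content-type error that results from parsing the error page.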