Parsing enc:enclosure
ahoglund commented
I was attempting to parse an RSS document that contains enclosure elements such as
<enc:enclosure resource="http://image_url" type="image/jpeg"/>
Here is a full item from the feed:
<item rdf:about="https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html">
<title><![CDATA[2003 ACURA CL TYPE-S (MCKINNEY) $850]]></title>
<link>https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html</link>
<description><![CDATA[SELLING MY BELOVED HONDA FOR PARTS OR PERSONAL PROJECT. ENGINE IN GREAT SHAPE HAS 165,000 ML MAINTATINED REALLY WELL, CLEAN TITLE. BRAND NEW FRONT SUSPENSION, GOOD BREMBOO BRAKES, TIRES IN GOOD SHAPE, EVERYTHING WORKS INSIDE CAR, FRONT SEAT BIT TORN. ...]]></description>
<dc:date>2020-04-16T10:51:41-05:00</dc:date>
<dc:language>en-us</dc:language>
<dc:rights>copyright 2020 craigslist</dc:rights>
<dc:source>https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html</dc:source>
<dc:title><![CDATA[2003 ACURA CL TYPE-S (MCKINNEY) $850]]></dc:title>
<dc:type>text</dc:type>
<enc:enclosure resource="https://images.craigslist.org/00h0h_fyqO7icY0vm_300x300.jpg" type="image/jpeg"/>
<dcterms:issued>2020-04-16T10:51:41-05:00</dcterms:issued>
</item>
Even when passing ignore_unknown_element to the parse method, I could not find these elements in the parsed results. Does this library not support these enclosures? If not, is there a plan or willingness to add support? Enclosures seem to be a standard RSS feature: https://en.wikipedia.org/wiki/RSS_enclosure
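For anyone who needs the enclosure data in the meantime, one possible workaround is to read the elements with a generic, namespace-aware XML parser. The following is a minimal sketch using Python's standard library, not this library's API. The namespace URIs, the sample document, and the helper name extract_enclosures are assumptions made for illustration; check them against the namespace declarations in the real feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; replace them with whatever the real feed declares.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "enc": "http://purl.oclc.org/net/rss_2.0/enc#",
}

# Hypothetical, trimmed-down RSS 1.0 document modeled on the item above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:enc="http://purl.oclc.org/net/rss_2.0/enc#">
  <item rdf:about="https://example.org/listing/1">
    <title>Example listing</title>
    <link>https://example.org/listing/1</link>
    <enc:enclosure resource="https://example.org/image.jpg" type="image/jpeg"/>
  </item>
</rdf:RDF>
"""


def extract_enclosures(xml_text):
    """Return (item link, enclosure URL, MIME type) for each enc:enclosure."""
    root = ET.fromstring(xml_text)
    results = []
    # Items are direct children of the rdf:RDF root in RSS 1.0.
    for item in root.findall("rss:item", NS):
        link = item.findtext("rss:link", default="", namespaces=NS)
        for enc in item.findall("enc:enclosure", NS):
            # resource and type are plain (unprefixed) attributes.
            results.append((link, enc.get("resource"), enc.get("type")))
    return results


for link, url, mime in extract_enclosures(SAMPLE):
    print(link, url, mime)
```

Matching on the namespace URI rather than the literal enc: prefix keeps the lookup working even if a feed binds the same namespace to a different prefix.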
kou commented
Could you provide a full RSS document?
ahoglund commented
Sure! Just visit this link for example: https://newyork.craigslist.org/search/cta?format=rss