html-snippet doesn't work with Jsoup parser
tomoharu-fujita opened this issue · 5 comments
tomoharu-fujita commented
net.cgrand.enlive-html/html-snippet passes a java.io.StringReader instance to html-resource, but Jsoup/parse has no corresponding overload that accepts a reader.
dhruvbhatia commented
Same issue. Here's a simple example that breaks:
(enlive/set-ns-parser! net.cgrand.jsoup/parser)
(enlive/html-resource (java.io.StringReader. "<h1>Hi, cgrand!</h1>") (enlive/ns-options))
The above returns this:
CompilerException java.lang.IllegalArgumentException: No matching method found: parse
jcromartie commented
This is seriously ruining my day today.
dhruvbhatia commented
Here's a monkey-patch workaround I use. I basically had to redefine a bunch of core functions and then modify the parser fn to suit my needs:
(ns my.namespace
  (:require [net.cgrand.enlive-html]) ; loaded so the fully qualified call at the end resolves
  (:import [org.jsoup Jsoup]
           [org.jsoup.nodes Attribute Attributes Comment DataNode Document
            DocumentType Element Node TextNode XmlDeclaration]
           [org.jsoup.parser Parser Tag]))

(def ^:private ->key (comp keyword #(.. % toString toLowerCase)))

(defprotocol IEnlive
  (->nodes [d] "Convert object into Enlive node(s)."))

(extend-protocol IEnlive
  Attribute
  (->nodes [a] [(->key (.getKey a)) (.getValue a)])
  Attributes
  (->nodes [as] (not-empty (into {} (map ->nodes as))))
  Comment
  (->nodes [c] {:type :comment :data (.getData c)})
  DataNode
  (->nodes [dn] (str dn))
  Document
  (->nodes [d] (not-empty (map ->nodes (.childNodes d))))
  DocumentType
  (->nodes [dtd] {:type :dtd
                  :data ((juxt :name :publicid :systemid)
                         (->nodes (.attributes dtd)))})
  Element
  (->nodes [e] {:tag     (->key (.tagName e))
                :attrs   (->nodes (.attributes e))
                :content (not-empty (map ->nodes (.childNodes e)))})
  TextNode
  (->nodes [tn] (.getWholeText tn))
  nil
  (->nodes [_] nil))

;; redefined parser fn to support jsoup
(defn parser
  "Parse a HTML document stream into Enlive nodes using JSoup."
  [stream]
  (with-open [^java.io.Closeable stream stream]
    (->nodes (Jsoup/parse stream "ISO-8859-1" ""))))

;; then this will work
(net.cgrand.enlive-html/html-resource
  (-> "<h1>Hi, cgrand!</h1>"
      (.getBytes "ISO-8859-1")
      java.io.ByteArrayInputStream.)
  {:parser parser})
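
The workaround above only accepts an InputStream, so the StringReader that html-snippet passes would still fail. One possible way around that, as a rough sketch only: read the whole reader into a String and hand it to the single-argument Jsoup/parse. The namespace and the parse-any helper below are hypothetical names, not part of Enlive or JSoup, and the UTF-8 charset for streams is an assumption.

```clojure
(ns my.reader-sketch ; hypothetical namespace, for illustration only
  (:import [org.jsoup Jsoup]
           [java.io InputStream Reader]))

(defn parse-any
  "Hypothetical helper: returns an org.jsoup.nodes.Document for either a
  java.io.Reader or a java.io.InputStream."
  [source]
  (cond
    ;; Jsoup/parse has no Reader overload, so read the reader fully into a
    ;; String first. slurp closes the reader when it is done.
    (instance? Reader source)
    (Jsoup/parse ^String (slurp source))

    ;; Streams go through the (InputStream, charset, baseUri) overload.
    ;; UTF-8 is an assumption here; use the charset of your input.
    (instance? InputStream source)
    (with-open [^InputStream in source]
      (Jsoup/parse in "UTF-8" ""))

    :else
    (throw (IllegalArgumentException.
             (str "Unsupported source type: " (class source))))))
```

With the IEnlive protocol from the workaround loaded, a parser fn along the lines of (fn [source] (->nodes (parse-any source))) could then be passed as {:parser ...}. Slurping the reader holds the whole document in memory, which is probably acceptable for snippets but may not suit very large inputs.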
fdserr commented
Added to the wiki, many thanks @dhruvbhatia!
cgrand commented
@JustinIAC, net.cgrand.jsoup should be fixed to handle readers before JSoup is made the default.