cgrand/enlive

html-snippet doesn't work with Jsoup parser

tomoharu-fujita opened this issue · 5 comments

net.cgrand.enlive-html/html-snippet passes a java.io.StringReader instance to html-resource, but Jsoup/parse has no overload that accepts a Reader, so the call fails.

Same issue here. A simple example that breaks:

(enlive/set-ns-parser! net.cgrand.jsoup/parser)
(enlive/html-resource (java.io.StringReader. "<h1>Hi, cgrand!</h1>") (enlive/ns-options))

The above throws this:

CompilerException java.lang.IllegalArgumentException: No matching method found: parse

This is seriously ruining my day today.

Here's a monkey-patch workaround I use. Basically, I had to redefine a bunch of core functions and then modify the parser fn to suit my needs:

(ns my.namespace
  (:import [org.jsoup Jsoup]
           [org.jsoup.nodes Attribute Attributes Comment DataNode Document
                            DocumentType Element Node TextNode XmlDeclaration]
           [org.jsoup.parser Parser Tag]))

(def ^:private ->key (comp keyword #(.. % toString toLowerCase)))

(defprotocol IEnlive
  (->nodes [d] "Convert object into Enlive node(s)."))

(extend-protocol IEnlive
  Attribute
  (->nodes [a] [(->key (.getKey a)) (.getValue a)])

  Attributes
  (->nodes [as] (not-empty (into {} (map ->nodes as))))

  Comment
  (->nodes [c] {:type :comment :data (.getData c)})

  DataNode
  (->nodes [dn] (str dn))

  Document
  (->nodes [d] (not-empty (map ->nodes (.childNodes d))))

  DocumentType
  (->nodes [dtd] {:type :dtd :data ((juxt :name :publicid :systemid) (->nodes (.attributes dtd)))})

  Element
  (->nodes [e] {:tag     (->key (.tagName e))
                :attrs   (->nodes (.attributes e))
                :content (not-empty (map ->nodes (.childNodes e)))})

  TextNode
  (->nodes [tn] (.getWholeText tn))

  nil
  (->nodes [_] nil))

; redefined parser fn to support jsoup: it expects an InputStream (not a Reader)
; and hard-codes ISO-8859-1 as the charset
(defn parser
  "Parse a HTML document stream into Enlive nodes using JSoup."
  [stream]
  (with-open [^java.io.Closeable stream stream]
    (->nodes (Jsoup/parse stream "ISO-8859-1" ""))))

; then this will work: pass an InputStream instead of a StringReader, plus the custom parser
(net.cgrand.enlive-html/html-resource
  (-> "<h1>Hi, cgrand!</h1>"
      (.getBytes "ISO-8859-1")
      java.io.ByteArrayInputStream.)
  {:parser parser})

Added to the wiki, many thanks @dhruvbhatia!

@JustinIAC, net.cgrand.jsoup should be fixed to handle readers before JSoup is made the default.
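
For anyone wondering what handling readers could look like, here is a minimal sketch. It is not the fix that went into net.cgrand.jsoup. The namespace `my.reader-parser` and the function `parse-any` are hypothetical names, and the UTF-8 charset for streams is an assumption. The idea is to read a Reader fully into a String and use the String overload of Jsoup/parse, while streams keep using the InputStream overload.

```clojure
(ns my.reader-parser
  (:import [org.jsoup Jsoup]
           [java.io InputStream Reader]))

(defn parse-any
  "Hypothetical helper: parse a Reader or an InputStream into a Jsoup Document.
  Closes the source when done."
  [source]
  (cond
    ;; Jsoup/parse has no Reader overload, so read the whole Reader
    ;; into a String and use the (String) overload instead.
    (instance? Reader source)
    (with-open [^Reader r source]
      (Jsoup/parse ^String (slurp r)))

    ;; Streams go through the (InputStream, charsetName, baseUri) overload.
    ;; "UTF-8" is an assumption; use whatever charset your input has.
    (instance? InputStream source)
    (with-open [^InputStream in source]
      (Jsoup/parse in "UTF-8" ""))

    :else
    (throw (IllegalArgumentException.
             (str "Unsupported source type: " (class source))))))

;; Usage: the StringReader case from the failing example now parses.
(.text (parse-any (java.io.StringReader. "<h1>Hi, cgrand!</h1>")))
```

The resulting Document could then be passed to the ->nodes protocol from the workaround above to get Enlive nodes. Reading the whole Reader into memory is the simplest option, but it may not suit very large documents.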