'<div>' inside '<a>' element is moved outside on parsing
rbf opened this issue · 7 comments
As discussed in this SO question, HTML 5 states that
the <a> element may be wrapped around entire paragraphs, lists, tables, and so forth, even entire sections, so long as there is no interactive content within (e.g. buttons or other links).
However, enlive puts block content outside (i.e. after) the <a></a> element. Here is an MWE:
;; Clojure 1.5.1
(require '[net.cgrand.tagsoup :as t])
;; => nil
(t/parser (java.io.StringReader. "<a href='#'>link<span>inline element</span></a>"))
;; => ({:tag :html, :attrs nil, :content ({:tag :body, :attrs nil, :content ({:tag :a, :attrs {:href "#"}, :content ("link" {:tag :span, :attrs nil, :content ("inline element")})})})})
(t/parser (java.io.StringReader. "<a href='#'>link<div>block element</div></a>"))
;; => ({:tag :html, :attrs nil, :content ({:tag :body, :attrs nil, :content ({:tag :a, :attrs {:href "#"}, :content ("link")} {:tag :div, :attrs nil, :content ("block element")})})})
Is that the intended behaviour?
I haven't found specific documentation or questions about that, so I'm raising a question here.
Might be related to the underlying TagSoup parser.
I'm also having trouble with this at the moment - any feedback regarding this?
After a bit more digging, I found this thread on the TagSoup mailing list:
https://groups.google.com/forum/#!topic/tagsoup-friends/e4dUrzVpJjQ
Seems like it's here to stay..
Thanks for sharing. In my case, I ended up using spans, which were equivalent for my purposes.
However, this is yet another reason to look forward to cgrand/enliven, which uses jsoup instead of TagSoup. ;)
Cool - I actually ended up finding hickory - https://github.com/davidsantiago/hickory - which parses html using jsoup and emits an enlive compatible data structure. I might look into enliven though :)
I found that there's an options var that can be rebound to whichever parser function you choose, so I did the following:
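;; Assumed aliases: h is hickory.core, en is net.cgrand.enlive-html.
(require '[hickory.core :as h]
         '[net.cgrand.enlive-html :as en])

(defn parser
  "Loads and parses an HTML resource and closes the stream."
  [stream]
  ;; Keep only element maps; drop top-level strings and other non-map nodes.
  (filter map? (map h/as-hickory (h/parse-fragment (slurp stream)))))

;; Wrap enlive's macros so the definitions are evaluated with the
;; hickory-based parser bound in *options*.
(defmacro defsnippet [nm source selector args & forms]
  `(binding
     [net.cgrand.enlive-html/*options* {:parser parser}]
     (en/defsnippet ~nm ~source ~selector ~args ~@forms)))

(defmacro deftemplate [nm source args & forms]
  `(binding
     [net.cgrand.enlive-html/*options* {:parser parser}]
     (en/deftemplate ~nm ~source ~args ~@forms)))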
I suppose I could just set the var root though.. anyhow - this works at the moment, will optimise down the line.
Thanks @CmdrDats, your hack worked very nicely here 👍
FYI, the hickory workaround doesn't appear to work if you're trying to modify :body. I haven't looked much further, but some quick debugging indicates that as-hickory returns only the elements inside <head> and <body>, not <head> and <body> themselves.
ETA:
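After some experimentation, it seems that Crouton (https://github.com/weavejester/crouton) also works and enables transformations on :body.
;; Assumed alias: crouton is crouton.html.
(require '[crouton.html :as crouton])

(defn crouton-parser
  [s]
  ;; enlive expects a sequence of nodes, so wrap the parsed document in a vector.
  [(crouton/parse s)])

(en/set-ns-parser! crouton-parser)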
You just have to specify that you want to use the jsoup parser and not the tagsoup one. You can specify that either per template or once per ns.
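For anyone landing here, a minimal sketch of both options. It assumes enlive 1.1.x, where the bundled jsoup parser is net.cgrand.jsoup/parser and deftemplate accepts an options map before the source; the template name block-link and its markup are made up for illustration, and the source is passed as an InputStream on the assumption that the jsoup parser expects a stream rather than a Reader.

```clojure
(require '[net.cgrand.enlive-html :as en]
         '[net.cgrand.jsoup :as jsoup])

;; Option 1: once per namespace. Templates and snippets defined
;; in this ns after this call use the jsoup parser.
(en/set-ns-parser! jsoup/parser)

;; Option 2: per template, via an options map placed before the source.
;; block-link is a hypothetical template used only for this example.
(en/deftemplate block-link {:parser jsoup/parser}
  (java.io.ByteArrayInputStream.
    (.getBytes "<a href='#'>link<div>block element</div></a>" "UTF-8"))
  [])

;; Render the template to a string to check where the <div> ends up.
(println (apply str (block-link)))
```

If the parser is picked up, the <div> should stay nested inside the <a> in the printed markup instead of being moved after it.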