
Parse HTML with JSoup and produce Hiccup data structures: conversion, parsing, fragment parsing, and selecting.

A Clojure library that uses JSoup to produce Hiccup data structures.

It also provides direct access to the underlying JSoup objects in case you find those more convenient.

Finally, it provides convenience functions for JSoup's parse (full document and fragment) and select methods, each in a Hiccup version and a JSoup object version.

Include in your project

soupup is hosted on Clojars, so if you're using Leiningen, just add the dependency to your project.clj file.

[soupup "0.2.0"]
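For context, a minimal project.clj might look like the sketch below. The project name, version, and Clojure version are placeholders chosen for illustration; only the soupup coordinate comes from this README.

```clojure
;; project.clj (sketch; my-app and the Clojure version are hypothetical)
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 ;; the dependency vector given above
                 [soupup "0.2.0"]])
```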


soupup.core contains the following functions:

parse accepts HTML text and returns JSoup data structures.
e.g. (parse (slurp "http://www.google.com"))

parseup accepts HTML text and returns Hiccup data structures.
e.g. (parseup (slurp "http://www.google.com"))

frag accepts HTML fragment text and returns JSoup data structures.
e.g. (frag "<p>Hello World</p>")

fragup accepts HTML fragment text and returns Hiccup data structures.
e.g. (fragup "<p>Hello World</p>")

select accepts a JSoup data structure and a CSS selector and returns JSoup
data structures.
e.g. (select (parse (slurp "http://www.google.com")) "img")

selectup accepts a JSoup data structure and a CSS selector and returns Hiccup
data structures.
e.g. (selectup (parse (slurp "http://www.google.com")) "img")

Convert JSoup data structures to Hiccup data structures.

The -preserve-whitespace versions of the functions preserve the original whitespace; the other versions let JSoup normalize it.
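The functions above come in pairs: one returns JSoup objects and the "up" variant returns Hiccup data. The following is a minimal sketch of using them side by side, assuming soupup 0.2.0 is on the classpath and the functions behave as described above. The namespace name and the inline page are made up for illustration, and an inline string is used instead of slurp so the example runs offline.

```clojure
(ns example.soupup-demo
  (:require [soupup.core :as soup]))

;; Hypothetical inline page, so no network access is needed.
(def page
  (str "<html><head><title>Demo</title></head><body>"
       "<p class=\"intro\">Hello <b>World</b></p>"
       "<img src=\"a.png\" alt=\"A\">"
       "</body></html>"))

;; JSoup side: parse the full document, then select within it.
(def doc (soup/parse page))
(def imgs (soup/select doc "img"))

;; Hiccup side: the same operations, returning Hiccup data.
(def doc-hiccup (soup/parseup page))
(def imgs-hiccup (soup/selectup doc "img"))

;; Fragments do not need a surrounding <html> element.
(def frag-hiccup (soup/fragup "<p>Hello World</p>"))
```

Note that selectup takes the JSoup document returned by parse, not the Hiccup data returned by parseup.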

At the time of this writing,

(selectup (parse (slurp "http://www.google.com")) "img")

returned the following...

([:img
  {:alt "Gloria E. Anzaldúa’s 75th Birthday",
   :border "0",
   :height "200",
   :src "/logos/doodles/2017/gloria-e-anzalduas-75th-birthday-6115361035386880-l.png",
   :title "Gloria E. Anzaldúa’s 75th Birthday",
   :width "500",
   :onload "window.lol&&lol()"}])

Running this back through Hiccup's html function yields the following.

<img alt="Gloria E. Anzaldúa’s 75th Birthday" border="0" height="200"
     id="hplogo" onload="window.lol&amp;&amp;lol()"
     title="Gloria E. Anzaldúa’s 75th Birthday" width="500" />
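The round trip described above can be sketched as follows. This assumes Hiccup is also declared as a dependency (this README does not say soupup pulls it in for you) and uses a made-up inline page and namespace name; hiccup.core/html is Hiccup's own rendering macro.

```clojure
(ns example.round-trip
  (:require [soupup.core :as soup]
            [hiccup.core :refer [html]]))

;; Hypothetical inline page standing in for a fetched document.
(def page
  "<html><body><img src=\"logo.png\" alt=\"Logo\" width=\"500\"></body></html>")

;; Select the img elements as Hiccup data ...
(def imgs (soup/selectup (soup/parse page) "img"))

;; ... and render that data back to an HTML string with Hiccup.
(def rendered (html imgs))
```

Attribute order and escaping in the rendered string are decided by Hiccup, so the output need not match the input byte for byte.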

See http://jsoup.org/cookbook/extracting-data/selector-syntax for information on JSoup's CSS selectors.
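As a rough illustration of that selector syntax, the sketch below runs a few selector forms from the JSoup cookbook through selectup. The markup and namespace name are invented for the example, and it assumes selectup returns one Hiccup element per match, as the img example above suggests.

```clojure
(ns example.selectors
  (:require [soupup.core :as soup]))

;; Hypothetical markup to select against.
(def doc
  (soup/parse
    (str "<div class=\"masthead\">"
         "<a href=\"/home\">Home</a>"
         "<a name=\"top\">Top</a>"
         "<img src=\"logo.png\">"
         "<img src=\"photo.jpg\">"
         "</div>")))

;; Anchors that carry an href attribute.
(def links (soup/selectup doc "a[href]"))

;; Images whose src attribute ends in .png.
(def pngs (soup/selectup doc "img[src$=.png]"))

;; Anchors that are direct children of div.masthead.
(def masthead-anchors (soup/selectup doc "div.masthead > a"))
```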

Running the Tests

user=> (use 'soupup.test)
user=> (test-all)


