A Clojure micro-library for cached (ETag-based) URL downloads. At its core, this library provides a single fn (urlocal.api/input-stream) for reading the content of a URL, and will transparently cache downloaded content locally on disk (as per the XDG Base Directory Specification), serving subsequent requests for that same content out of that cache whenever possible. Because this content is persisted on disk, the cache survives restarts of the JVM.
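As a rough illustration of what "as per the XDG Base Directory Specification" means for the cache location, the sketch below resolves a cache directory from an environment map. The function name xdg-cache-dir is hypothetical and is not part of urlocal's API; the fallback to $HOME/.cache when XDG_CACHE_HOME is unset or empty comes from the XDG specification itself.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn xdg-cache-dir
  "Hypothetical helper (not urlocal's API): returns the cache directory for
  app-name, using XDG_CACHE_HOME when it is set and non-blank, and falling
  back to $HOME/.cache otherwise, as the XDG spec describes."
  [env app-name]
  (let [xdg  (get env "XDG_CACHE_HOME")
        base (if (str/blank? xdg)
               (io/file (get env "HOME") ".cache")
               (io/file xdg))]
    (io/file base app-name)))

;; Usage with the real process environment:
;; (xdg-cache-dir (System/getenv) "urlocal")
```

Taking the environment as an argument, rather than calling System/getenv inside the function, keeps the sketch easy to exercise at the REPL with made-up values.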
Cached content is checked for staleness periodically via HTTP ETag GET requests, which are cheaper than a regular HTTP GET request when the cache is up to date, and cost the same when it is not. Within the checking interval, previously cached content is served straight from disk, with no network I/O at all. The staleness checking interval is configurable, and for applications that cannot tolerate any staleness, it can be set to 0 (meaning "make an ETag request to check for staleness on every request of the URL's content"). Although it does not eliminate network I/O, this configuration is still more efficient than regular HTTP GET requests, especially for content that changes infrequently.
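The general technique can be sketched as follows. This is not urlocal's actual implementation: the function names, the use of epoch seconds, and the shape of the returned map are all assumptions made for illustration. The sketch assumes the ETag check is a conditional GET carrying an If-None-Match header, which is the standard HTTP mechanism, and it uses the JDK's built-in HttpURLConnection.

```clojure
(import '(java.net URI HttpURLConnection))

(defn needs-check?
  "Hypothetical helper: true when at least interval-secs have elapsed since
  the cached copy was last checked. With an interval of 0, every request
  triggers a check."
  [last-checked-secs now-secs interval-secs]
  (>= (- now-secs last-checked-secs) interval-secs))

(defn conditional-get
  "Hypothetical helper: issues a GET for url, sending cached-etag (if any)
  in an If-None-Match header. Returns {:status :not-modified} on HTTP 304,
  otherwise the new ETag and a stream over the response body.
  Error handling is omitted for brevity."
  [url cached-etag]
  (let [^HttpURLConnection conn (.openConnection (.toURL (URI. url)))]
    (when cached-etag
      (.setRequestProperty conn "If-None-Match" cached-etag))
    (if (= (.getResponseCode conn) HttpURLConnection/HTTP_NOT_MODIFIED)
      {:status :not-modified}                    ; no body is transferred
      {:status :modified
       :etag   (.getHeaderField conn "ETag")     ; may be nil
       :body   (.getInputStream conn)})))
```

When the server answers 304 Not Modified, only headers cross the wire, which is why the check is cheaper than a full download when the cache is still current.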
The library has only one non-core dependency (clojure.tools.logging), and is compatible with JVMs 1.8 and above (it uses the crusty old Java 1.1 HTTP client, rather than the vastly improved Java 11+ HTTP client).
While ETag-based caching logic is simple, well understood, and widely documented, the author thought it might be useful to centralise it in the interests of avoiding reinventing (small) wheels.
urlocal is available as a Maven artifact from Clojars.
$ clj -Sdeps '{:deps {com.github.pmonks/urlocal {:mvn/version "RELEASE"}}}'
$ lein try com.github.pmonks/urlocal
$ deps-try com.github.pmonks/urlocal
(require '[urlocal.api :as url]
         '[clojure.java.io :as io])   ; needed for io/file below
(def cache-dir (io/file (str (System/getenv "HOME") "/.cache/urlocal")))
;=> #'user/cache-dir
(.exists cache-dir)
;=> false
(time (url/input-stream "https://spdx.org/licenses/licenses.json"))
;=> "Elapsed time: 298.22525 msecs"
;=> #object[java.io.BufferedInputStream 0x4373f66f "java.io.BufferedInputStream@4373f66f"]
(.exists cache-dir)
;=> true
(map #(.getName %) (file-seq cache-dir))
;=> ("urlocal" "aHR0cHM6Ly9zcGR4Lm9yZy9saWNlbnNlcy9saWNlbnNlcy5qc29u.content" "aHR0cHM6Ly9zcGR4Lm9yZy9saWNlbnNlcy9saWNlbnNlcy5qc29u.metadata.edn")
(time (url/input-stream "https://spdx.org/licenses/licenses.json"))
;=> "Elapsed time: 1.294875 msecs"
;=> #object[java.io.BufferedInputStream 0x161dd92a "java.io.BufferedInputStream@161dd92a"]
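The cache file names in the session above appear to be the Base64 encoding of the requested URL, followed by a .content or .metadata.edn suffix. That is an inference from the listing, not documented behaviour, and the helper name below is hypothetical. Under that assumption, a cache file name can be mapped back to its URL like this:

```clojure
(require '[clojure.string :as str])
(import '(java.util Base64)
        '(java.nio.charset StandardCharsets))

(defn cache-name->url
  "Hypothetical helper: strips everything from the first '.' onwards and
  Base64-decodes the remainder. Assumes the URL-safe Base64 alphabet, which
  cannot be confirmed from the sample names above."
  [file-name]
  (let [^String encoded (first (str/split file-name #"\." 2))]
    (String. (.decode (Base64/getUrlDecoder) encoded)
             StandardCharsets/UTF_8)))

(cache-name->url "aHR0cHM6Ly9zcGR4Lm9yZy9saWNlbnNlcy9saWNlbnNlcy5qc29u.content")
;=> "https://spdx.org/licenses/licenses.json"
```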
API documentation is published to GitHub Pages, and is also available on cljdoc.
This project uses the git-flow branching strategy, and the permanent branches are called release and dev. Any changes to the release branch are considered a release and auto-deployed (JARs to Clojars, API docs to GitHub Pages, etc.).
For this reason, all development must occur either in branch dev, or (preferably) in temporary branches off of dev. All PRs from forked repos must also be submitted against dev; the release branch is only updated from dev via PRs created by the core development team. All other changes submitted to release will be rejected.
urlocal uses tools.build. You can get a list of available tasks by running:
clojure -A:deps -T:build help/doc
Of particular interest are:
clojure -T:build test - run the unit tests
clojure -T:build lint - run the linters (clj-kondo and eastwood)
clojure -T:build ci - run the full CI suite (check for outdated dependencies, run the unit tests, run the linters)
clojure -T:build install - build the JAR and install it locally (e.g. so you can test it with downstream code)
Please note that the release and deploy tasks are restricted to the core development team (and will not function if you run them yourself).
Copyright © 2023 Peter Monks
Distributed under the Apache License, Version 2.0.
SPDX-License-Identifier: Apache-2.0