Add WARC support
Hi and thanks for the awesome library!
I was wondering if you were aware of this initiative:
https://github.com/gildas-lormeau/SingleFile
It has a CLI, so I guess it could be used as a backend for org-board.
Hi there, thank you for the link! I've not heard of SingleFile but it seems like a good fit for this package. I will look into adding support for it.
Just throwing this out there. I managed to get org-board to work with another program called Monolith that, similar to SingleFile, saves a webpage in one HTML file. You can probably adapt this for the CLI of SingleFile too.
Basically I override org-board's org-board-wget-call to call my own my/org-board-monolith-call instead.
(defun my/org-board-monolith-call (path directory args site)
  "Like `org-board-wget-call' but call monolith instead."
  (make-directory (file-name-as-directory directory))
  (let* ((filename (url-filename (url-generic-parse-url (car site))))
         (domain (file-name-nondirectory (url-domain (url-generic-parse-url (car site)))))
         (name (if (string-empty-p filename)
                   domain
                 (if (string-match "/$" filename)
                     (file-name-base (directory-file-name filename))
                   filename)))
         (output-directory-option
          (expand-file-name
           (concat (file-name-sans-extension (file-name-nondirectory name)) ".html")
           (file-name-as-directory directory)))
         (output-buffer-name "org-board-monolith-call")
         (process-arg-list (append (list "org-board-monolith-process"
                                         output-buffer-name
                                         path)
                                   org-board-wget-switches
                                   (list "-o")
                                   (list output-directory-option)
                                   args
                                   site))
         (monolith-process (apply 'start-process process-arg-list)))
    (if org-board-wget-show-buffer
        (with-output-to-temp-buffer output-buffer-name
          (set-process-sentinel
           monolith-process
           'org-board-wget-process-sentinel-function))
      (set-process-sentinel
       monolith-process
       'org-board-wget-process-sentinel-function))
    monolith-process))

(advice-add 'org-board-wget-call :override #'my/org-board-monolith-call)
Then I put these in my init.el:
(setq org-board-wget-program (executable-find "monolith"))
(setq org-board-wget-switches '("-IevjF"))
The switches are passed on to monolith.
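The file-name logic in the function above can be tried on its own. Below is a minimal sketch, not the code as posted: my/org-board-url-to-html-name is a hypothetical helper, it uses url-host where the post uses url-domain, and it falls back to the host when the path is empty or just "/" (for a URL like https://example.com/ the posted code appears to end up with a bare ".html").

```elisp
(require 'url-parse)

(defun my/org-board-url-to-html-name (url)
  "Return an HTML file name derived from URL.
Hypothetical helper, not part of org-board.  Use the last path
component without its extension, or the host when the path is
empty or just \"/\"."
  (let* ((parsed (url-generic-parse-url url))
         ;; `url-path-and-query' splits off any ?query part.
         (path (or (car (url-path-and-query parsed)) ""))
         ;; Drop a trailing slash, then keep only the last component.
         (leaf (file-name-nondirectory (directory-file-name path)))
         (base (file-name-sans-extension leaf)))
    (concat (if (string= base "")
                (url-host parsed)
              base)
            ".html")))
```

For example, (my/org-board-url-to-html-name "https://example.com/foo/") gives "foo.html". The result could then be handed to expand-file-name together with the archive directory, as the function above does.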
GNU wget has supported creating WARC archives since 2012. See the announcement at https://lists.gnu.org/archive/html/info-gnu/2012-08/msg00002.html
Given that org-board uses wget, can we get WARC support cheaply by using org-board's WGET_OPTIONS property?
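One way to try the WGET_OPTIONS idea is to add wget's --warc-file option to the entry's property. The following is an untested sketch: my/org-board-add-warc-option is a hypothetical helper (not part of org-board), and it assumes org-board hands the words in WGET_OPTIONS to wget unchanged.

```elisp
(require 'org)

(defun my/org-board-add-warc-option (warc-name)
  "Add a wget --warc-file option to WGET_OPTIONS of the entry at point.
WARC-NAME is the output name without extension; wget adds the
.warc.gz suffix itself.  Hypothetical helper, not part of org-board.
Return the resulting property value."
  (interactive "sWARC file name (no extension): ")
  ;; Adds the option as one more word in the property and skips it
  ;; if the very same word is already there.
  (org-entry-add-to-multivalued-property
   nil "WGET_OPTIONS" (concat "--warc-file=" warc-name))
  (org-entry-get nil "WGET_OPTIONS"))
```

Calling it with "example" on an entry whose WGET_OPTIONS is "--recursive -l 1" leaves "--recursive -l 1 --warc-file=example". One thing to check before relying on this: a relative WARC name is probably resolved against wget's working directory rather than the org-board archive directory, so an absolute path without spaces may be the safer choice.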
I've just started using org-board (and org-attachments generally). WARC and WGET_OPTIONS are something I'm keen to try soon.
I'm skeptical about various other archive packages like SingleFile (which has already been forked...). I suppose it depends what you are looking for in a file format:
- If you just want a single file which can easily be copied or moved (shared as an email attachment, say) then take your pick: SingleFile and WARC both manage that.
- If you're looking for web browser support, they're all poor choices IMO.
  - I'm unaware of any single-file archive format which is supported by common web browsers. Several browsers have devised their own format (e.g. MAFF) but none have caught on or been adopted by other browsers.
  - Some formats have 3rd-party browser extensions, which could be good for personal use. A downside here is that it doesn't really help when you want to share the archive with somebody else; they'll have to go and find a browser extension too.
- If your interest is longevity though, then I'd bet on WARC. It's an ISO standard with a detailed spec, and it has the backing of major national libraries and universities. It's been developed and maintained with proper archivists and librarians, who tend to think on a longer time scale than most software developers I've known.