Spiderable documents (content in HTTP response)

Question

cben opened this issue 11 years ago · 9 comments

Search engines and JavaScript-less frontends (links, curl, etc.) should get at least a plain-text version of the document content (in a text area). Bonus points for:

- rendering links as links.
- full server-side CodeMirror styling.

Either way needs node.js on the server.
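
A minimal sketch of the plain-text fallback, assuming the server already holds the document text as a string. `escapeHtml` and `renderFallbackPage` are hypothetical names, not existing mathdown, CodeMirror or Firepad code:

```javascript
// Hypothetical helpers; not part of mathdown, CodeMirror, or Firepad.

// Escape the characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a page whose document text is readable without running any JavaScript.
function renderFallbackPage(title, text) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head><meta charset="utf-8"><title>' + escapeHtml(title) + '</title></head>',
    '<body>',
    // HTML parsers drop a newline that directly follows <textarea>, so emit one
    // explicitly; a document that itself starts with a blank line then survives.
    '<textarea readonly>\n' + escapeHtml(text) + '</textarea>',
    '</body>',
    '</html>',
  ].join('\n');
}
```

Clients with JavaScript could replace the text area with the real editor; links and curl would just see the text.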

Answer 1 · 2013-10-14T08:58:42.000Z

Also, it would be nice to provide a simple REST interface for getting the Markdown content instead of the rendered HTML.
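
One possible shape for such an interface, as a sketch only. The `format=md` query parameter is an assumption made up for illustration, and `loadMarkdown` and `renderHtml` are hypothetical functions supplied by the caller:

```javascript
// Hypothetical: loadMarkdown(docId) resolves to the document's Markdown text,
// renderHtml(markdown) returns the full HTML page. Neither exists in mathdown.
function makeHandler(loadMarkdown, renderHtml) {
  return async function handler(req, res) {
    // req.url holds only the path and query, so a dummy base is needed to parse it.
    const url = new URL(req.url, 'http://localhost');
    const docId = url.searchParams.get('doc');
    if (!docId) {
      res.writeHead(400, { 'Content-Type': 'text/plain; charset=utf-8' });
      res.end('missing ?doc= parameter\n');
      return;
    }
    const markdown = await loadMarkdown(docId);
    if (url.searchParams.get('format') === 'md') {
      // Assumed convention: ?format=md asks for the raw Markdown source.
      res.writeHead(200, { 'Content-Type': 'text/markdown; charset=utf-8' });
      res.end(markdown);
    } else {
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(renderHtml(markdown));
    }
  };
}
```

The returned function has the `(req, res)` signature that node's `http.createServer` expects, so it could be passed there directly.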

Answer 2 · 2014-02-16T08:42:49.000Z

Fallback editing flow for users without Javascript:
- Minimum: append a comment.
- Ideally: Wikipedia-like [Edit section] links.

Answer 3 · 2014-07-26T18:08:44.000Z

Try http://www.brombone.com/ ?
Hmm, $39/mo for 200 pages is too small IIUC (mathdown has ~300 docs now),
$129/mo for 50,000 Pages is enough.
But that's a bit expensive for me for a free site. Plus I don't want another proprietary part.
And with headless now supported by firepad, it should be very easy to implement myself.

Answer 4 · 2014-12-03T13:57:45.000Z

Optimize critical time-to-visible-text:
(some of these could be done without server side, but having a server simplifies some tradeoffs)

Make Firepad and CodeMirror JS async, start with read-only and/or unstyled text ASAP
(Issue: how to know when Firepad and CodeMirror have finished loading? Might need RequireJS or similar to get a callback.)
- Initial text can come on client from Firepad snapshot via firebase REST api (because it's dead simple),
  or from server-side (which could use snapshot if it's fresh [enough] but can probably do full OT composition because Cloud Networks are Fast).
  Coming from the server has the benefit that it can be styled without CM loading latency, and we need server-styled content anyway for the original goal that docs be search-engine friendly (with heading structure, real links etc).
- It would be great if Firepad snapshots were eventually consistent so they could serve as server-side cache.
- I see no easy way to bootstrap client's firepad state from server's reconstruction, so the work will have to be duplicated.
Conversely, make sure client can start loading JS, connecting to Firebase and even editing before server response is complete.
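
A rough sketch of the "initial text from the Firepad snapshot via REST" option. It rests on assumptions that should be checked against the Firepad version in use: that the snapshot lives under `<doc>/checkpoint` as `{a, o, id}`, and that string entries in `o` are the inserted text. The function names and the example paths are hypothetical:

```javascript
// Assumption: Firepad keeps its latest snapshot under "<doc>/checkpoint" as
// { a: authorId, o: [ops...], id: revisionId }, where string entries in "o"
// are inserted text and object entries are formatting attributes.

// Firebase's REST API returns the JSON stored at a database path when ".json" is appended.
function checkpointUrl(databaseUrl, docPath) {
  return databaseUrl.replace(/\/+$/, '') + '/' + docPath + '/checkpoint.json';
}

// Concatenate the text pieces of a checkpoint; returns '' when there is no snapshot yet.
function textFromCheckpoint(checkpoint) {
  if (!checkpoint || !checkpoint.o) return '';
  // Firebase may hand arrays back as objects with numeric keys; Object.values covers both.
  return Object.values(checkpoint.o)
    .filter((op) => typeof op === 'string')
    .join('');
}

// Needs a global fetch (browsers, or a recent node.js).
async function fetchSnapshotText(databaseUrl, docPath) {
  const response = await fetch(checkpointUrl(databaseUrl, docPath));
  if (!response.ok) throw new Error('HTTP ' + response.status);
  return textFromCheckpoint(await response.json());
}
```

The result is only as fresh as the last snapshot, so it suits a read-only first paint that the real Firepad state later replaces.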

Good reading on browser critical path: http://www.sitepoint.com/optimizing-critical-rendering-path/

Answer 5 · 2014-12-13T22:17:16.000Z

Test whether text can be read via Pocket, Instapaper, Android chrome's reader mode etc.

Answer 6 · 2015-06-04T16:13:30.000Z

If I'm considering server-side math rendering, SVG is dramatically smaller than HTML-CSS:
cben/CodeMirror-MathJax#35 (comment)

But since the main goal here is a semantically appropriate HTTP response, I should at most send the MathML.
And probably none of that, since server-side math typesetting would slow down the critical path, and reduce server scalability.

Answer 7 · 2015-06-04T16:15:08.000Z

In other news, as discovered on #56, it's presently impossible to create Firepad.Headless instances without leaking memory and CPU. Need to discuss upstream.

Answer 8 · 2015-08-29T20:55:10.000Z

Idea inspired by TiddlyWiki and https://github.com/jldec/pub-server:
Ideally the generated HTML should include doc id and full firepad state — current text, revision number & outstanding local changes — such that it can be Save As...d and later opened offline, edited some more, Saved and opened again — and eventually seamlessly sync when we come online.

EDIT: this requires much deeper changes than generated HTML — firepad would need to constantly keep all its state inside the DOM.
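
The generated-HTML half of the idea (not the keep-state-in-the-DOM half) could look roughly like this. The state shape `{docId, revision, text, pending}` and the `doc-state` element id are made up for illustration; this is not how Firepad stores anything today:

```javascript
// Hypothetical state shape: { docId, revision, text, pending }.

// Serialize the state into an inert JSON script element for inclusion in the page.
function embedState(state) {
  // "<" can only occur inside JSON strings, so escaping it as \u003c is lossless
  // and guarantees the document text can never close the script element early.
  const json = JSON.stringify(state).replace(/</g, '\\u003c');
  return '<script type="application/json" id="doc-state">' + json + '</script>';
}

// Recover the state from saved HTML. In a browser one would instead parse
// document.getElementById('doc-state').textContent.
function extractState(html) {
  const match = /<script type="application\/json" id="doc-state">([\s\S]*?)<\/script>/.exec(html);
  return match ? JSON.parse(match[1]) : null;
}
```

Round-tripping through `embedState` and `extractState` gives back the same object, which is the minimum needed for Save As and reopen to preserve text and revision number.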

Of course OT's automatic merge is not very safe for major disconnected editing.
But it feels like a great yardstick to measure "semantic completeness" of the served HTML.
Plus lays some groundwork for offline use (#51).
Plus would speed up client side firepad init compared to repeating the server's work of loading from firebase. (OTOH, client loading from scratch feels much more bug-proof.)

Answer 9 · 2022-12-17T17:14:33.000Z

As part of #172 I've given up on maintaining a dynamic server and moved to static hosting on Netlify.

Hmm, Netlify supports pre-rendering which might work, but do we want their servers to see the text of every mathdown page users access?
(Note their backend does see the ?doc=SECRET, so nothing stops them from seeing the text if they wanted. Is there a privacy difference between could and will?)

Anyway, I'm closing this as it's not something I'll have time for any time soon.

If someone still wants to work on this, ping me, we can reopen & discuss.