/remark-java

Remark is a HTML to Markdown library forked from https://bitbucket.org/OverZealous/remark (mercurial repository)

Primary LanguageJavaOtherNOASSERTION

Build Status

OverZealous Creations Remark

Remark is a library for taking (X)HTML input and outputting clean Markdown, Markdown Extra, or MultiMarkdown compatible text. The purpose of this conversion is mainly to allow for the use of client-side HTML GUI editors while retaining safe, mobile-device editable markdown text behind the scenes. It is recommended that the markdown text is stored, to reduce XSS attacks by code injection.

Example Usage Scenario

  • The user logs in from their desktop.
    • Adding some text, the user inputs into a full-featured GUI, such as Dojo's rich text editor, or any of these editors.
    • The webserver takes the generated HTML, which may contain a lot of bad HTML, depending on the browser, and passes it to Remark.
      • Remark passes the HTML to jsoup, to clean up the input text, which strips unsupported HTML tags (the text will remain).
      • Remark walks the generated DOM tree, and outputs clean, structured markdown text.
      • The markdown text is returned.
    • The webserver stores this markdown text for future display.
  • The user chooses to re-edit the HTML text from their desktop.
    • The webserver converts the Markdown back to HTML, and sends it to the client.
    • Repeat the steps above to save it.
  • The user later logs in from their mobile device.
    • Mobile devices often not support rich text editing through the web browser.
    • So, instead, render a plain text field with the raw markdown text.
    • Because markdown is relatively easy to read and edit, the user can make simple changes without struggling with hundreds of messy HTML tags.

Advanced Features

Remark can be configured to output extra functionality beyond straight markdown.

  • Markdown Extra tables or Multimarkdown tables (which add column spanning support), including a best-guess attempt at alignment (based on style or align attributes)
  • Reversal of various smart HTML entities or unicode characters:
    • “ (“) and ” (”) become "
    • ‘ (‘), ’ (’), and ' become '
    • &laquo; («) becomes <<
    • &raquo; (») becomes >>
    • &hellip; (…) becomes ...
    • &endash; (–) becomes --
    • &emdash; (—) becomes ---
  • Simplified hardwraps — A <br/> is converted to just a single linebreak, instead of (space)(space)(newline), common in most third-party markdown renderers
  • Autolinks — a link that has the same content as it's label (and starts with http or https) is simply rendered as is, like http://www.overzealous.com
  • Markdown Extra definition lists
  • Markdown Extra abbreviations
  • Markdown Extra header IDs
  • Fenced code blocks, using either Markdown Extra's format using ~~~, or Github's format using ```
  • Customization of allowed HTML tags - not really recommended.

The basic theory is that you match the extensions to your Markdown conversion library.

A Note on Forking:

Want to fork this project? Great! However, please note that I use hgflow to manage the develop-release cycle. If you are uncomfortable with that, that's fine, too! Just switch to the develop branch before working, or I won't be able to easily merge the changes back in.

Source code build is done via Gradle.

Dependencies

Remark depends on jsoup and Apache Commons Lang 3. If you want to use it from the command line, it also depends on Apache Commons CLI. Alternatively, you can download the standalone version of the Jar, which contains all the dependencies.

jsoup uses the MIT License, which is roughly comparable to the Apache 2.0 License used by Remark and the Apache dependencies.

During testing, Remark also depends on some additional libraries, which are automatically downloaded by the gradle build script.

License

Remark is released under the Apache 2.0 license.

Copyright 2011 OverZealous Creations, LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.