shurcooL/markdownfmt

SmartyPants-Unicode handling of quotes, dashes, and ellipses

jlevy opened this issue · 4 comments

jlevy commented

John Gruber's original Markdown has often been used with a long-standing hack called SmartyPants to improve typographic consistency on quotes, dashes, and ellipses. Python's Markdown package also implements it.

A variant of this that converts ASCII quotes, dashes, and ellipses to their appropriate Unicode equivalents could be helpful in markdownfmt. Instead of converting to HTML entities, it would convert to Unicode, and then the Markdown doc would be consistent (including on GitHub, which does not by default do smarty-style conversion).

This is just another feature to note and discuss/consider. Inconsistent typographic usage is yet another pain point I've seen with large-scale collaborative Markdown.

It could also be implemented as a separate tool, perhaps. Gruber notes some (rare) algorithmic shortcomings like

'Twas the night...  -> ‘Twas the night...

But it's worth remembering that authors can avoid that by using the correctly oriented quotes in the original:

’Twas the night...
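A minimal sketch in Go of the kind of conversion described above, assuming a simple "what character came before" rule for quote direction. The function name `smarten` is hypothetical and is not part of markdownfmt or blackfriday; the rule itself is a simplification of what SmartyPants does and is only meant to illustrate the idea and its known weak spot.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// smarten is a hypothetical helper (not a markdownfmt or blackfriday API).
// It replaces ASCII dashes, ellipses, and straight quotes with Unicode
// equivalents. Quote direction is guessed from the preceding character.
func smarten(s string) string {
	// "---" is listed before "--" so the longer sequence is matched first.
	s = strings.NewReplacer("---", "\u2014", "--", "\u2013", "...", "\u2026").Replace(s)

	var b strings.Builder
	prev := ' ' // treat the start of the text as if it followed whitespace
	for _, r := range s {
		// A quote opens if it follows whitespace, an opening bracket, or a dash.
		opening := unicode.IsSpace(prev) || strings.ContainsRune("([{\u2014\u2013", prev)
		switch {
		case r == '"' && opening:
			b.WriteRune('\u201C') // left double quote
		case r == '"':
			b.WriteRune('\u201D') // right double quote
		case r == '\'' && opening:
			b.WriteRune('\u2018') // left single quote
		case r == '\'':
			b.WriteRune('\u2019') // right single quote, also the apostrophe
		default:
			b.WriteRune(r)
		}
		prev = r
	}
	return b.String()
}

func main() {
	fmt.Println(smarten(`"It's done" -- he said...`))
	fmt.Println(smarten(`'Twas the night...`))        // reproduces the shortcoming
	fmt.Println(smarten("\u2019Twas the night...")) // pre-oriented quote is left alone
}
```

The second call yields an opening quote (‘Twas), which is the shortcoming mentioned above, while the third keeps the author's ’ because only straight ASCII quotes are touched. Nested quotes and text inside code spans are not handled by this sketch.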

Hi @jlevy,

Thanks for the issue. This is a valid point for discussion.

I should first say that I'm aware of the smarty-pants option that blackfriday offers, but so far, I choose to ignore it. I already think markdown is way too complicated, and I want to make it as simple as possible.

So I currently use plain ASCII quotes everywhere and have very little desire for them to become something else.

This is just another feature to note and discuss/consider.

I'm not dismissing it completely and am happy to discuss it, but realistically, I think it would have to be another project (possibly a fork) that takes this on. I think this is an interesting idea, and a tool like this has opportunities.

I just wanted to acknowledge this issue, and I'm ok with it staying open and having a discussion, but I don't plan to spend much time on this, and I'm unlikely to be able to accept PRs that implement this (please let me know before working on anything). There are other things that are occupying my budget for time and attention for now.

Thanks!

Hi,

I agree with @jlevy, this is a much needed feature. Basically, people using markdownfmt are looking for a markdown-tidy tool to keep their markdown source files clean — not only for aesthetic reasons, but mostly to avoid diffing nightmares and problems with Git (tidy sources make it easier to view what really changed in a commit).

I was already looking into implementing this feature, but as far as I understand, blackfriday's smartypants is for HTML rendering only.

As @jlevy pointed out, it would be nice to have UTF-8 Unicode chars, instead of HTML entities.

Any tips on how I could try to implement this? (I'm new to Go.)

Thanks

people using markdownfmt are looking for a markdown-tidy tool to keep their markdown source files clean

I've touched on that point in #34 (comment), so I won't repeat that here.

I'm still not convinced that this is a great thing.

I haven't looked at any of the implementation matters here, but I suspect it might be tricky and probably not very clean.

At this time, my recommendation would be to experiment with a prototype and consider putting it into a fork or a separate tool. If the prototype works well, I would consider pulling it in, but I don't expect that at this time.

At this stage of the project, I'm not looking to grow the feature set of markdownfmt; instead, I want to keep it as simple as possible while still being viable to use. And in the last few years of using Markdown, I have had very little desire to have non-ASCII quotes.

Makes sense. I guess that the issue really should be opened on GitHub, asking for a markdown previewer that implements smart quotes and punctuation — so users could stick to a strict ASCII markdown source, and let converters/previewers handle it.

But for some reason GitHub's markdown HTML renderer/previewer doesn't use smartypants — and this has some impact on users' expectations, because having to use Alt codes to get an em-dash (even here, as I type this comment) is quite tiring.

Anyhow, it seems that all markdown cleanup tools agree that ASCII-only characters (or HTML entities as a last resort) are the only sound approach — of course, the latter make a document quite unreadable.

A solution could be writing some script to filter the output of markdownfmt — and since markdownfmt enforces an ASCII standard on quotes, dashes, etc., it would be quite easy. A single batch/shell script could handle invoking markdownfmt and piping its output to this script before rewriting the source file.

I guess that would be the simple approach, and it could be created in any language.

Thanks again (and Season's Greetings)

Tristano