shurcooL/markdownfmt

Support for semantic linefeeds?

jlevy opened this issue · 8 comments

jlevy commented

Thanks for the useful tool. In addition to the obvious benefits of consistency, I think it has the potential to help reduce merge and diff friction on GitHub when many people edit Markdown.

Have you considered support for semantic linefeeds? That is, if a flag is enabled, use heuristics to split lines based on punctuation (probably period, comma, and a few less common ones, at least in English).

I know it's a slightly unusual practice, and frowned on in some places (like Wikipedia) but it offers some useful benefits in the context of Markdown in Git, so thought I'd mention it here.

See jlevy/the-art-of-command-line#167 for some discussion. Having a tool for this convention (and perhaps some tunable variations, to allow experimentation on which conventions are best) might be helpful for the heavily edited documents (like "awesome lists", https://github.com/jlevy/the-art-of-command-line, etc.) that are becoming increasingly common on GitHub.

Thanks for the suggestion. As the label implies, this is something I'm thinking about, but I currently don't have actionable plans that are worth executing.

There are some benefits to semantic linefeeds, but I also like the current model that doesn't insert newlines at all, and expects your text editor/viewer to render the text with word wrap on. That way, as you resize your text editor/viewer, all text reflows and there's no need to manually edit newline positions.

One extra factor is that this will require a flag, and I prefer to avoid having configuration.

I just wanted to reply and give you some more insight on my thoughts on this.

jlevy commented

Yes, definitely there are pros and cons to both approaches. In general I agree with flowing text for text documents, but when you have GitHub workflow on Markdown, you begin thinking of Markdown as more like source code (with clear semantics and clean merges of commits) rather than flowing the way it will when it's later formatted. Editors often give previews anyway.

Also get that you want to avoid myriad configuration. That said, this whole discussion is one of those perennial problems and it may take some experimentation to find good solutions. It took decades to have large numbers of developers implement the "gofmt"-style non-negotiable formatting idea. I think having config settings "discouraged but possible" is one way to allow experimental features (e.g. don't add lots of flags, and require a special config file or something like that).

Anyway, thanks for the response! I'll update if I find a better solution to this problem.

> That said, this whole discussion is one of those perennial problems and it may take some experimentation to find good solutions.

I'm in full agreement there.

I do think that it's best for the person most interested in a certain experiment to run it themselves, to maximize the chance of it working out well. I'd be very glad to see you fork this project, for example, experiment with this, and we can later decide to merge the efforts if it makes sense.

I still don't see a viable way of making semantic newlines work well in the context of markdown files. My main problem is that, when used, it makes editing text anywhere other than "end" more difficult. Imagine you delete or add some text to the first line, it becomes shorter/longer, and all following lines need to be reflowed. Perhaps markdownfmt itself can do that for you, but wouldn't that still cause each line to have a large diff, defeating the purpose?
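To make the reflow concern concrete, here is a minimal sketch using Python's standard `textwrap` module. It hard-wraps a paragraph at a fixed width, applies a small edit near the start, and counts how many wrapped lines differ. The sample text, the width of 40, and the helper name `changed_lines` are illustrative assumptions, not anything markdownfmt does.

```python
import textwrap

def changed_lines(before: str, after: str, width: int = 40) -> int:
    """Count wrapped lines that differ, position by position, after an edit."""
    a = textwrap.wrap(before, width)
    b = textwrap.wrap(after, width)
    # Pad the shorter list so added or removed lines count as changes.
    longest = max(len(a), len(b))
    a += [""] * (longest - len(a))
    b += [""] * (longest - len(b))
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical sample paragraph; the edit inserts one word near the start.
before = ("The quick brown fox jumps over the lazy dog and then runs "
          "across the field to find its friends near the river bank.")
after = before.replace("The quick", "The extremely quick")

print(changed_lines(before, after))
```

With fixed-width wrapping, one inserted word pushes text across line boundaries, so most of the paragraph shows up as changed in a line-based diff.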

> Anyway, thanks for the response! I'll update if I find a better solution to this problem.

No problem, and please do. I'm also happy to keep discussing it here; the "thinking" label allows me not to worry about wanting to close this issue asap. :)

Also, FWIW, here's what a diff when editing a single paragraph can look like. IMO, it can be quite readable, since individual words that are changed are highlighted too:

image

jlevy commented

Sure, thanks. To continue the discussion: I get that you can see sub-line differences with coloring (and git diff --color-words supports this too). That's not the real pain point. Rather, it is merge conflicts when many people edit one doc. If two people change the same line, the merge is a conflict under standard merging rules. E.g., when each paragraph is a single line 5 sentences long, non-trivial merges are about 5 times more likely than if the sentences were split. (If we all wrote code with 300-char widths and 5 statements to a line, we'd have the same problem there, too.)
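A rough sketch of the conflict argument, as a simplification: it assumes all three versions have the same number of lines and treats a line as conflicting only when both sides changed it in different ways. Real git merges are stricter (changes on adjacent lines can also conflict), and the function name and sample sentences here are made up for illustration.

```python
def conflicting_lines(base, ours, theirs):
    """Return indices of lines that both sides changed in different ways.

    Simplification: all three versions must have the same number of lines.
    """
    return [
        i for i, (b, o, t) in enumerate(zip(base, ours, theirs))
        if o != b and t != b and o != t
    ]

# Hypothetical paragraph of three sentences, edited by two people.
sentences = ["Alpha comes first.", "Beta follows it.", "Gamma ends the list."]
ours = ["Alpha always comes first.", sentences[1], sentences[2]]
theirs = [sentences[0], sentences[1], "Gamma finally ends the list."]

# Layout 1: the whole paragraph on one line.
one_line = conflicting_lines(
    [" ".join(sentences)], [" ".join(ours)], [" ".join(theirs)]
)

# Layout 2: one sentence per line.
per_sentence = conflicting_lines(sentences, ours, theirs)

print(one_line, per_sentence)
```

The two edits touch different sentences, so they only collide when the whole paragraph lives on a single line.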

In your example, it sounds like you're thinking of regular word wrapping, e.g. on a column width, and yes, there reflows break everything too. What I was suggesting was a variant on the semantic breaks, where you break on something "stable" like sentence-ending periods and comma phrases more than a certain length. The rules can even be a little complex, as long as it's something deterministic and doesn't make the source ugly. Then, say, modifying one word, would only have "local" effect on a (much shorter) line, so conflicts would be less common.
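One possible shape for such a rule, as a minimal sketch only: break after sentence-ending punctuation, and break after a comma only when the text on both sides is at least some minimum length. The function name `semantic_breaks`, the `min_clause` threshold of 30, and the regexes are assumptions for illustration. A real implementation would also need to handle abbreviations like "e.g.", inline code, links, and other Markdown syntax.

```python
import re

def semantic_breaks(paragraph: str, min_clause: int = 30) -> list[str]:
    """Split a paragraph into lines at sentence ends and long comma phrases."""
    lines = []
    # Break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
        if not sentence:
            continue
        # Candidate breaks after commas; keep short clauses joined.
        clauses = re.split(r"(?<=,)\s+", sentence)
        current = clauses[0]
        for clause in clauses[1:]:
            if len(current) >= min_clause and len(clause) >= min_clause:
                lines.append(current)
                current = clause
            else:
                current += " " + clause
        lines.append(current)
    return lines

text = ("Short one. This is a considerably longer sentence with a first "
        "clause, and it continues with a second clause that is also long. "
        "Done.")
lines = semantic_breaks(text)
print("\n".join(lines))
```

Because the break points depend only on punctuation and clause length, not on a column width, changing one word affects only the line that contains it.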

Yes, perhaps I'll experiment with it, too (but I'll have to find time to pick up enough Go that I can resist the temptation to redo it in Python 😉).

jlevy commented

For what it's worth, I've finally revisited this idea, and wrote a new plugin for Atom that handles this need. I think the semi-semantic wrapping approach is preferable to #36's fixed line-length wrapping. It's new and I'll be experimenting with it more, so any feedback (and bugs) welcome!

@jlevy
using git diff --word-diff may be a better solution

If you change a word in a paragraph and rewrap (hard wrap, gwap in vim in normal mode) that paragraph - with this option the command will show only that one word changing. (The default diff output would show several lines.)
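As a rough illustration of why a word-level diff is insensitive to rewrapping, here is a sketch using Python's standard `difflib`. It is not git's algorithm; it just compares the two versions as sequences of whitespace-separated words, so moved line breaks disappear from the comparison. The helper name and sample text are hypothetical.

```python
import difflib

def word_changes(before: str, after: str):
    """Return the non-equal opcodes of a word-by-word comparison."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# One word changed, and the paragraph was rewrapped at a different point.
before = "the quick brown fox\njumps over the lazy dog"
after = "the quick red fox jumps\nover the lazy dog"

changes = word_changes(before, after)
print(changes)
```

A line-based comparison of the same two strings would report both lines as changed, while the word-based one reports only the single replaced word.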

As for hard-wrapping itself - it's probably better for reading raw text files in my opinion, although not sure yet I can explain why. HTML and markdown usually ignore newlines when rendered either way (except for pre/code blocks).

jlevy commented

Just an update: As a practical matter, we've been using this approach in flowmark for some time now in the process of publishing about a dozen books, and overall it's worked well.

git diff --word-diff is a nice idea for cases when you can't control the format. But note GitHub's UI and git merges don't work at word level—they operate at line level. So having content in a form that is readable, normalized, and merges cleanly has been helpful for revising complex docs and editorial workflows.