Flet/github-slugger

Breaking changes in 1.1

revin opened this issue · 5 comments

Hello! Greenkeeper just notified me that github-slugger 1.1 broke npm/marky-markdown; the issue is that in 1.0, unicode emoji characters in headings were being stripped out, but now they're being converted to HTML entities.

For example, given ## 😄-😄 unicode hyphen unicode:

  • 1.0 rendered --unicode-hyphen-unicode
  • 1.1 now renders 😄-😄-unicode-hyphen-unicode

...which has broken a handful of our tests. Is there a way to get the old behavior? Thanks! 👍
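
One possible stopgap on the consuming side, as a minimal sketch: strip emoji from the slug after it is generated. `stripEmoji` is a hypothetical helper, not part of github-slugger, and it assumes a runtime whose regular expressions support the `\p{Extended_Pictographic}` Unicode property escape. It may not cover every emoji sequence (flags and keycaps, for example), and it also removes a few pictographic symbols such as ©.

```javascript
// Hypothetical helper (not part of github-slugger): remove emoji code points
// from an already-generated slug.
function stripEmoji(slug) {
  // Extended_Pictographic matches most pictographic emoji; U+FE0F and U+200D
  // are the variation selector and joiner that can follow or glue them.
  return slug.replace(/[\p{Extended_Pictographic}\u{FE0F}\u{200D}]/gu, '');
}

console.log(stripEmoji('😄-😄-unicode-hyphen-unicode'));
// prints "--unicode-hyphen-unicode"
```

Applied to the 1.1 output above, this gives back the 1.0 slug, hyphens included.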

Thanks @revin!

The introduced changes were trying to fix a bug in github-slugger, which can be seen in this heading as slugged by GitHub: https://github.com/wooorm/gh-and-npm-slug-generation#Привет-non-latin-你好. Unfortunately, the fix seems to be too loose.

Now I’m wondering what exact characters are allowed by GH in slugs; white space and punctuation, and emoji? Thoughts?

Flet commented

Apologies @revin! If we need to revert and/or redo this I'm fine with it!

GitHub says they use vmg/redcarpet (which is mostly C code) to do the markdown rendering. So the answers should be there, or at least that's the first place I plan on looking when I get a chance.

There’s also github/markup, but I couldn’t find any reference to the jargon word “slug” in those repos.

I’ve included the input/output from wooorm/gh-and-npm-slug-generation and chrisdickinson/emoji-slug-example in the tests locally, and the only difference from the current algorithm seems to be the emoji. I’ll look into integrating mathiasbynens/emoji-regex and see whether that fixes things.
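
As a rough illustration of that plan, the sketch below keeps Unicode letters, numbers, and marks (so non-Latin headings survive) while dropping emoji and punctuation, then checks the result against input/output pairs. `slugSketch` and the fixture list are hypothetical, not github-slugger's code. A Unicode property class stands in for mathiasbynens/emoji-regex here, and the second expected value assumes the slug is lowercased.

```javascript
// Hypothetical sketch, not github-slugger's implementation.
// Keep letters, numbers, combining marks, underscores, hyphens and spaces;
// everything else (emoji, punctuation) is dropped. Spaces then become hyphens.
function slugSketch(heading) {
  return heading
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\p{M}_\- ]/gu, '')
    .replace(/ /g, '-');
}

// Fixture pairs: [heading text, expected slug].
const fixtures = [
  ['😄-😄 unicode hyphen unicode', '--unicode-hyphen-unicode'],
  // Assumption: lowercased form of the non-Latin heading linked above.
  ['Привет non-latin 你好', 'привет-non-latin-你好'],
];

for (const [input, expected] of fixtures) {
  const actual = slugSketch(input);
  console.log(actual === expected ? 'ok  ' : 'FAIL', JSON.stringify(input), '->', actual);
}
```

Because the character class is a whitelist, emoji fall out without needing a dedicated emoji pattern; whether that matches GitHub's own rules for every input is exactly what the fixtures would have to confirm.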

That might do it. I was reading through https://github.com/vmg/redcarpet/blob/master/ext/redcarpet/html.c#L273-L322 and now have a bunch of tabs open to see what the Unicode support situation is in C's standard library functions. 😕