Breaking changes in 1.1
revin opened this issue · 5 comments
Hello! Greenkeeper just notified me that github-slugger 1.1 broke npm/marky-markdown; the issue is that in 1.0, unicode emoji characters in headings were being stripped out, but now they're being converted to HTML entities.
For example, given `## 😄-😄 unicode hyphen unicode`:
- 1.0 rendered `--unicode-hyphen-unicode`
- 1.1 now renders `😄-😄-unicode-hyphen-unicode`
...which has broken a handful of our tests. Is there a way to get the old behavior? Thanks! 👍
Thanks @revin!
The introduced changes were trying to fix a bug in github-slugger, which can be seen in this heading as slugged by GitHub: https://github.com/wooorm/gh-and-npm-slug-generation#Привет-non-latin-你好. Unfortunately, the fix seems to be too loose.
Now I’m wondering what exact characters are allowed by GH in slugs; white space and punctuation, and emoji? Thoughts?
GitHub says they use vmg/redcarpet (which is mostly C code) to do the markdown rendering. So the answers should be there, or at least that's the first place I plan on looking when I get a chance.
There’s also github/markup, but I couldn’t find any reference to the jargon word “slug” in those repos.
I’ve included the input/output from wooorm/gh-and-npm-slug-generation and chrisdickinson/emoji-slug-example in the tests locally, and the only difference from the current algorithm seems to be the emoji. I’ll look into integrating mathiasbynens/emoji-regex and see if that fixes things.
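To make the idea concrete, here is a rough sketch of what "keep non-Latin letters, strip emoji" could look like. It is not github-slugger's actual code and it does not use emoji-regex: the function name `slugSketch` and both regexes are hypothetical, and it assumes a runtime with Unicode property escapes (a recent Node.js). GitHub's real rules may well differ.

```javascript
// Hypothetical sketch only; not github-slugger's implementation.
// Assumes the runtime supports Unicode property escapes in regexes.

// Pictographic emoji, plus the zero-width joiner and variation selector
// that can trail them (the selector is a mark, so it must go explicitly).
const EMOJI = /\p{Extended_Pictographic}|\u200D|\uFE0F/gu;

// Anything that is not a letter, mark, number, underscore, space, or hyphen.
const DISALLOWED = /[^\p{L}\p{M}\p{N}_ -]/gu;

function slugSketch(heading) {
  return heading
    .toLowerCase()
    .replace(EMOJI, '')       // drop emoji, as 1.0 did
    .replace(DISALLOWED, '')  // drop punctuation and other symbols
    .replace(/ /g, '-');      // each space becomes a hyphen
}

console.log(slugSketch('😄-😄 unicode hyphen unicode')); // --unicode-hyphen-unicode
console.log(slugSketch('Привет non-latin 你好'));         // привет-non-latin-你好
```

The first call reproduces the 1.0 output from the report above, while the second keeps the Cyrillic and Chinese characters that the 1.1 change was meant to preserve.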
That might do it. I was reading through https://github.com/vmg/redcarpet/blob/master/ext/redcarpet/html.c#L273-L322 and now have a bunch of tabs open to see what the unicode support situation is on C's standard library functions. 😕