sergioramos/remark-prism

tsx language not highlighting correctly

Closed this issue · 4 comments

Hi there.

I'm attempting to highlight a tsx language block. prismjs supports tsx and it highlights the code correctly when I try it here: https://prismjs.com/test.html#language=tsx

When I use the remark-prism package, it produces broken highlighting.

This is the markdown I'm parsing:

```tsx
export type LinkProps = React.DetailedHTMLProps<HTMLAnchorElement> &
  NextLinkProps;

export type foo = 'bar';
```

This is what it should be generating (correct):

<code class=" language-tsx"><span class="token keyword">export</span> <span class="token keyword">type</span> <span class="token class-name">LinkProps</span> <span class="token operator">=</span> React<span class="token punctuation">.</span>DetailedHTMLProps<span class="token operator">&lt;</span>HTMLAnchorElement<span class="token operator">&gt;</span> <span class="token operator">&amp;</span>
  NextLinkProps<span class="token punctuation">;</span>

<span class="token keyword">export</span> <span class="token keyword">type</span> <span class="token class-name">foo</span> <span class="token operator">=</span> <span class="token string">'bar'</span><span class="token punctuation">;</span></code>

This is what it generates (broken):

<code class="language-tsx"><span class="token keyword">export</span> <span class="token keyword">type</span> <span class="token class-name">LinkProps</span> <span class="token operator">=</span> <span class="token maybe-class-name">React</span><span class="token punctuation">.</span><span class="token property-access"><span class="token maybe-class-name">DetailedHTMLProps</span></span><span class="token tag"><span class="token tag"><span class="token punctuation">&#x3C;</span><span class="token class-name">HTMLAnchorElement</span></span><span class="token punctuation">></span></span><span class="token plain-text"> &#x26;</span>
<span class="token plain-text">  NextLinkProps;</span>
<span class="token plain-text"></span>
<span class="token plain-text">export type foo = 'bar';</span>
</code>
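
One way to see where the two outputs diverge is to list the token types each one contains. The sketch below is only an illustration: `tokenTypes` is a hypothetical helper (not part of remark-prism or Prism), and the two strings are shortened excerpts of the outputs above. The `tag` and `plain-text` types that show up only in the broken excerpt seem to be what Prism emits for JSX elements and their text content, which would suggest the generic was tokenized as a JSX tag.

```javascript
// Hypothetical helper: collect the distinct Prism token types in an HTML string.
function tokenTypes(html) {
  const types = new Set();
  // Match class attributes such as class="token keyword".
  const pattern = /class="token ([^"]+)"/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // A token may carry more than one type after the "token" class.
    for (const name of match[1].split(/\s+/)) {
      types.add(name);
    }
  }
  return [...types].sort();
}

// Shortened excerpt of the expected output.
const expected =
  '<span class="token class-name">LinkProps</span> ' +
  '<span class="token operator">=</span> React' +
  '<span class="token punctuation">.</span>DetailedHTMLProps' +
  '<span class="token operator">&lt;</span>HTMLAnchorElement' +
  '<span class="token operator">&gt;</span> ' +
  '<span class="token operator">&amp;</span>';

// Shortened excerpt of the broken output.
const broken =
  '<span class="token tag"><span class="token tag">' +
  '<span class="token punctuation">&#x3C;</span>' +
  '<span class="token class-name">HTMLAnchorElement</span></span>' +
  '<span class="token punctuation">></span></span>' +
  '<span class="token plain-text"> &#x26;</span>';

const expectedTypes = tokenTypes(expected);
const onlyInBroken = tokenTypes(broken).filter(
  (type) => !expectedTypes.includes(type)
);

console.log(onlyInBroken); // [ 'plain-text', 'tag' ]
```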

I'm using this code to test:

const fs = require('fs');
const remark = require('remark');
const html = require('remark-html');
const prism = require('remark-prism');

// Run the markdown through remark-prism, then serialize it to HTML.
const markdownToHtml = async (markdown) => {
  const result = await remark()
    .use(prism)
    .use(html)
    .process(markdown);
  return result.toString();
};

const markdown = fs.readFileSync('markdown.md', 'utf8');

// `output` avoids shadowing the `html` plugin imported above.
markdownToHtml(markdown).then((output) => {
  console.log(output);
});

Theories as to why this is happening

I think it's got something to do with character encoding.

Notice the difference between these characters in the two outputs:

  • &amp; vs &#x26;
  • &lt; vs &#x3C;

Also note that remark-prism is not encoding characters where it should. E.g., it's not encoding > as &gt;.

Perhaps the encoding of these characters is throwing prism off?
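
A quick way to test this theory is to decode both spellings and compare the results. In HTML, &amp; and &#x26; are a named and a numeric reference to the same character, and > does not need to be escaped in text content, so the encoding difference on its own may not explain the broken tokens. The sketch below assumes a hypothetical `decodeEntities` helper that is not part of remark or Prism and only knows the handful of references that appear in the outputs above.

```javascript
// Hypothetical helper, not part of remark or Prism. It only knows the few
// named references listed here, plus decimal and hexadecimal numeric ones.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"' };

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole, body) => {
    if (body.startsWith('#x')) {
      // Hexadecimal numeric reference, e.g. &#x26;
      return String.fromCodePoint(parseInt(body.slice(2), 16));
    }
    if (body.startsWith('#')) {
      // Decimal numeric reference, e.g. &#38;
      return String.fromCodePoint(parseInt(body.slice(1), 10));
    }
    // Named reference; unknown names are left untouched.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : whole;
  });
}

console.log(decodeEntities('&amp;') === decodeEntities('&#x26;')); // true
console.log(decodeEntities('&lt;') === decodeEntities('&#x3C;')); // true
```

If both comparisons print true, a browser would render the two spellings identically, and the difference in token structure is the more likely culprit.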

Off-topic

I wasn't able to run the tests on my local machine (macOS Catalina).

Thank you for this, I will take a look 👍

Thanks! I've discovered that rehype-prism produces the same broken output, so perhaps it's an issue with remark?

This is the code (TypeScript) I'm using for rehype-prism:

import { VFileCompatible } from 'vfile';
import unified from 'unified';
import parse from 'remark-parse';
import remark2rehype from 'remark-rehype';
import html from 'rehype-stringify';
import rehypePrism from '@mapbox/rehype-prism';

export const markdownToHtml = async (
  markdown: VFileCompatible
): Promise<string> => {
  const result = await unified()
    .use(parse)
    .use(remark2rehype)
    .use(rehypePrism)
    .use(html)
    .process(markdown);
  return result.toString();
};

Hey @sergioramos, this doesn't seem to be an issue anymore! Something was fixed, but it's not clear which package fixed it.

With the example markdown and code above, I'm able to get the expected result (correct):

<div class="remark-highlight"><pre class="language-tsx"><code class="language-tsx"><span class="token keyword">export</span> <span class="token keyword">type</span> <span class="token class-name">LinkProps</span> <span class="token operator">=</span> <span class="token maybe-class-name">React</span><span class="token punctuation">.</span><span class="token property-access"><span class="token maybe-class-name">DetailedHTMLProps</span></span> <span class="token operator">&#x26;</span>
  <span class="token maybe-class-name">NextLinkProps</span><span class="token punctuation">;</span>

<span class="token keyword">export</span> <span class="token keyword">type</span> <span class="token class-name">foo</span> <span class="token operator">=</span> <span class="token string">'bar'</span><span class="token punctuation">;</span>
</code></pre></div>

Closing 🎉

Actually, I think this was an issue with Prism itself.

See PrismJS/prism#2594, released with https://github.com/PrismJS/prism/releases/tag/v1.23.0
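
If the fix did ship with v1.23.0 as linked above, a project could check whether its installed Prism is new enough. This is a minimal sketch: `isAtLeast` is a hypothetical helper that compares plain dotted version numbers and ignores pre-release suffixes, and the commented-out `require` at the end is an assumption about how the installed version might be read.

```javascript
// Hypothetical helper: compare two dotted version strings numerically.
function isAtLeast(version, minimum) {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    // Missing or non-numeric parts are treated as 0.
    const diff = (a[i] || 0) - (b[i] || 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // The versions are equal.
}

console.log(isAtLeast('1.22.0', '1.23.0')); // false
console.log(isAtLeast('1.23.0', '1.23.0')); // true
console.log(isAtLeast('1.10.0', '1.9.0')); // true

// In a real project, the installed version could be read with something like:
// const { version } = require('prismjs/package.json');
// console.log(isAtLeast(version, '1.23.0'));
```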