Allow handing in pre-parsed AST

Question

Allow handing in pre-parsed AST

pke opened this issue 4 years ago · 13 comments

For my component I'd like to extract all the text links from the markdown to build an action sheet out of the links.

So I would like to pre-parse the markdown using markdown-it for all links and then hand in that AST to the component instead of a markdown-it prop.

I want to prevent thet parsing of the markdown twice, once in my component and second in yours.

Do you think that would be a possible addition?

Answer 1 · 2020-06-12T08:37:07.000Z

So we could do two things I think

Make a prop that accepts the output of markdownit.parse - if this exists we just use this instead of running it internally
Make a prop that is the AST, which would replace the output of AstRenderer.js if present

The problem with 2 is, AstRenderer.js is a very changed file - so there is potential for lots of breaking in the future.

What do you think?

Answer 2 · 2020-06-12T09:12:10.000Z

The output of markdownit.parse is almost what I am using now, but I prefer a less nested AST (preferably flat even) so I am using your exported tokensToAST.

I am currently using this code and it works great but has a performance hit due to double parsing:

const links = useMemo(() => {
  const markdown = MarkdownIt({
    html: false,
    typographer: false,
  })
  const ast = tokensToAST(stringToTokens(children, markdown))
  const collectLinks = (result: FinePrintLink[], node: ASTNode): FinePrintLink[] => {
    if (node.type === "link") {
      result.push({
        uri: node.attributes.href,
        text: node.children?.map(({ content }) => content).join(" "),
      })
    }
    return node.children.reduce(collectLinks, result)
  }
  return ast.reduce(collectLinks, [])
}, [children])

I would not introduce a new prop, but rather define children additionally as union with ASTNode[] or even the return value of markdownit.parse

So my call to Markdown could look like this:

const ast = tokensToAST(stringToTokens(children, markdown))
<Markdown onLinkPress={onPress} style={style}>{ast}</Markdown>

Could also be possible

const tokens = stringToTokens(children, markdown)
// same as
const tokens = MarkdownIt().parse(children)
<Markdown onLinkPress={onPress} style={style}>{tokens}</Markdown>

What do you think?

Answer 3 · 2020-07-04T09:40:00.000Z

Hey

Sorry about the delay here - can you take a look at https://github.com/iamacup/react-native-markdown-display/releases/tag/6.2.x and let me know if this is what you were thinking?

Supports this syntax:

const markdownItInstance = MarkdownIt({typographer: true});

const copy = `
# H1

Paragraph
`;

const ast = tokensToAST(stringToTokens(copy, markdownItInstance))

            <Markdown
              debugPrintTree
            >
              {ast}
            </Markdown>

Answer 4 · 2020-07-04T18:54:41.000Z

released this in 6.1.5 - re-open if we need to change how it works :) Readme is updated to reflect the above

Answer 5 · 2020-07-04T20:05:07.000Z

Thanks, I'll test it on Monday and report back here!

Answer 6 · 2020-07-05T21:02:52.000Z

@iamacup works like a charm! Thanks for fulfilling this feature request!

Answer 7 · 2020-07-20T05:23:23.000Z

Two problems in 6.1.6:

Passing in AST nodes is slower than plain markdown string, it's obvious in FlatList scenario. (even the AST nodes is not modified).
Image node can't be parsed in AST nodes way.

p.s.
Currently I'm using React.memo for FlatList item cache, so problem 1 is not that serious, but it's really slow.

Answer 8 · 2020-07-20T07:55:21.000Z

@helsonxiao thanks for reporting back.

to 1st: Could you provide a source snippet how you use the AST within the FlatList? The markdown component also uses just memoization internally so maybe your AST is re-calculated over and over again because of unstable dependencies?

to 2nd: You mean images are not displayed that way? That shouldn't make any difference. Maybe a test case could be made for that.

Answer 9 · 2020-07-20T13:07:27.000Z

just opening to keep track of this in case it turns out to be library related - thanks for handling @pke !!

Answer 10 · 2020-07-22T06:57:30.000Z

@helsonxiao thanks for reporting back.

to 1st: Could you provide a source snippet how you use the AST within the FlatList? The markdown component also uses just memoization internally so maybe your AST is re-calculated over and over again because of unstable dependencies?

to 2nd: You mean images are not displayed that way? That shouldn't make any difference. Maybe a test case could be made for that.

I haven't measured it with Performance API but I do feel the difference. I also tried to cache the AST processor or return AST nodes directly. Note: My FlatList Item use react-native-syntax-highlighter for code display, but it's still slower than plain string mode after I removed the highlighter. If it's still confused for you, I will check your codes. Anyway, React.memo is necessary so it's not a big deal.
Try this input. It works well in plain string mode.

const input = "![背包四讲.png](https://media-test.jiuzhang.com/media/markdown/images/7/16/72644746-c710-11ea-8f89-0242ac1c0003.jpg)\n\n```\n[[java]]\nasfafagagag233\n[[golang]]\nasfgsagag\n```\n\n![背包四讲.png](https://media-test.jiuzhang.com/media/markdown/images/7/16/72644746-c710-11ea-8f89-0242ac1c0003.jpg)"

// turn on astPreprocessor then images disappear
return (
  <Markdown /* astPreprocessor={astPreprocessor} */ rules={rules}>
    {input}
  </Markdown>
);

// here I just return the ast, nothing changed.
// I will use this method to pick up important nodes.
const astPreprocessor = (ast: ASTNode[]) => {
  return ast;
};

const Markdown: React.FC<MarkdownProps> = props => {
  const _props = {
    rules,
    style: MarkdownStyles,
    ...props,
  };

  if (props.astPreprocessor) {
    const ast = tokensToAST(stringToTokens(props.children, markdownItInstance));

    return <_Markdown {..._props}>{props.astPreprocessor(ast)}</_Markdown>;
  }

  return <_Markdown {..._props}>{props.children}</_Markdown>;
};

Answer 11 · 2020-07-22T07:42:08.000Z

About the speed difference:
At which point are you using memoizsation? The result coming back from astPreprocessor should be memoized in your component based on the children.

const {
  children,
  astPreprocessor,
} = props
const content= useMemo(() => {
  astPreprocessor ? astPreprocessor(tokenToAST(stringToTokens(children, markdownItInstance)) : children
}, [children, astPreprocessor])

return <_Markdown markdownIt={markdownItInstance} {...props}>{content}</_Markdown>

Make sure you are also re-using the markdownItInstance you globally defined, or the Markdown component will create its own.

About the images I'd have to check this out in a free minute, but without knowing your styles for images and what the preprocessor actually does with the AST, its hard to predict.
So your preprocessor divides the markdown based on code fences?

Answer 12 · 2020-07-24T02:46:12.000Z

I think the markdown input is not entirely settled. stringToTokens only does the job partly. There are still two more steps from this file. https://github.com/iamacup/react-native-markdown-display/blob/master/src/lib/parser.js#L19

I copied them into my project then it works fine 😄. It could be nicer if you export them or refine the stringToTokens function. The speed confusion is also resolved. Thanks!

Answer 13 · 2020-07-31T10:19:46.000Z

+1 to cleaning up the necessary imports for a custom parser function. At present both cleanupTokens and groupTextTokens need to be imported from the src/lib/util folder.

However I wouldn't be in favour of consolidating the current functions into stringToTokens, as for my use case I need to do some preprocessing before groupTextTokens is called:

import {
  MarkdownIt,
  stringToTokens,
  tokensToAST,
} from 'react-native-markdown-display';
import { cleanupTokens } from 'react-native-markdown-display/src/lib/util/cleanupTokens';
import groupTextTokens from 'react-native-markdown-display/src/lib/util/groupTextTokens';

const markdownIt = MarkdownIt({
  typographer: true,
});

export default function toAST(markdownContent) {
  // Process into an AST, to support the changing of links into blocklinks.
  let tokens = stringToTokens(markdownContent, markdownIt);
  tokens = cleanupTokens(tokens);
  // It's important to blockify the links before grouping the text tokens,
  // as this will exclude the links from being grouped, which would otherwise cause issues on Android.
  tokens = blockifyLinks(tokens);
  tokens = groupTextTokens(tokens);

  return tokensToAST(tokens);
}

// We need links to be treated as 'blocklinks' so that styling works correctly.
function blockifyLinks(tokens) {
  tokens.forEach((token) => {
    if (token.type === 'link') {
      token.block = true;
    }
  });
  return tokens;
}