Use the AST-based API to annotate broken links
Shreyas-21 opened this issue · 0 comments
Currently, every format has an `annotate_in_str` method, which takes a string and parses it to extract the links. However, when we write the status of each broken link back into the source (e.g. turning `[link](http://www.google.com/does-not-exist)` into `[link](http://www.google.com/does-not-exist - [404 Not Found])`), the replacement is done directly on the source string using regular expression matching.
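For reference, the regex-based approach described above might look roughly like the following. This is a minimal sketch, not the project's actual `annotate_in_str`: the function name `annotate_with_regex`, the `statuses` mapping, and the link pattern (inline markdown links with no spaces in the URL) are all assumptions made for illustration.

```python
import re
from typing import Dict

# Assumed pattern: inline markdown links of the form [text](url),
# where the URL contains no whitespace or closing parenthesis.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def annotate_with_regex(source: str, statuses: Dict[str, str]) -> str:
    """Append ' - [status]' after the URL of every link found in `statuses`.

    Hypothetical helper; `statuses` maps a broken URL to its status text.
    """
    def repl(match):
        text, url = match.group(1), match.group(2)
        status = statuses.get(url)
        if status is None:
            return match.group(0)  # not a broken link, leave it untouched
        return f"[{text}]({url} - [{status}])"

    return LINK_RE.sub(repl, source)


source = "See [link](http://www.google.com/does-not-exist) and [ok](http://example.com)."
broken = {"http://www.google.com/does-not-exist": "404 Not Found"}
annotated = annotate_with_regex(source, broken)
print(annotated)
```

The pattern has to re-discover the links the parser already found, so the two can disagree on edge cases such as links inside code blocks. That is the motivation for moving to the AST.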
Ideally, the annotation could be made by replacing the link in the AST and transforming the AST back to a string, but this poses some challenges:
- Not all parsers may support converting their AST back to a string
- Even if they do, they may discard whitespace characters which do not change the semantics. For example, in a markdown file
# Heading 1
# Heading 2
is the same as
# Heading 1
# Heading 2
for most practical purposes. But one of our use cases is to create a PR with the broken links annotated, and a parser that discards characters will generate a noisy diff with whitespace changes.
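To make the noisy-diff concern concrete, here is a small sketch using Python's `difflib`. The function `lossy_round_trip` is a hypothetical stand-in for a parse-then-serialize cycle; it assumes the serializer collapses runs of blank lines, which is only one way a parser might normalize whitespace.

```python
import difflib
import re


def lossy_round_trip(source: str) -> str:
    """Hypothetical stand-in for parse -> AST -> string.

    Assumes the serializer collapses three or more newlines into two.
    """
    return re.sub(r"\n{3,}", "\n\n", source)


original = "# Heading 1\n\n\n\n# Heading 2\n"
regenerated = lossy_round_trip(original)

# No link was touched, yet the diff is non-empty.
diff = list(
    difflib.unified_diff(
        original.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile="original.md",
        tofile="regenerated.md",
    )
)
print("".join(diff))
```

Every hunk in this diff is a whitespace-only change, which is the kind of noise a reviewer of the annotation PR would have to wade through.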
Ideally, every parser we use should give us some location information so that we could write a `to_string` that preserves the original structure, but this is currently not the case.
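If parsers did expose location information, a structure-preserving `to_string` could splice annotations into the original text instead of re-serializing the AST. The sketch below assumes each link node carries character offsets for its URL; the `LinkSpan` type and its fields are hypothetical, not an existing API in this project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSpan:
    """Hypothetical link node with source offsets for its URL."""
    url: str
    start: int   # offset of the first character of the URL
    end: int     # offset one past the last character of the URL
    status: str  # empty string when the link is fine


def to_string(source: str, spans: List[LinkSpan]) -> str:
    """Insert ' - [status]' after each broken URL, leaving all other text as-is."""
    out = source
    # Work from the end of the file backwards so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if span.status:
            out = out[:span.end] + f" - [{span.status}]" + out[span.end:]
    return out


source = "# Heading 1\n\n\n[link](http://www.google.com/does-not-exist)\n"
url = "http://www.google.com/does-not-exist"
start = source.index(url)  # in practice the parser would supply this offset
spans = [LinkSpan(url, start, start + len(url), "404 Not Found")]
result = to_string(source, spans)
print(result)
```

Because only the bytes inside each span change, the resulting diff contains the annotated links and nothing else.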