tarides/olinkcheck

Use the AST-based API to annotate broken links

Shreyas-21 opened this issue · 0 comments

Currently, every format has an annotate_in_str method that takes a string and parses it to extract the links. However, when we write the broken links back into the source annotated with their status (e.g. [link](http://www.google.com/does-not-exist) becomes [link](http://www.google.com/does-not-exist - [404 Not Found])), the replacement is done directly on the source string using regular expression matching.
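For illustration, the in-source replacement described above might look roughly like the following. This is a minimal Python sketch, not the project's actual code: the function name annotate_in_source and the statuses mapping (URL to status text) are assumptions made for the example.

```python
import re


def annotate_in_source(source: str, statuses: dict[str, str]) -> str:
    """Append ' - [status]' after each broken URL, editing the raw text.

    `statuses` is a hypothetical mapping from a broken URL to its status
    text, e.g. {"http://example.test/x": "404 Not Found"}.
    """
    result = source
    for url, status in statuses.items():
        # Only match the URL when it is followed by whitespace, a closing
        # bracket, or the end of the text, so that a short URL does not
        # match inside a longer one.
        pattern = re.escape(url) + r"(?![^\s)\]>])"
        # A function is used as the replacement so that backslashes in
        # the URL or status are never interpreted as escape sequences.
        result = re.sub(pattern, lambda m: f"{m.group(0)} - [{status}]", result)
    return result


source = "[link](http://www.google.com/does-not-exist)\n"
broken = {"http://www.google.com/does-not-exist": "404 Not Found"}
annotated = annotate_in_source(source, broken)
```

Because only the matched URL text is touched, everything else in the file (including blank lines) is left byte-for-byte unchanged, which is the property the AST-based approach would also need to keep.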
Ideally, it could be done by replacing it in the AST, and transforming the AST back to a string, but this poses some challenges:

  • Not all parsers may support converting their AST back to a string
  • Even if they do, they may discard whitespace characters that do not change the semantics. For example, in a Markdown file:
# Heading 1




# Heading 2

is the same as

# Heading 1
# Heading 2

for most practical purposes. But one of our use cases is to create a PR with the broken links annotated, and a parser that discards characters will generate a noisy diff full of whitespace changes.
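The size of the problem can be seen in a small experiment. The Python sketch below compares the diff produced by annotating in place with the diff produced after a whitespace-normalizing round trip. The function normalizing_round_trip is only a stand-in (an assumption) for "parse, then print with a printer that drops blank lines"; it is not any real parser's behaviour.

```python
import difflib

URL = "http://www.google.com/does-not-exist"
ANNOTATED_URL = URL + " - [404 Not Found]"


def normalizing_round_trip(source: str) -> str:
    """Stand-in for parse + print: drop every blank line."""
    return "".join(
        line for line in source.splitlines(keepends=True) if line.strip()
    )


def changed_lines(before: str, after: str) -> list[str]:
    """Return the added/removed lines of a unified diff with no context."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(), after.splitlines(), lineterm="", n=0
        )
    )
    body = diff[2:]  # skip the '---' / '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


original = f"# Heading 1\n\n\n\n\n# Heading 2\n\n[link]({URL})\n"

# Annotating the raw text touches only the line containing the link.
in_place = original.replace(URL, ANNOTATED_URL)

# Annotating after a normalizing round trip also loses the blank lines.
round_tripped = normalizing_round_trip(original).replace(URL, ANNOTATED_URL)

in_place_changes = changed_lines(original, in_place)
round_trip_changes = changed_lines(original, round_tripped)
```

In this example the in-place edit changes one line (one removal plus one addition in the diff), while the round-tripped version additionally removes every blank line, so a reviewer has to dig the real change out of the whitespace noise.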

Ideally, every parser we use should give us some location information so that we could write a to_string preserving the original structure, but this is currently not the case.
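If a parser did expose locations, the write-back could be a simple splice on the original text. The following Python sketch shows the idea under stated assumptions: LinkSpan and annotate_by_location are hypothetical names, and the character offsets are assumed to be supplied by the parser (here they are computed by hand for the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSpan:
    """Hypothetical location record a parser could attach to each link."""

    url: str
    start: int  # offset of the first character of the URL in the source
    end: int    # offset one past the last character of the URL


def annotate_by_location(
    source: str, spans: list[LinkSpan], statuses: dict[str, str]
) -> str:
    """Insert ' - [status]' after each broken URL using its offsets."""
    result = source
    # Work from the last span to the first so that inserting text never
    # shifts the offsets of spans that are still to be processed.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        status = statuses.get(span.url)
        if status is None:
            continue  # link is fine, leave it untouched
        result = result[: span.end] + f" - [{status}]" + result[span.end :]
    return result


source = "# Heading 1\n\n\n\n\n[a](http://x.test/a) and [b](http://x.test/b)\n"


def span_of(url: str) -> LinkSpan:
    # In practice the parser would provide these offsets.
    start = source.index(url)
    return LinkSpan(url, start, start + len(url))


spans = [span_of("http://x.test/a"), span_of("http://x.test/b")]
annotated = annotate_by_location(source, spans, {"http://x.test/b": "404 Not Found"})
```

Since the text outside the spliced positions is copied verbatim, the blank lines between the headings survive and the resulting diff contains only the annotated link.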