filiph/linkcheck

avoid repeating invalid links


Run the command linkcheck https://webdev.dartlang.org. Part of the generated output is shown below. Note that the same two 404s are repeated five times. It would be nice to list each broken link only once.

https://webdev.dartlang.org/angular/guide
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
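Every repeat above is identical: same source page, same line:column position, same destination. A minimal sketch of collapsing such exact repeats while keeping the first occurrence of each. It assumes each broken link is available as a record with a source page, position, anchor text, destination and status. The `BrokenLink` type and `dedupe_preserving_order` helper are hypothetical illustrations, not part of linkcheck:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen => hashable, so records can be set/dict keys
class BrokenLink:
    """Hypothetical record for one broken link in a link checker's report."""
    source: str       # page that contains the link
    line: int         # line of the link in the source page
    column: int       # column of the link in the source page
    text: str         # anchor text, e.g. 'Cookbook'
    destination: str  # URL the link points to
    status: str       # e.g. 'HTTP 301 => 404'


def dedupe_preserving_order(links):
    """Drop exact repeats, keeping the first occurrence in report order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(links))


guide = "https://webdev.dartlang.org/angular/guide"
cookbook = BrokenLink(guide, 534, 7, "Cookbook",
                      "https://webdev.dartlang.org/angular/cookbook/",
                      "HTTP 301 => 404")
change_log = BrokenLink(guide, 556, 21, "Change Log",
                        "https://webdev.dartlang.org/angular/guide/change-log.html",
                        "HTTP 301 => 404")

# The report above repeats the same pair of entries five times.
report = [cookbook, change_log] * 5

for link in dedupe_preserving_order(report):
    print(f"- ({link.line}:{link.column}) '{link.text}' => "
          f"{link.destination} ({link.status})")
```

Because the key is the whole record, this removes only true duplicates. Two different links on the same page that point to the same broken destination would both still be reported.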

To give more context on this design decision: the output is currently geared towards writers. Writers want to see where the broken links are, and want to see all of them (even if they're duplicated in terms of destination).

For dartlang.org, we're currently using linkcheck as 'webmasters'. We want to see which pages on the site are broken, and we don't need much insight into where those links appear in the source pages. It's a different context, and it needs a different way of sorting.

I plan to add a --webmaster mode (name TBD) that does just that. I think the default mode should stay geared towards writers because, in the usual scenario, only a few links are broken, and mostly because they need updating on the source side, not the destination side.
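The two audiences mostly differ in how the same list of broken links is grouped: writers want it organized by the page they need to edit, while webmasters want each broken destination once, with the pages that link to it. A minimal sketch, assuming the checker has already produced a flat list of broken-link records. The names `BrokenLink`, `group_by_source` and `group_by_destination` are hypothetical and do not describe linkcheck's actual code or the planned mode's final behavior:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokenLink:
    """Hypothetical, simplified broken-link record."""
    source: str       # page that contains the link
    destination: str  # URL the link points to
    status: str       # e.g. 'HTTP 301 => 404'


def group_by_source(links):
    """Writer-style view: each source page -> its broken links, in report order."""
    by_page = defaultdict(list)
    for link in links:
        by_page[link.source].append(link)
    return dict(by_page)


def group_by_destination(links):
    """Webmaster-style view: each broken destination (once) -> sorted referring pages."""
    referrers = defaultdict(set)
    for link in links:
        referrers[link.destination].add(link.source)
    # Sort destinations and referring pages so the report is stable.
    return {dest: sorted(pages) for dest, pages in sorted(referrers.items())}


guide = "https://webdev.dartlang.org/angular/guide"
cookbook_url = "https://webdev.dartlang.org/angular/cookbook/"
change_log_url = "https://webdev.dartlang.org/angular/guide/change-log.html"

# Same shape as the report in the issue: two broken links, repeated five times.
links = [
    BrokenLink(guide, cookbook_url, "HTTP 301 => 404"),
    BrokenLink(guide, change_log_url, "HTTP 301 => 404"),
] * 5

for destination, pages in group_by_destination(links).items():
    print(f"{destination} (linked from {len(pages)} page(s))")
```

With this input, the writer-style view still lists all ten entries under the guide page, while the webmaster-style view reports each of the two broken destinations exactly once.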

This is just my thinking. I'm eager for input.

That makes sense. Given this new understanding, I'd say that #2 has higher priority, since for the use case I am targeting, most of the repeated links are ones I'd want to exclude.