Generated index.html contains HTML errors
Closed this issue · 4 comments
The following index.html (generated for the JesterJ project from today's head) has several HTML errors. One thing I would like to do in my build is automatically check that the links in the report are valid (not 404/400/etc.) to detect things like junrar's invalid link to its own license (junrar/junrar#95). Valid markup would make this easier; XHTML would be best.
index.h_t_m_l.txt
Warning: This document appears to be written in English. Consider adding lang="en" (or variant) to the html start tag.
From line 1, column 1; to line 2, column 6
↩<html>↩<head
For further guidance, consult [Declaring the overall language of a page](https://www.w3.org/International/techniques/authoring-html.en?open=language&open=textprocessing#textprocessing) and [Choosing language tags](https://www.w3.org/International/techniques/authoring-html.en?open=language&open=langvalues#langvalues).
If the HTML checker has misidentified the language of this document, please [file an issue report](https://github.com/validator/validator/issues/new) or [send e-mail to report the problem](mailto:www-validator@w3.org).
Error: The character encoding was not declared. Proceeding using windows-1252.
Error: Start tag seen without seeing a doctype first. Expected <!DOCTYPE html>.
From line 1, column 1; to line 2, column 6
↩<html>↩<head
Error: Start tag head seen but an element of the same type was already open.
From line 7, column 1; to line 7, column 6
</title>↩<head>↩<body
Error: No p element in scope but a p end tag seen.
From line 598, column 9; to line 598, column 12
>↩ </p>↩
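The validator errors above come down to a missing doctype, a second `<head>` start tag, and a stray `</p>`. As a rough sketch of how a build could catch problems like these, the Python below (standard library only) runs a simple stack-based tag check over a generated report. It is an approximation, not the HTML5 tree-construction algorithm the W3C checker implements, so it will miss cases such as a `<p>` that is implicitly closed by a block element. `TagBalanceChecker` and `check_balance` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements a document should open only once.
SINGLETON = {"html", "head", "body"}


class TagBalanceChecker(HTMLParser):
    """Rough stack-based tag checker (NOT the HTML5 parsing algorithm)."""

    def __init__(self):
        super().__init__()
        self.saw_doctype = False
        self.stack = []      # names of currently open elements
        self.seen = set()    # SINGLETON elements already opened
        self.problems = []   # human-readable findings

    def handle_decl(self, decl):
        # HTMLParser passes "<!DOCTYPE html>" here as "DOCTYPE html".
        if decl.lower().startswith("doctype"):
            self.saw_doctype = True

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag in SINGLETON:
            if tag in self.seen:
                self.problems.append(f"line {line}: second <{tag}> start tag")
            self.seen.add(tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> never leaves an element open.
        pass

    def handle_endtag(self, tag):
        line, _ = self.getpos()
        if tag not in self.stack:
            self.problems.append(f"line {line}: </{tag}> with no open <{tag}>")
            return
        # Pop up to and including the most recent matching start tag.
        while self.stack.pop() != tag:
            pass


def check_balance(html_text):
    """Return a list of findings; empty means only that these checks passed."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    if not checker.saw_doctype:
        checker.problems.insert(0, "no <!DOCTYPE html> declaration")
    return checker.problems
```

An empty result from `check_balance` does not mean the document is valid; a real validator is still the authority.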
Hi @nsoft,
If you're trying to automate HTML parsing, you're probably barking up the wrong tree. The plugin has plenty of renderers to pick from; the JSON and XML ones will be much easier to parse.
Also, if you're only looking to validate and fix the links, consider implementing your own filter instead. This would greatly simplify the whole thing and make the parsing step unnecessary.
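As a sketch of the JSON route, the snippet below pulls license URLs out of a JSON report so they can be handed to a link checker. The field names (`dependencies`, `moduleName`, `moduleVersion`, `moduleLicenses`, `moduleLicense`, `moduleLicenseUrl`) are an assumption about the JSON renderer's output; verify them against a report from your own build before relying on this. `license_urls` is a hypothetical helper.

```python
import json
import sys


def license_urls(report):
    """Map "moduleName:moduleVersion" to the license URLs listed for it.

    ASSUMPTION: the report looks like
    {"dependencies": [{"moduleName": ..., "moduleVersion": ...,
                       "moduleLicenses": [{"moduleLicense": ...,
                                           "moduleLicenseUrl": ...}]}]}
    """
    result = {}
    for dep in report.get("dependencies", []):
        key = f"{dep.get('moduleName')}:{dep.get('moduleVersion')}"
        licenses = dep.get("moduleLicenses") or []
        # Skip entries that name a license but carry no URL.
        result[key] = [lic["moduleLicenseUrl"] for lic in licenses
                       if lic.get("moduleLicenseUrl")]
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python license_urls.py path/to/index.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for module, urls in license_urls(json.load(fh)).items():
            for url in urls:
                print(f"{module}\t{url}")
```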
Sure, but I see no good reason for you to produce broken HTML.
```html
<head>
<title>
Dependency License Report for ingest
</title>
<head> <----- should be close tag
```
In my case, I was thinking of publishing the HTML in my uber-jar, because it is a reasonably convenient format for a human trying to understand what's going to be running. As such, I'm interested in verifying whatever makes it into the HTML I intend to publish. I was hoping to throw something like https://github.com/rackerlabs/gradle-linkchecker-plugin at it. I'm still learning your tool, and next I'll want to eliminate references to licenses I am not choosing to use (e.g. if a dependency is offered under GPL and MPL and I intend to use it under MPL, I want the GPL reference removed), so I'll look at whether filtering helps there. The link you provide seems to be the same as the list of renderers, however.
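A minimal sketch of verifying every outbound link in the HTML before publishing it, assuming Python 3 with only the standard library. `extract_links`, `http_status`, and `broken_links` are hypothetical names, and sending a HEAD request is an assumption (some servers answer HEAD differently from GET). `HTMLParser` is lenient, so this can still read a report that fails validation.

```python
import sys
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects absolute http(s) href targets from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(collector.links))


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    req = Request(url, method="HEAD",
                  headers={"User-Agent": "license-report-linkcheck"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:       # 4xx/5xx responses
        return err.code
    except (OSError, ValueError):  # DNS failures, timeouts, malformed URLs
        return None


def broken_links(html_text, fetch=http_status):
    """Return (url, status) pairs whose status is missing or >= 400."""
    results = []
    for url in extract_links(html_text):
        status = fetch(url)
        if status is None or status >= 400:
            results.append((url, status))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        failures = broken_links(fh.read())
    for url, status in failures:
        print(f"{status}\t{url}")
    # A non-zero exit code lets a CI step fail the build on broken links.
    sys.exit(1 if failures else 0)
```

Passing `fetch` as a parameter keeps the network call swappable, for example with a cache or a GET fallback for servers that reject HEAD.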
> The link you provide seems to be the same as the list of renderers however.
My bad, here are the proper links:
https://github.com/jk1/Gradle-License-Report#filters
https://github.com/jk1/Gradle-License-Report#writing-custom-renderers-importers-and-filters
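The filters linked above are the plugin's own mechanism for this. As a swapped-in alternative that needs no knowledge of the filter API, here is a sketch that post-processes a JSON report so that, for modules you list, only the license you intend to use remains. It assumes the report uses the field names `dependencies`, `moduleName`, `moduleLicenses`, and `moduleLicense`, which you should check against your own output; `keep_chosen_licenses` and the example mapping are hypothetical.

```python
import copy


def keep_chosen_licenses(report, chosen):
    """Return a copy of report keeping only the chosen license per module.

    `chosen` maps moduleName -> license name, e.g.
    {"org.example:dual": "MPL 2.0"}. Modules not in `chosen` are untouched.
    """
    result = copy.deepcopy(report)  # never mutate the caller's data
    for dep in result.get("dependencies", []):
        wanted = chosen.get(dep.get("moduleName"))
        if wanted is None:
            continue
        licenses = dep.get("moduleLicenses") or []
        kept = [lic for lic in licenses if lic.get("moduleLicense") == wanted]
        if not kept:
            raise ValueError(
                f"{dep.get('moduleName')}: chosen license {wanted!r} not found")
        dep["moduleLicenses"] = kept
    return result
```

Raising when the chosen name is absent is deliberate: a typo in the mapping fails the build instead of silently stripping every license from a module.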
> Sure, but I see no good reason for you to produce broken html
Me neither, will fix that