Generated index.html contains HTML errors

Question

Closed this issue 2 years ago · 4 comments

The following index.html (generated for the JesterJ project, head as of today) has several errors in the HTML. One thing I would like to do in my build is automate checking that the links in the report are valid (no 404/400/etc.) to detect things like junrar's invalid link to its own license (junrar/junrar#95). Valid markup would make this easier; XHTML in particular would be best.
index.h_t_m_l.txt

    Warning: This document appears to be written in English. Consider adding lang="en" (or variant) to the html start tag.

    From line 1, column 1; to line 2, column 6

    ↩<html>↩<head

    For further guidance, consult [Declaring the overall language of a page](https://www.w3.org/International/techniques/authoring-html.en?open=language&open=textprocessing#textprocessing) and [Choosing language tags](https://www.w3.org/International/techniques/authoring-html.en?open=language&open=langvalues#langvalues).

    If the HTML checker has misidentified the language of this document, please [file an issue report](https://github.com/validator/validator/issues/new) or [send e-mail to report the problem](mailto:www-validator@w3.org).

    Error: The character encoding was not declared. Proceeding using windows-1252.

    Error: Start tag seen without seeing a doctype first. Expected <!DOCTYPE html>.

    From line 1, column 1; to line 2, column 6

    ↩<html>↩<head

    Error: Start tag head seen but an element of the same type was already open.

    From line 7, column 1; to line 7, column 6

     </title>↩<head>↩<body

    Error: No p element in scope but a p end tag seen.

    From line 598, column 9; to line 598, column 12

    >↩        </p>↩
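
For reference, the link check described above can be sketched without requiring valid markup. This is only a minimal sketch in Python, assuming the report is available as a string; the names `extract_links`, `http_status` and `broken_links` are made up for illustration and are not part of the plugin. Python's `html.parser` is lenient, so the duplicated `<head>` and the stray `</p>` reported by the validator do not stop it from finding the links.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects absolute http(s) href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html_text):
    """Return the links in document order (hypothetical helper)."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if the request could not complete."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # 4xx/5xx responses arrive as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def broken_links(html_text, status_of=http_status):
    """Return (url, status) pairs whose status is missing or >= 400."""
    results = []
    for url in dict.fromkeys(extract_links(html_text)):  # de-duplicate, keep order
        status = status_of(url)
        if status is None or status >= 400:
            results.append((url, status))
    return results
```

The `status_of` parameter exists so the check can be exercised without network access. Some servers reject `HEAD` requests, so a real build would probably want to fall back to `GET` before declaring a link broken.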

Answer 1 · 2022-07-08T16:44:54.000Z

Hi @nsoft,

If you're trying to automate HTML parsing, you're probably barking up the wrong tree. The plugin has plenty of renderers to pick from; the JSON and XML ones will be much easier to parse.

Also, if you only want to validate and fix the links, consider implementing your own filter instead. This would simplify the whole thing greatly and make the parsing step unnecessary.
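
As a rough illustration of why a JSON report is easier to work with, the sketch below collects the URLs from one. It is in Python and assumes a report with a top-level `dependencies` array whose entries carry `moduleUrl` and `moduleLicenseUrl` fields; those field names are an assumption here and should be checked against the output your configured renderer actually produces. `urls_from_json_report` is a made-up helper name.

```python
import json


def urls_from_json_report(report_text):
    """Return the sorted, unique module and license URLs found in the report."""
    report = json.loads(report_text)
    urls = set()
    for dependency in report.get("dependencies", []):
        # Field names are assumed; adjust them to the real report layout.
        for key in ("moduleUrl", "moduleLicenseUrl"):
            value = dependency.get(key)
            if value:
                urls.add(value)
    return sorted(urls)
```

The resulting list can then be handed to whatever link checker the build already uses.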

Answer 2 · 2022-07-11T15:18:47.000Z

Sure, but I see no good reason for you to produce broken html?

<head>
    <title>
        Dependency License Report for ingest
    </title>
<head>  <----- should be a closing tag (</head>)

In my case I was thinking of publishing the HTML in my uber-jar because it is a reasonably convenient format for a human trying to understand what's going to be running. As such, I'm interested in verifying whatever makes it into the HTML I intend to publish. I was hoping to throw something like https://github.com/rackerlabs/gradle-linkchecker-plugin at it... I'm still learning your tool, and next I'll want to eliminate references to licenses I am not choosing to use (i.e. if a dependency is offered under GPL and MPL, and I intend to use it under MPL, I want the GPL reference removed), so I'll look into whether filtering helps there. The link you provide seems to be the same as the list of renderers however.
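
To make the license-selection idea concrete, here is a minimal sketch of the rule in Python. It works on a plain mapping from module name to license names, which is a hypothetical stand-in and not the plugin's data model; `keep_preferred_licenses` and `filter_report` are made-up names. The rule: when a module is offered under several licenses and at least one is on the preferred list, keep only the preferred ones; otherwise leave the module untouched.

```python
def keep_preferred_licenses(licenses, preferred):
    """Keep only preferred licenses if any match; otherwise keep them all."""
    chosen = [name for name in licenses if name in preferred]
    return chosen if chosen else list(licenses)


def filter_report(modules, preferred):
    """Apply the selection rule to every module in a {module: [licenses]} mapping."""
    return {
        module: keep_preferred_licenses(names, preferred)
        for module, names in modules.items()
    }
```

For example, with `preferred = {"MPL-2.0"}`, a module listed as `["GPL-2.0", "MPL-2.0"]` is reduced to `["MPL-2.0"]`, while a module listed only as `["Apache-2.0"]` is left as it is.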

Answer 3 · 2022-07-11T15:25:05.000Z

Problem appears to be here: https://github.com/jk1/Gradle-License-Report/blob/master/src/main/groovy/com/github/jk1/license/render/InventoryHtmlReportRenderer.groovy#L155
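
A cheap regression check for this kind of defect could look like the sketch below (Python, standard library only; `repeated_singletons` is a made-up name and this is not a substitute for a real validator). It counts start tags for elements that should appear once per document and reports any that are opened more than once, which is what happens when `</head>` is written as `<head>`.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements expected to be opened exactly once in a report page.
SINGLETONS = ("html", "head", "title", "body")


class StartTagCounter(HTMLParser):
    """Counts how often each start tag occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def repeated_singletons(html_text):
    """Return the singleton elements that are opened more than once."""
    counter = StartTagCounter()
    counter.feed(html_text)
    counter.close()
    return [tag for tag in SINGLETONS if counter.counts[tag] > 1]
```

Run against the header quoted earlier in this thread, it reports `head`; once the second `<head>` becomes `</head>`, the result is empty.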

Answer 4 · 2022-07-11T18:29:20.000Z

> The link you provide seems to be the same as the list of renderers however.

My bad, here are the proper links:

https://github.com/jk1/Gradle-License-Report#filters
https://github.com/jk1/Gradle-License-Report#writing-custom-renderers-importers-and-filters

> Sure, but I see no good reason for you to produce broken html

Me neither, will fix that