2.1.0 HTML is missing many hyperlinks
rillig opened this issue · 4 comments
The SARIF 2.1.0 specification is available in HTML format, among others.
The abbreviation HTML means hypertext markup language. It is therefore strange that the HTML document contains way fewer hyperlinks than the Microsoft Word document or the PDF document. Especially when the body text contains a cross-reference to another section, the HTML document only contains the text § 3.19, without being linked to that section.
What is the rationale for producing the HTML version of the standard with most hyperlinks missing?
The HTML format is required per process. HTML is the most accessible format. To the best of my knowledge, the target audience for the specification consists of people mostly reading and not jumping around?
I am wondering if there is some common need behind these tickets?
In this case, maybe an initial collecting ticket would be a better start to interact with the technical committee members.
To the best of my knowledge, the target audience for the specification consists of people mostly reading and not jumping around?
I am part of a team that implements the SARIF specification. I personally haven't read through the whole specification. In most cases it seemed enough to read a single section or subsection, as the cross-references in the specification mention the details that are necessary for understanding the text. For example, I never stumbled upon a rectangle object, so I didn't need to read that section until now.
I doubt that most people read the specification in a linear way, as that requires a lot of concentration. Maybe during the first phase, to learn all the topics that the specification covers. But after this phase, I don't see a point in reading the specification linearly anymore. As a reader, I most often have a specific question, so I go directly to the relevant section and continue from there, following the cross-references as needed.
My concrete use case is: In the last week, I updated our product documentation, which involved cross-checking that the references from our documentation to the specification are correct, and that our implementation documentation agrees with the specification. During that task, I jumped directly to the physicalLocation section. From there, my eyes moved down to the region property section, and there I wanted to continue reading with the region object section (§3.30), since that property was the most interesting property to me.
As you may have noticed in the previous paragraph, I omitted the link to the region object section, to show you how natural it feels to have a hyperlink at that point. I experienced several similar situations during the last week. If it had been only a single instance, I wouldn't have invested the time to write this issue.
The more general issue I have with specifications from OASIS is that generating the specification in HTML format seems to be part of the standard process, as you said, but this standard process apparently doesn't require that the generated HTML contains cross-references even within the same document. On the other hand, the Microsoft Word format and the PDF format contain these cross-references. The missing links in the HTML format make me wonder whether OASIS understands what the word hypertext in Hypertext Markup Language means. In OASIS's defense, they didn't standardize HTML.
Since OASIS is also the publisher of the OpenDocument format, it should be quite easy to find an expert on these document formats who can automate converting a .docx file into an OpenDocument format (which should preserve the in-document hyperlinks) and from there generate an HTML file that includes these hyperlinks. I opened the .docx file in Microsoft Word, saved it in HTML format and was surprised that Microsoft Word didn't generate hyperlinks for the in-document cross-references. So while it is Microsoft's fault that Microsoft Word doesn't generate proper HTML, I see it as OASIS' fault to embed Microsoft Word in the process for publishing standard documents.
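To illustrate how small the missing step is, here is a minimal Python sketch of a post-processing pass that turns textual references such as § 3.19 into in-document links. It assumes that every section heading in the generated HTML carries an id of the form `sec-3.19`; that id scheme and the function name `link_section_refs` are hypothetical, not something the published SARIF HTML is known to use, and a real pass would also need to skip text that is already inside a link.

```python
import re

# Matches references such as "§3.30" or "§ 3.19" (optional whitespace after the sign).
SECTION_REF = re.compile(r"§\s*(\d+(?:\.\d+)*)")


def link_section_refs(text, anchor_prefix="sec-"):
    """Wrap each section reference in an in-document hyperlink.

    anchor_prefix is a hypothetical id scheme; the real HTML may use other ids.
    """
    def replace(match):
        number = match.group(1)  # e.g. "3.30"
        return f'<a href="#{anchor_prefix}{number}">{match.group(0)}</a>'

    return SECTION_REF.sub(replace, text)
```

For example, `link_section_refs("the region object section (§3.30)")` wraps the `§3.30` part in a link to `#sec-3.30` and leaves the rest of the text unchanged.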
I understand that you hyper jumped to a place already.
I guess no one is expecting that everyone reads the whole spec from start to finish every time.
Maybe we can keep the scope of the issue to what it is.
I would even remove the word "many" as it does not seem to bring value.
So, some links are in one format and you ask, as one user, why these links are not in another format.
You also answer that question by stating that Microsoft Word indeed does not export cross-references that are not explicit hyperlinks, only their textual representation.
The three formats are offered to maximize accessibility.
As does the fact that the user has access for free.
So, maybe the needs of this one user are best served by the PDF version, while another user like me is best served by searching, jumping, and reading in the HTML version?
Many committees with members coming from different places and being used to different authoring tools contribute to the body of work at OASIS Open. Offering authoring formats including Markdown, HTML, and XML until now seemed to be more important than optimizing the technical publication process.
OASIS offers guidance for TCs to avoid cross-references in some proprietary tools because of exactly these long-known portability issues of some authoring tools. But not every standard starts on a green field; some start as a submitted full spec (assumed to be done), which is then further edited by the receiving committee.
OASIS is open to participation - everyone can join and contribute - individuals, members of organizations, and organizations on different levels.
Now that we are switching from Word to Markdown, perhaps it will be easier to make this consistent.