danfickle/openhtmltopdf

No support for generating accessible PDFs (PDF/UA)

philipfennell opened this issue · 9 comments

Our customer has specific requirements regarding support for the accessibility features defined in PDF/UA. When we test PDFs generated with your online sandbox in Adobe Acrobat, the accessibility report shows no support for tagged PDF, which results in issues with logical reading order and reflow. The document title and primary language are also not set.

See #30 for how to access the PDFBox PDDocument to add metadata and other properties directly.

We are also running into this due to 508 compliance testing. Are there any plans at this time to add 508 compliant metadata?

Issue:
The test is failing because there are no PDF Tags (Container Elements).
There may be other issues but that was the big one they handed down to us.

Standard PDF Tags:

  • Document - Document element. The root element of a document’s tag tree.
  • Part - Part element. A large division of a document; may group smaller units of content together, such as division elements, article elements, or section elements.
  • Div - Division element. A generic block-level element or group of block-level elements.
  • Art - Article element. A self-contained body of text considered to be a single narrative.
  • Sect - Section element. A general container element type, comparable to Division (DIV) in HTML, which is usually a component of a part element or an article element.
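To make the list above concrete, here is a minimal sketch of how HTML container elements might be mapped to those standard structure types. The class `StructureTagSketch`, its method `tagFor`, the mapping itself, and the fallback to `Div` are all assumptions made up for illustration; this is not how openhtmltopdf or PDFBox actually assigns tags. `Part` is left out because HTML has no obvious equivalent.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of openhtmltopdf or PDFBox.
final class StructureTagSketch {
    // Assumed mapping from HTML container elements to standard PDF structure types.
    private static final Map<String, String> HTML_TO_PDF = new LinkedHashMap<>();
    static {
        HTML_TO_PDF.put("body", "Document");
        HTML_TO_PDF.put("div", "Div");
        HTML_TO_PDF.put("article", "Art");
        HTML_TO_PDF.put("section", "Sect");
    }

    // Returns the assumed structure type; unknown containers fall back to the generic "Div".
    static String tagFor(String htmlElement) {
        return HTML_TO_PDF.getOrDefault(htmlElement.toLowerCase(Locale.ROOT), "Div");
    }

    public static void main(String[] args) {
        for (String element : new String[] {"body", "article", "SECTION", "aside"}) {
            System.out.println(element + " -> " + tagFor(element));
        }
    }
}
```

A real tagger would also have to emit the structure tree and marked content into the PDF; the sketch only shows the naming side of the problem.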

I've started work on PDF/UA support in PR #315

Hi Dan,

I love your work and am interested in the PDF/UA compliance feature. I must produce 508-compliant PDFs for a government client and can help guide and evaluate your implementation as well as contribute code.

I'm not sure how to become a contributor, so please contact me if you're interested.

Thanks,
-Paul

Dan,
This is excellent.
Please let me know if there is anything I can do to help.

Thanks @pdsway and @ScrappyTheDev.

At the moment, the most useful thing would be real-world test cases. You can either paste HTML in the issue (in a code block) or submit a PR. be71e17 shows how to add a PDF/UA test case.

Obviously, code review or suggestions are also welcome.

Thanks again for your interest in this feature and project.

OK, PDF/UA support is now available in the main branch, although it has not been released to Maven Central yet. You can find the PDF/UA docs on the wiki.

Thanks everyone.

I'm especially indebted to the work of @chris271 for his open-source PDF/UA work that this implementation was based on. Thanks a lot!

Thanks Dan, I will review and get back to you soon.

Our group has a list of PDF 508 requirements (which are less extensive than full PDF/UA). I will try to include them here. The group mostly uses Acrobat Pro for 508 compliance validation.

  • Meta tags: title, language, filename
  • Reading order same as object order
  • PDF container tags. These are like HTML container elements etc., but in the PDF, so that the assistive reader identifies major breaks in the document. Several levels of PDF tagging.

  • Tag "decorative" text/drawing as "artifact" so it's ignored.

I'll add more requirements as I get them.
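For the title and language items in the list above, a quick pre-flight check on the input HTML can catch omissions before rendering. The following is a rough sketch only: the class `AccessibilityPreflightSketch` and its method `missingMetadata` are hypothetical names, the regex matching is a stand-in for a real HTML parser, and it assumes the title and language are meant to come from the `<title>` element and the `lang` attribute on `<html>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-flight check for illustration only; not part of openhtmltopdf.
final class AccessibilityPreflightSketch {
    // A <title> element with at least one non-whitespace character.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>\\s*[^<\\s][^<]*</title>", Pattern.CASE_INSENSITIVE);
    // A non-empty lang attribute on the <html> element.
    private static final Pattern HTML_LANG = Pattern.compile(
            "<html\\b[^>]*\\slang\\s*=\\s*[\"'][^\"']+[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the names of the metadata items that appear to be missing.
    static List<String> missingMetadata(String html) {
        List<String> problems = new ArrayList<>();
        if (!TITLE.matcher(html).find()) {
            problems.add("title");
        }
        if (!HTML_LANG.matcher(html).find()) {
            problems.add("language");
        }
        return problems;
    }

    public static void main(String[] args) {
        String html = "<html><head><title> </title></head><body></body></html>";
        System.out.println("Missing: " + missingMetadata(html));
    }
}
```

This does not cover reading order, container tags, or artifacts; those can only be verified on the generated PDF, for example with Acrobat Pro as mentioned above.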

Hi @pdsway,

I've attached the all-in-one example, which you can use to check whether this implementation meets your requirements. So far, so good, I think!

all-in-one.pdf