nysenate/OpenLegislation

Formatting issues with PDF bill texts

Closed this issue · 4 comments

The PDF bill text provided by the Senate API has some formatting issues around differentiating new and old text.

For example, A 5699 includes an explanation stating that matter in brackets is old text and matter in italics or underscored is new. However, no part of the text is actually underscored or italicized.

Would it be possible to use Apache PDFBox to properly format the bill texts? Thank you so much.

Any progress on this? The current bill PDFs are a bit hard to read because of the lack of proper formatting.

Hi @gisrael

Thanks for the feedback. I agree, there is room for improvement in our current bill text PDFs.
This is not a change we can make right now; however, we will soon be processing bill text in a richer format which should allow us to make some improvements.

In our current PDFs, new text is capitalized instead of italicized.
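
For anyone who needs to tell old and new text apart in the current plain-text format, here is a minimal sketch of the convention described in this thread: bracketed text is old, capitalized text is new. It is not OpenLegislation code and does not use PDFBox. The class name `BillTextMarkup`, the regular expression, and the HTML output are all hypothetical, and the uppercase check is only a heuristic (it requires a run to start with a word of two or more capital letters, so a lone "A" is not flagged).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one line of bill text into removed, added,
// and unchanged segments using the bracket/uppercase convention.
final class BillTextMarkup {

    public enum Kind { REMOVED, ADDED, UNCHANGED }

    public static final class Segment {
        public final Kind kind;
        public final String text;

        Segment(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Either "[...]" (old text) or a run of all-uppercase words (assumed new text).
    private static final Pattern MARKED =
            Pattern.compile("\\[([^\\]]*)\\]|\\b[A-Z]{2,}\\b(?:\\s+[A-Z]+\\b)*");

    public static List<Segment> classify(String line) {
        List<Segment> out = new ArrayList<>();
        Matcher m = MARKED.matcher(line);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) {
                out.add(new Segment(Kind.UNCHANGED, line.substring(pos, m.start())));
            }
            if (m.group(1) != null) {
                // Bracketed match: keep the inner text, drop the brackets.
                out.add(new Segment(Kind.REMOVED, m.group(1)));
            } else {
                out.add(new Segment(Kind.ADDED, m.group()));
            }
            pos = m.end();
        }
        if (pos < line.length()) {
            out.add(new Segment(Kind.UNCHANGED, line.substring(pos)));
        }
        return out;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders segments with <del>/<ins> so a browser shows strike-through and underline.
    public static String toHtml(List<Segment> segments) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            switch (s.kind) {
                case REMOVED:
                    sb.append("<del>").append(escape(s.text)).append("</del>");
                    break;
                case ADDED:
                    sb.append("<ins>").append(escape(s.text)).append("</ins>");
                    break;
                default:
                    sb.append(escape(s.text));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "shall file [a notice] WRITTEN REPORTS with the board";
        System.out.println(toHtml(classify(line)));
    }
}
```

Abbreviations and section headings that are legitimately in capitals would be misclassified as new text by this heuristic, so treat the result as a reading aid rather than an authoritative diff.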

Thanks so much for getting back to me @KevinCaseiras. Do you have a timeline for when the new text format will become available?

Bill PDFs have been available in a new format for a while now. You'll notice that green underlined text has been added and red strike-through text has been removed.