coolwanglu/pdf2htmlEX

How to get hidden element using --correct-text-visibility option?


Hello.
I have a PDF: https://drive.google.com/open?id=1I4VU4bY2J2XWHryxRjN7eA_ZlI9uGIyT

After converting it with --correct-text-visibility, pdf2htmlEX adds the classes fc6 sc0 to the hidden elements:
<div class="t m0 xd he y37 ff2 fs3 fc2 sc0 ls0 ws0">
<span class="fc6 sc0">MEDICAL </span>
<span class="fc1">
<span class="fc6 sc0">PLANS</span>
<span class="fc5">
<span class="fc6 sc0"></span>
</span>
</span>
</div>
<div class="t m0 xd h1 y38 ff1 fs0 fc0 sc0 ls0 ws0">
<span class="fc6 sc0"> </span>
</div>
<div class="t m0 xd he y39 ff2 fs3 fc5 sc0 ls0 ws0">
<span class="fc6 sc0">MEDICAL</span>
<span class="fc0">
<span class="fc6 sc0"> </span>
<span class="fc1"><span class="fc6 sc0">PLANS</span></span>
<span class="fc6 sc0"> </span>
</span>
</div>

Will these class names (fc6 sc0) change depending on the document? If so, how can we identify the hidden elements in order to remove them from the document?
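
To make the question concrete, here is a minimal Python sketch of one way to find the hidden text without hard-coding fc6. It rests on an unconfirmed assumption: that the generated stylesheet marks the hidden fill color with a rule like `.fc6{color:transparent;}`. The helper names `transparent_classes` and `hidden_text` are hypothetical and are not part of pdf2htmlEX.

```python
import re
from html.parser import HTMLParser

# ASSUMPTION (not confirmed by pdf2htmlEX docs): hidden text gets a fill-color
# class whose CSS rule sets `color: transparent`, e.g. `.fc6{color:transparent;}`.
TRANSPARENT_RULE = re.compile(
    r"\.(fc\w+)\s*\{[^}]*(?<![-\w])color\s*:\s*transparent"
)

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def transparent_classes(css):
    """Return the fc* class names whose rule sets a transparent text color."""
    return set(TRANSPARENT_RULE.findall(css))


class _HiddenTextCollector(HTMLParser):
    """Collect text whose nearest fc* class (own or inherited) is transparent."""

    def __init__(self, hidden):
        super().__init__()
        self.hidden = hidden
        self.stack = []      # one hidden/visible flag per open element
        self.fragments = []  # non-blank hidden text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        fill = [c for c in classes if c.startswith("fc")]
        if fill:
            state = any(c in self.hidden for c in fill)
        else:
            # No fill-color class here: inherit the parent's state.
            state = self.stack[-1] if self.stack else False
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.fragments.append(data)


def hidden_text(html, css):
    """Return the hidden text fragments found in `html`, given its stylesheet."""
    collector = _HiddenTextCollector(transparent_classes(css))
    collector.feed(html)
    collector.close()
    return collector.fragments
```

If that assumption holds, the transparent class can be looked up per document instead of relying on the number 6, and the same stack logic could be reused to drop the matching spans rather than only list them.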

Thank you.