[Web] Text selection on web always starts at start of line

Question

Opened this issue 3 months ago · 5 comments

As mentioned in Issue #4, text selection on Web has one remaining issue: it always selects complete lines.
This can be reproduced by trying to select a couple of words in the demo application: https://espresso3389.github.io/pdfrx/

Answer 1 · 2024-09-19T09:25:19.000Z

Some more observations (which might be obvious to you).

I noticed the following when opening the same (two-page) PDF:

on Linux, PdfPageTextPdfium._loadText(...) created 581/292 fragments
on Web, PdfPageTextWeb._loadText(...) created only 72/43 fragments

When printing out the text of the resulting PdfPageTextFragment, I noticed that Pdfium seems to add fragments on word level while Web seems to add fragments per line.

I guess this explains why a selection always starts at the beginning of the line.
Is there some way to get word-level fragments on Web as well?
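
To illustrate the guess above, here is a minimal sketch of how fragment granularity could determine where a selection starts. It assumes the selection snaps to the start of the fragment under the pointer, which I have not verified against the pdfrx code. The `Fragment` type, `fragmentsOf`, and `selectionStart` are hypothetical names for illustration, not pdfrx API.

```typescript
// Hypothetical, simplified fragment: a character range in the page text.
// This is NOT the real PdfPageTextFragment, only a model of it.
interface Fragment {
  start: number; // index of the first character
  end: number;   // index one past the last character
}

// Build one fragment per match of `pattern` (which must not match the
// empty string), e.g. per line or per word.
function fragmentsOf(text: string, pattern: RegExp): Fragment[] {
  const fragments: Fragment[] = [];
  const re = new RegExp(pattern.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    fragments.push({ start: m.index, end: m.index + m[0].length });
  }
  return fragments;
}

// Assumption: a selection can only begin at a fragment boundary, so it
// snaps to the start of the fragment containing the clicked character.
function selectionStart(fragments: Fragment[], charIndex: number): number {
  const hit = fragments.find((f) => charIndex >= f.start && charIndex < f.end);
  return hit ? hit.start : charIndex;
}

const pageText = "The quick brown fox\njumps over the lazy dog";
const lineFragments = fragmentsOf(pageText, /[^\n]+/); // Web-like: per line
const wordFragments = fragmentsOf(pageText, /\S+/);    // Pdfium-like: per word

const clicked = pageText.indexOf("lazy");
const fromLines = selectionStart(lineFragments, clicked); // start of "jumps ..."
const fromWords = selectionStart(wordFragments, clicked); // start of "lazy"
```

With per-line fragments the selection jumps back to the first character of the line, while with per-word fragments it starts at the clicked word, which matches what I see on Web versus Linux.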

Answer 2 · 2024-09-19T11:03:06.000Z

You're right. I don't know how to extract word-level coordinates with pdf.js. The pdf.js example viewer can handle word-level coordinates, but it relies on something provided by the HTML canvas or similar. I need to do more research on that...
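
As a stopgap, word boxes could be approximated from the line-level items. The sketch below assumes the `str`, `transform`, `width`, and `height` fields that pdf.js reports for each text item from `getTextContent()`, and it makes two simplifying assumptions: every character in the run is equally wide, and the text is horizontal (no rotation or skew). `estimateWordBoxes` and `WordBox` are hypothetical names, and this is only an estimate, not how pdf.js itself computes positions.

```typescript
// Subset of the fields of a pdf.js text item used by this sketch.
interface LineItem {
  str: string;         // text of the whole run
  transform: number[]; // [a, b, c, d, e, f]; e and f are the x/y origin
  width: number;       // width of the whole run
  height: number;      // height of the run
}

// Hypothetical output type: an estimated bounding box for one word.
interface WordBox {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Split a line-level item into words and estimate each word's box.
// Assumption: all characters share the same width, so positions are
// only approximate for proportional fonts.
function estimateWordBoxes(item: LineItem): WordBox[] {
  const boxes: WordBox[] = [];
  if (item.str.length === 0) return boxes;
  const perChar = item.width / item.str.length;
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(item.str)) !== null) {
    boxes.push({
      text: m[0],
      x: item.transform[4] + m.index * perChar,
      y: item.transform[5],
      width: m[0].length * perChar,
      height: item.height,
    });
  }
  return boxes;
}

// Example input shaped like a pdf.js text item (values are made up).
const sampleItem: LineItem = {
  str: "Hello world",
  transform: [1, 0, 0, 1, 100, 200],
  width: 110,
  height: 12,
};
const sampleBoxes = estimateWordBoxes(sampleItem);
```

Accurate boxes would need real per-glyph widths from the font, so this only narrows the selection to roughly the right word.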

Answer 3 · 2024-11-16T02:29:19.000Z

Any updates on the text selection feature for the web? There also seems to be a consistency issue when selecting text. For example, it sometimes misses certain words or skips some parts.

Answer 4 · 2024-11-19T16:37:22.000Z

I just googled this and found the issue.

It explains that the dedicated part for extracting text positions is:

I'll read the code to understand how pdf.js handles text coordinates.

Answer 5 · 2024-11-20T06:33:55.000Z

This is great news! Thanks for the heads up!