espresso3389/pdfrx

Is text selection possible on PDF?

Closed this issue · 42 comments

I'm interested in using pdfrx in my Android application. I would like to know if it supports text selection. Specifically, I would like to implement a feature where users can long-press on a PDF document to select/highlight text.

I've explored the documentation and codebase, but I couldn't find any specific information regarding text selection. Could you please clarify if this feature is supported or provide any guidance on how to achieve it?

Thank you for your assistance!

I'm still working on it, but on GitHub you can see the initial draft design of the underlying text extraction API and its usage example (not yet finished).

Basically, the underlying API is almost frozen (it should not change), but I still need to integrate the text selection GUI code into the PdfViewer widget (I'm working on it now and expect it to take about a week).

Task lists so far:

  • On desktop, using the SelectionArea widget for text selection blocks pan-to-scroll
  • Search popup (how to customize it, etc.)
  • Hyperlink handling

Anyway, I'm still open to any opinions and suggestions, so please feel free to ask me anything!

Oh, you can see the current progress on the demo site: https://espresso3389.github.io/pdfrx/

0.2.1 is a new release that stabilizes the API:

  • the example code includes a demonstration of text selection

Inaccurate Cursor Position and Selector Shaking Issue during PDF Zoom on Android.

I have noticed an issue while zooming in on PDF documents on Android. The cursor position for text selection seems to be inaccurate, and there is also a shaking phenomenon observed with the selector. This affects the precision and usability of selecting text within the PDF viewer. It would be greatly appreciated if this issue could be addressed and resolved. Thank you!

Understood. I'm still working on it.

Eagerly waiting for this feature!

Still a work in progress, but I'm struggling with Flutter's text selection mechanism...

Video.mp4

It looks great! I’m looking forward to this feature.

It's bleeding-edge, highly experimental, and still barely usable on desktop platforms (though it works), but you can try the current behavior with the latest version from git (38bbd64):

PdfViewer.uri(
  Uri.parse(kIsWeb
      ? 'assets/hello.pdf'
      : 'https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf'),
  controller: controller,
  displayParams: PdfViewerParams(
    maxScale: 8,
    // FIXME: on desktop, the text selection feature does not work correctly yet.
    // Even on mobile platforms it is still very experimental; use it with extreme care.
    enableTextSelection: !_isDesktop,
  ),
),
...

Current issues:

  • It's slow; rendering of the selection area is currently delayed on purpose relative to the actual scroll/zoom
  • On desktop, you cannot pan while the cursor is over the SelectionArea widget (currently the widget is filled with light cyan)
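
One common way to implement a deliberate delay like the one in the first bullet is a debounce: every scroll/zoom event restarts a short timer, and the selection overlay is rebuilt only once events stop arriving. A minimal sketch; the `Debouncer` class and any delay value are assumptions for illustration, not taken from pdfrx:

```dart
import 'dart:async';

/// Hypothetical helper: runs an action only after [delay] has passed
/// without another call to [run].
class Debouncer {
  Debouncer(this.delay);

  final Duration delay;
  Timer? _timer;

  /// Schedules [action]; any previously scheduled (not yet fired) action
  /// is cancelled, so a burst of calls results in a single run.
  void run(void Function() action) {
    _timer?.cancel();
    _timer = Timer(delay, action);
  }

  /// Cancels any pending action.
  void dispose() => _timer?.cancel();
}
```

In a widget, one would presumably call `run` from the scroll/zoom listener with a callback that calls `setState`, and call `dispose` from the widget's own `dispose`.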

The reason for these issues is simple: the current Flutter API does not anticipate this kind of selection mechanism.

  • If we use SelectionArea/Selectable directly inside InteractiveViewer, the selection handles and popups are zoomed in/out by InteractiveViewer along with the content
  • SelectionArea requires us to build many Selectable widgets to make the text selectable

I'm still investigating ways to improve performance, but in any case I should file a report on Flutter's issue tracker.

And, the demo video here:

Video.mp4

Thank you very much for your efforts on this feature. I still have a question. How to use the mouse wheel on the desktop to scroll PDF instead of zooming?

Good question!
Frankly speaking, I don't know. InteractiveViewer assigns the mouse wheel to zooming :(
InteractiveViewer's documentation alone doesn't answer the question.

#114280 seems related to the question, but reading the thread doesn't tell me what to do.

Any ideas?

For scroll by mouse-wheel issue, see #10; this issue is for text selection only.

Is there a way to optimize text selection in a PDF with a dual-column layout?

@mlican

It's not easy to handle a PDF's embedded text correctly.

In theory, the reading order should be implied by the order of the embedded text content, but in reality, PDF generators embed text according to their own algorithms.

Frankly speaking, every PDF reader struggles with such nearly "broken" text structures, and there is no reliable way to restore the original author's intent. Some readers use AI or similar techniques to reconstruct it.

So, in any case, I don't intend to build a perfect text extraction mechanism, just acceptable text search and extraction.

As a side note, for pdfium I'm now using a deterministic algorithm to restore the text structure so that it fits well with Flutter's SelectionArea.

Flutter's SelectionArea is yet another beast I have to battle with. It's basically designed for selecting ordinary hand-crafted Text widgets, but a PDF's embedded text is neither that simple nor well structured, and Flutter's SelectionArea won't accept it without first normalizing the embedded text.
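
To illustrate the kind of deterministic heuristic involved, here is a minimal sketch for a two-column page. It is not pdfrx's actual algorithm; it assumes each text fragment has a bounding box in top-left-origin coordinates, and the `TextFragment` class and `orderTwoColumns` function are hypothetical. It looks for a vertical gutter in the middle third of the page that no fragment crosses; if one exists, the left column is emitted before the right one, and each column is sorted top-to-bottom, then left-to-right.

```dart
/// Hypothetical minimal fragment: text plus its bounding box in
/// top-left-origin coordinates (y grows downward).
class TextFragment {
  final String text;
  final double left, top, right, bottom;
  const TextFragment(this.text, this.left, this.top, this.right, this.bottom);
}

/// Orders fragments of a page that may be laid out in two columns.
List<TextFragment> orderTwoColumns(
    List<TextFragment> fragments, double pageWidth) {
  // Top-to-bottom, then left-to-right.
  int byPosition(TextFragment a, TextFragment b) {
    final dy = a.top.compareTo(b.top);
    return dy != 0 ? dy : a.left.compareTo(b.left);
  }

  // Scan the middle third of the page for an x that no fragment crosses.
  double? gutter;
  for (var x = pageWidth / 3; x <= pageWidth * 2 / 3; x += 1) {
    final crossed = fragments.any((f) => f.left < x && f.right > x);
    if (!crossed) {
      gutter = x;
      break;
    }
  }

  if (gutter == null) {
    // No gutter found: treat the page as a single column.
    return [...fragments]..sort(byPosition);
  }

  // Every fragment lies entirely on one side of the gutter.
  final left = <TextFragment>[];
  final right = <TextFragment>[];
  for (final f in fragments) {
    (f.right <= gutter ? left : right).add(f);
  }
  left.sort(byPosition);
  right.sort(byPosition);
  return [...left, ...right];
}
```

A full-width title spanning the whole middle third crosses every candidate gutter and makes the sketch fall back to single-column order, which is one small example of why no simple rule handles every PDF.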

@espresso3389 Thanks for the work. It helps a lot. Why has this feature been removed from the latest code though?

@yhyh0 Yes, it was removed (in 0.4.3, ab9c9d0) because it was too slow to be used in realistic apps.
I'm now rewriting the text selection code so that it does not depend on Flutter's Selectable.

The last version of that code is in pdf_widget.dart at 70a92e, and hopefully it still works with the latest pdfrx...

I find the performance of text selection built on Flutter's Selectable pretty good. Here is a recording of my application:

741fc43b659322641076f4d1c1df7576.mp4

Wow, you are fast! 👍 70a92e doesn't work with the latest version, but that's fine; I will wait for the new release.

Could there also be a callback, something like onTextSelected, that provides the text and the event/rect (for showing a customized context menu)?

@mlican
Could you explain how you achieved that? Is it based on my implementation?

Yes, I made some optimizations on the text selector you wrote. I believe the performance is excellent, and the experience is fantastic. I'll provide you with the code for your reference later. I'm currently on my way home.

I created a separate widget for PDF text selection:

import 'package:flutter/widgets.dart';
import 'package:pdfrx/pdfrx.dart';

// Note: Paragraph, kMeansText and _generateSelectionArea are the author's own
// helpers and are not shown here.

class PdfText extends StatefulWidget {
  const PdfText({
    required this.page,
    required this.pageRect,
    super.key,
  });

  final PdfPage page;
  final Rect pageRect;

  @override
  State<PdfText> createState() => PdfPageTextState();
}

class PdfPageTextState extends State<PdfText> {
  PdfPageText? pageText;
  List<Paragraph> paragraphList = [];

  // Loads the page's text and groups its fragments into paragraphs.
  Future<void> initData() async {
    final page = widget.page;
    final text = await page.loadText();
    // Ignore the result if the widget was disposed or switched pages meanwhile.
    if (!mounted || page != widget.page) return;
    setState(() {
      pageText = text;
      // fragments are converted to List<Paragraph>
      paragraphList = kMeansText(pageText?.fragments ?? []);
    });
  }

  @override
  void initState() {
    super.initState();
    initData();
  }

  @override
  void didUpdateWidget(covariant PdfText oldWidget) {
    super.didUpdateWidget(oldWidget);
    if (widget.page != oldWidget.page) {
      // Drop the previous page's text instead of mutating its fragment list.
      pageText = null;
      paragraphList = [];
      initData();
    }
  }

  @override
  Widget build(BuildContext context) {
    final pageText = this.pageText;
    if (pageText == null) {
      return const SizedBox.shrink();
    }

    return _generateSelectionArea(
        context, pageText.fragments, widget.page, widget.pageRect);
  }
  ......
}

The MouseRegion in _generateTextSelectionWidgets needs hitTestBehavior to be set:

MouseRegion(
            hitTestBehavior: HitTestBehavior.translucent, // Avoid mouse wheel anomalies
            cursor: SystemMouseCursors.text,
            child: _PdfTextWidget(
              registrar,
              fragment,
              fragment.charRects?.map((e) {
                return e
                    .toRect(height: page.height, scale: scale)
                    .translate(-rect.left, -rect.top);
              }).toList(),
              rect.size,
            ),
          )
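
The `toRect(height: ..., scale: ...)` and `translate(...)` calls in the snippet above presumably turn character rectangles from PDF page space into the widget's local space. A minimal sketch of that kind of conversion, assuming PDF user-space coordinates (origin at the bottom-left, y growing upward) and Flutter's top-left origin; `Box` and `pdfToWidget` are hypothetical names, not pdfrx APIs:

```dart
/// Hypothetical plain rectangle (Flutter code would use dart:ui's Rect).
class Box {
  final double left, top, right, bottom;
  const Box(this.left, this.top, this.right, this.bottom);

  @override
  String toString() => 'Box($left, $top, $right, $bottom)';
}

/// Converts a rectangle from PDF page space (origin bottom-left, y up,
/// so top > bottom) to widget space (origin top-left, y down), scaled by
/// [scale] and shifted so that ([originLeft], [originTop]), given in
/// widget space, becomes (0, 0).
Box pdfToWidget(
  Box pdf, {
  required double pageHeight,
  required double scale,
  double originLeft = 0,
  double originTop = 0,
}) {
  return Box(
    pdf.left * scale - originLeft,
    // Flip y: distance from the page's top edge, then scale.
    (pageHeight - pdf.top) * scale - originTop,
    pdf.right * scale - originLeft,
    (pageHeight - pdf.bottom) * scale - originTop,
  );
}
```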

31a76ab incorporates @mlican's implementation of PdfText as PdfPageTextOverlay.

Meanwhile, I'm working on another issue and hope to release a new version this weekend.

Yet another implementation (706da7c) that runs far faster than the previous versions:

Video.mp4

One remaining problem is that when a refresh event occurs, such as a zoom change, the selection is cleared completely. I need to do a bit more work on that.

0.4.7 incorporates the text selection code, though it still has several issues, as noted in CHANGELOG.md.

You can enable it with PdfViewerParams.enableTextSelection.

The character positions in this PDF document are incorrect, and text selection behaves abnormally. I'm not sure what the reason is.
2326fb2d7c227a32f7f40446320f6de4.pdf

0.4.17 adds more consistency updates to text selection.
It makes text selection on mobile devices better.

Is it correct that the current selection always starts at the beginning of a line? (see the attached recording from https://espresso3389.github.io/pdfrx/). The selection is blazingly fast by the way!

Would it be possible to support double-click (select word) and triple-click (select line)? Of course, I can create a separate issue if you prefer.

simplescreenrecorder-2024-02-04_19.15.46.mp4
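
For the double-click/triple-click request, a minimal sketch of the selection-expansion step, assuming the page text is available as one string together with the offset of the clicked character: a double-click grows the selection to the surrounding run of word characters, and a triple-click grows it to the enclosing line. `Span`, `wordSpanAt`, and `lineSpanAt` are hypothetical names; the ASCII-only word test is a simplification (CJK and other scripts would need proper Unicode word segmentation).

```dart
import 'dart:math';

/// Half-open range [start, end) of character offsets.
class Span {
  final int start, end;
  const Span(this.start, this.end);
}

// Simplified word test: ASCII letters, digits and underscore only.
final _wordChar = RegExp(r'[A-Za-z0-9_]');

/// Double-click: the run of word characters around [offset].
/// Returns an empty span if the character at [offset] is not a word char.
Span wordSpanAt(String text, int offset) {
  final o = max(0, min(offset, text.length));
  if (o >= text.length || !_wordChar.hasMatch(text[o])) return Span(o, o);
  var start = o, end = o + 1;
  while (start > 0 && _wordChar.hasMatch(text[start - 1])) start--;
  while (end < text.length && _wordChar.hasMatch(text[end])) end++;
  return Span(start, end);
}

/// Triple-click: the line containing [offset], excluding the '\n'.
Span lineSpanAt(String text, int offset) {
  var start = max(0, min(offset, text.length));
  var end = start;
  while (start > 0 && text[start - 1] != '\n') start--;
  while (end < text.length && text[end] != '\n') end++;
  return Span(start, end);
}
```

For example, with the text `'Hello world\nsecond line'`, `wordSpanAt(text, 7)` covers `world` and `lineSpanAt(text, 14)` covers `second line`.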

@MarcVanDaele90 I haven't looked into the web selection issue deeply, but it's a web-version (pdf.js) issue; other platforms correctly select individual words, though they still have other selection issues...

0.3.48 has minor code changes on text selection.

When I check the pdfjs example here https://mozilla.github.io/pdf.js/web/viewer.html , it seems to work fine so I'm not sure that this is just a pdfjs issue. Can you have another look at this?

@MarcVanDaele90

Yes, I've been looking into it, but I'm not sure what causes the issue. The cross-platform side uses the same algorithm.
The only difference is the web-specific implementation, and it just loads all the text fragments into memory...

Is it possible to create a small stripped down example that illustrates the issue? Then I can also take a look and see if I can help.

@espresso3389 Thank you for coming so far with PDF support and the possibility to select text.

I wanted to point out an issue you may not be aware of (it could be what you meant by the selection being cleared on refresh events): when I select text and want to right-click it (to open the context menu, e.g. for copying), the selection is reset and only the current line is selected. This can be reproduced with your demo.

Would be awesome if you find a solution here :-)

Yes, I know. I've been struggling with Flutter's behavior; so far I don't understand Flutter's selection architecture...

ε·₯δ½œδ»εœ¨θΏ›θ‘ŒδΈ­οΌ›δ½†ζˆ‘ζ­£εœ¨εŠͺεŠ›θ§£ε†³ Flutter ηš„ζ–‡ζœ¬ι€‰ζ‹©ζœΊεˆΆ......

视钑.mp4

Could you expose a callback with the selected text, like Flutter's contextMenuBuilder?

@MRYIN123 It's a somewhat old question, but I think PdfViewerParams.perPageSelectionAreaInjector can do it. See SelectionArea for more.

While selecting text, it creates a blue box-like area and selects random text.

Is the selection area correct?
Note that, depending on the PDF, the copied text content may be broken.

The text selection feature itself works, though it is still experimental. This issue has grown too large to work through, so I'm going to close it; further problems should be discussed in their own dedicated issues.

Related: #180 On Flutter Web on mobile devices, panning/zooming by touch gesture is never enabled unless enableTextSelection: false is set.