vaeth/bookmarkdupes

*please* let it show partial search results!

Closed this issue · 22 comments

I have a butt-ton of bookmarks. Several times I've switched browsers, importing bookmarks and things end up duplicated. When bookmarkdupes searches for exact dupes, it takes forever (multiple days I suspect) - I have never seen it display the list of duplicate bookmarks!
(I don't know if it's starting a new search when I click 'exact dupes' again...)

Please add a button to stop the search and show the currently-found results! Then I can delete a lot of the duplicates and search again.

I have a similar issue. I switched from Chrome to Mozilla and now have many duplicates and many empty folders that can’t be removed with this plugin or manually. I’m wondering if there is some file permanently hosed.

I’ve tried deleting from multiple places, in safe mode, etc. I tried deleting the places.sqlite file. Firefox hates me.

Maybe worth a try - export the bookmarks, create another profile, and import them. Sounds like something's buggered.

Wanted to try that. It acts like it is exporting, but no file is written.

vaeth commented

When bookmarkdupes searches for exact dupes, it takes forever (multiple days I suspect)

Unless there is an issue with not enough memory or another strange problem, this should not happen.
The algorithm to find duplicates uses linear time in average (assuming that firefox's java implementation needs in average constant time for hashing in a "Map" object), i.e. if you have the double number of bookmarks, it should only take the double number of time.

How large exactly is the number of bookmarks you have? How large is your memory? (It might help if you close other memory hungry processes.)

How long does it take if you press the button "All bookmarks"?

Here are my test numbers:
For 50000 bookmarks (a copy of 25000 non-duplicate) the button "All bookmarks" needs about 11 seconds while "Exact Dupes" needs about 15 seconds. However, I have 8 Gigabyte of memory which is only used for the firefox process. Lack of memory is a problem which can hardly be solved by the code.

Please add a button to stop the search and show the currently-found results!

I had experimented with that, but had removed that code again, because it only slowed down.
It turned out that the main time is lost until browser.bookmarks.getTree() finishes (which is browser-internal) and that the actual checking of the obtained tree for duplicates is not much slower than recursing through the full tree. Only this recursing through the tree could be made stoppable, but since javascript is not really multitasking capable, it was necessary for the implementation of the stop button to use a promise in each recursion step. This mechanism alone took longer than parsing through the full tree (and BTW costed a lot of additional memory).

vaeth commented

The problem with changing checkboxes is that I do not want that the state is lost when you close the popup. Unfortunately, this means that the state (i.e. all checkboxes) must be sent to the background process. Firefox is apparently very slow in transferring such huge messages.
Maybe I should spend some time to optimize the sending. Unfortunately, this is very error-prune.

The problem with changing checkboxes is that I do not want that the state is lost when you close the popup. Unfortunately, this means that the state (i.e. all checkboxes) must be sent to the background process. Firefox is apparently very slow in transferring such huge messages.
Maybe I should spend some time to optimize the sending. Unfortunately, this is very error-prune.

What happens if you replace popup to a new window or tab (theoretically)?
Firefox will also be slow because of this?

vaeth commented

What happens if you replace popup to a new window or tab (theoretically)?

It's the same problem if you want that the window's state is restored after reopening.
(Of course, for a window/tab this is not so much necessary as for a popup.)
In any case, using a window would be a rather different program which would need additional permissions.

I think the problem with sending less data is much easier to solve; it is actually not really hard, but one has to be very careful with races. It is mainly a question of finding some time for coding. In the moment, there are things in my life which are much more important.

vaeth commented

The current version in the master branch now opens in a tab instead of a panel.

In the moment, this means essentially just an optical change; the functionality is the same, i.e., if you close the tab and re-open it, the old content will be re-displayed (including your markings).
To save memory and time for transferring, this functionality would have to change.

There is in the moment just one functional/speed improvement: The bookmarks are now tabs, and you can click and switch to them without having to re-open a panel (which took time if you had many bookmarks).

vaeth commented

The background process is now eliminated, so things should be much quicker now and need less memory.

This is almost a complete rewrite, so lots of new bugs might have arisen.

Careful testing is needed before releasing ;)

vaeth commented

Thanks for the testing.

Concerning the delay: I hoped that counting all boxes after every checkbox change is quick enough.
I recoded it now by hashing the marked boxes: Faster, but higher memory requirements.

vaeth commented

I would like to release 4.0 as quickly as possible when it does not contain a serious bug anymore:

sometimes I'll miss the checkbox so it doesn't get selected

This sounds serious.

Do you mean: You press "mark all but youngest" (or which button precisely?) and only a subset of the bookmarks will be marked? This would mean a serious bug in the algorithm for that button.

Or do you mean: You press a checkbox and it doesn't get marked?

If this happens, I think it is a firefox issue. The only other possibility is that you pressed some of the buttons (like "mark all but youngest") and tried to check the box before that operation was finished.
Yes, I am aware that boxes should be disabled until the operation is finished, but I am hesitating to do this, because disabling/re-enabling thousands of checkboxes might take a ridiculous amount of time.

But that's with my absurd number of duplicates...

This shouldn't have any influence concerning the code of the extension: All that is happening in the extension is that a bookmark is removed/added to the hash table, and the number is updated. These actions are independent of the number of displayed bookmarks.

Of course, the updating of the number might cause some delay on firefox if the displayed page is huge.

I thought about adding a checkbox to make the display of selected numbers optional, but I am not yet decided: This might be rather confusing to the users.

vaeth commented

In the master branch both is implemented:

  1. There is a checkbox to disable counting
  2. Bookmark checkboxes are disabled during updating of checkboxes

I hope that using setTimeout() will force the disabling to happen before the marking algorithm gets executed: I found no official documentation that setTimeout() will force this, although it seems to work.

vaeth commented

Since bookmarkdupes-4.0 is now released, I am closing this bug.
Feel free to open another one if e.g. you have further suggestions.