tomasnorre/crawler

[FEATURE] re-indexing of concrete entries

Opened this issue · 2 comments

Is your feature request related to a problem? Please describe

Some of the indexed entries are added and removed daily. The re-indexing process handles all entries, including those that remain untouched. When the total number of entries is large, this complete daily re-indexing takes a long time.

Describe the solution you would like

I would like to ask whether the re-indexing process could be extended so that a single indexing run handles only the newly added and deleted entries.
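To make the request concrete, here is a minimal sketch of the delta idea in Python. It is not crawler code: `index_delta` and its two inputs (the IDs currently in the index and the IDs currently present in the source data) are hypothetical names used only for illustration.

```python
def index_delta(indexed_ids, current_ids):
    """Return (to_add, to_remove) so that only changed entries are processed.

    indexed_ids: IDs that are already in the index (hypothetical input).
    current_ids: IDs that exist in the source data right now (hypothetical input).
    """
    indexed = set(indexed_ids)
    current = set(current_ids)
    to_add = sorted(current - indexed)      # new entries that still need indexing
    to_remove = sorted(indexed - current)   # deleted entries to drop from the index
    return to_add, to_remove


# Entries 2, 3 and 4 are untouched and would be skipped entirely.
to_add, to_remove = index_delta([1, 2, 3, 4], [2, 3, 4, 5, 6])
print(to_add, to_remove)  # [5, 6] [1]
```

With a delta like this, the nightly run would scale with the number of daily changes instead of the total number of entries.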

From the Slack channel discussion

I have an old TYPO3 7 instance, and the customer needs a daily index refresh. Several user groups are involved, which inflates the process.
Until now we have run a re-index job every night, but the number of entries (news articles) has grown so large that the process runs for almost 24 hours.
There are 4 user groups, and I set up a configuration for each constellation, e.g. news_group_1 .. news_group_2 ... news_group_1_group_2 and so on.
This results in 16 configuration setups + one for the entries that do not belong to any group.
Then I set up 4 crawler queues that run at night. After this I have ~20000 entries for the crawler run process.
The crawler run process starts every 3 minutes with "count in a run" = 100. This was the best setup for avoiding "process was cancelled".
The extension has the setting "Make direct requests" set to true.
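As a rough back-of-the-envelope check of the numbers above, the following Python sketch enumerates the possible user-group constellations and estimates a lower bound for the run time. The function names and group labels are hypothetical, and the estimate assumes every run processes its full 100 entries within the 3-minute interval, which may not hold in practice. Note that the enumeration counts the "no group" case as one of the 16 constellations, so the setup described above may be counting slightly differently.

```python
from itertools import combinations
import math


def group_combinations(groups):
    """All constellations of user groups, including the empty 'no group' case."""
    combos = [()]  # entries that belong to no group
    for size in range(1, len(groups) + 1):
        combos.extend(combinations(groups, size))
    return combos


def minimum_hours(queue_entries, count_per_run, minutes_between_runs):
    """Lower bound on total run time if every run handles a full batch."""
    runs = math.ceil(queue_entries / count_per_run)
    return runs * minutes_between_runs / 60


groups = ["group_1", "group_2", "group_3", "group_4"]  # hypothetical labels
print(len(group_combinations(groups)))  # 16
print(minimum_hours(20000, 100, 3))     # 10.0
```

Under these assumptions, ~20000 queue entries need at least about 10 hours, so a run time close to 24 hours suggests that individual runs often take longer than the 3-minute interval.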

Hi there, thank you for taking your time to create your first issue. Please give us a bit of time to review it.

Thanks for the suggestion. Could you add some of the information from the Slack conversation? I think that would make the issue easier to understand when revisiting it later.