tomasnorre/crawler

crawler:buildQueue fails with Error at offset 0 of 92 bytes

hacksch opened this issue · 5 comments

Bug Report

Current Behavior

I run the following command to build a queue and crawl the pages it finds:
"typo3cms crawler:buildQueue --depth 3 --mode exec 1 default"
After the list of found URLs is printed, I get:

Processing

0/299 [>---------------------------] 0%
Error at offset 0 of 92 bytes

Expected behavior/output
Execution should not fail

Steps to reproduce

  1. Prepare a configuration.
  2. Execute "typo3cms crawler:buildQueue --depth 3 --mode exec 1 default".

Environment

  • Crawler version(s): 11.0.10
  • TYPO3 version(s): 11.5.38
  • PHP version(s): 8.3
  • Is your TYPO3 installation set up with Composer (Composer Mode): yes

Possible Solution
The error occurs in JsonCompatibilityConverter, line 39.

In my case, the dataString passed to unserialize is:
{"url":"https://www.domain.de/en/home.html","procInstructions":[""],"procInstrParams":[]}

This is not valid input for unserialize. When I comment out lines 39-49, execution works.

Your dataString looks fine: it can be json_decoded into an array, and that array is returned before unserialize takes place.
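To illustrate the point, here is a minimal sketch of a JSON-first decode step. The function name decodeJsonFirst is hypothetical and this is not the extension's actual converter code; it only shows that the reported string decodes to an array, so a fallback to unserialize would not be reached for it.

```php
<?php
/**
 * Hypothetical stand-in for a JSON-first decode step.
 * Assumption: the real converter returns the decoded array when JSON decoding succeeds.
 */
function decodeJsonFirst(string $dataString): array|false
{
    // Without JSON_THROW_ON_ERROR, json_decode() returns null for invalid JSON.
    $decoded = json_decode($dataString, true);

    if (is_array($decoded)) {
        // Valid JSON object or array: return before any unserialize() fallback.
        return $decoded;
    }

    return false;
}

$dataString = '{"url":"https://www.domain.de/en/home.html","procInstructions":[""],"procInstrParams":[]}';
$result = decodeJsonFirst($dataString);

var_dump(is_array($result)); // bool(true)
```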

I did manage to see a convert call with an empty $dataString. That does not decode to an array, so it reaches unserialize, which triggers the warning.

I'm not familiar with this whole process but maybe we should:

  1. fail fast when $dataString is empty
  2. not wrap unserialize in try/catch, because it does not throw anything; it emits a warning instead (since PHP 8.3)
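The two suggestions above could be sketched roughly as follows. The function name unserializeOrFalse and its return convention are assumptions made for illustration, not the extension's API. The sketch returns early on an empty string and relies on the false return value of unserialize instead of a catch block, silencing the notice or warning locally with a temporary error handler.

```php
<?php
/**
 * Hypothetical sketch of both suggestions; not the extension's actual code.
 */
function unserializeOrFalse(string $dataString): array|false
{
    // 1. Fail fast: an empty string cannot hold serialized data.
    if ($dataString === '') {
        return false;
    }

    // 2. unserialize() signals bad input with a notice/warning and a false
    //    return value, so swallow the diagnostic locally and check the result.
    set_error_handler(static fn (): bool => true, E_NOTICE | E_WARNING);
    try {
        $result = unserialize($dataString, ['allowed_classes' => false]);
    } finally {
        // Always restore the previously registered handler.
        restore_error_handler();
    }

    return is_array($result) ? $result : false;
}

var_dump(unserializeOrFalse(''));            // bool(false)
var_dump(unserializeOrFalse('{"url":"x"}')); // bool(false)
```

Whether silencing the warning is acceptable here, or whether it should be logged, is a design choice left open by this sketch.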

Steps to reproduce: Crawler log -> Log -> Reload List

Should I prepare one or two pull requests?

@ulrichmathes If you have a suggested fix in mind, I would be happy to review a PR.

Hello, I will check your fix on Monday. I'm not sure it will help with the case I described and debugged.
In my case the string was not empty, and what stopped the process was an error, not a warning.

What I remember now is that I called unserialize() on the string directly and could reproduce the error, without any further code.
Maybe checking whether the string contains serialized data could solve the described problem; if it does not, lines 39-49 would be skipped.
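One way such a check might look is a prefix heuristic like the one below. The helper name looksSerialized is hypothetical, and the check only inspects the leading type tag that serialize() writes; it does not prove that the rest of the payload is valid.

```php
<?php
/**
 * Hypothetical heuristic: does the string start like PHP serialize() output?
 * Assumption: only the leading type tag is inspected.
 */
function looksSerialized(string $dataString): bool
{
    // serialize() output begins with a type tag: a: (array), O:/C: (object),
    // E: (enum), s: (string), i: (int), d: (float), b: (bool), or "N;" (null).
    return preg_match('/^(?:[aOCEsidb]:|N;)/', $dataString) === 1;
}

$json = '{"url":"https://www.domain.de/en/home.html","procInstructions":[""],"procInstrParams":[]}';

var_dump(looksSerialized($json));                      // bool(false)
var_dump(looksSerialized(serialize(['url' => 'x']))); // bool(true)
```

With a guard like this, a JSON string such as the one reported would never be handed to unserialize().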
What do you think?

Hi @hacksch,

Sorry for not getting back to you. What do you think of the approach in PR #1091?