pcrov/JsonReader

Search file for ID and return full value of matching objects

Closed this issue · 4 comments

Hi there and first of all, thank you for this amazing parser. A true life saver.

I'm currently trying to fully understand how it works, but have run into a problem that I can't really figure out how to solve.

I have a large Json file that looks like this (part of it):

[
  {
    "id": 2584,
    "name": "John",
    "parentCategory": 2570,
    "url": "john",
    "dateUpd": "2016-06-23 14:27:32",
    "dateAdd": "2016-05-13 11:33:35",
    "urlImages": [
      "http://imageurl.com/2584_header.jpg",
      "http://imageurl.com/2584_menu.jpg",
      "http://imageurl.com/2584_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2429,
    "name": "Carol",
    "parentCategory": 2570,
    "url": "carol",
    "dateUpd": "2016-06-23 14:33:36",
    "dateAdd": "2016-05-13 10:11:30",
    "urlImages": [
      "http://imageurl.com/2429_header.jpg",
      "http://imageurl.com/2429_menu.jpg",
      "http://imageurl.com/2429_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2568,
    "name": "Andy",
    "parentCategory": 2552,
    "url": "andy",
    "dateUpd": "2016-06-23 13:55:13",
    "dateAdd": "2016-05-13 11:29:32",
    "urlImages": [
      "http://imageurl.com/2568_header.jpg",
      "http://imageurl.com/2568_menu.jpg",
      "http://imageurl.com/2568_mini.jpg"
    ],
    "isoCode": "sv"
  }
]

What I'm trying to do is search this file for every object where "parentCategory" equals 2570 and then print/echo the whole object that the match belongs to.

So far, this is what I've got:

$reader = new JsonReader();
$reader->json($json);

while($reader->read("parentCategory")) {
    $parentID = $reader->value();
    if ($parentID == 2570) {
      echo $reader->value()."\n";
    }
}
$reader->close();

This prints the parentCategory ID, but what I need is to use the parentCategory name and value to identify the whole object it belongs to, and in the end return the following:

[
  {
    "id": 2584,
    "name": "John",
    "parentCategory": 2570,
    "url": "john",
    "dateUpd": "2016-06-23 14:27:32",
    "dateAdd": "2016-05-13 11:33:35",
    "urlImages": [
      "http://imageurl.com/2584_header.jpg",
      "http://imageurl.com/2584_menu.jpg",
      "http://imageurl.com/2584_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2429,
    "name": "Carol",
    "parentCategory": 2570,
    "url": "carol",
    "dateUpd": "2016-06-23 14:33:36",
    "dateAdd": "2016-05-13 10:11:30",
    "urlImages": [
      "http://imageurl.com/2429_header.jpg",
      "http://imageurl.com/2429_menu.jpg",
      "http://imageurl.com/2429_mini.jpg"
    ],
    "isoCode": "sv"
  }
]

Is this achievable with your parser?

Thank you so much for any help you can give me!

pcrov commented

JsonReader works in a forward-only manner, so if you might need prior data you'll need to hang onto it until that determination can be made.

The easiest way to do this would be to step into the array, grab each object in full, check the parentCategory and ignore any that don't match. E.g.:

$reader->read(); //Outer array
$reader->read(); //First object
$depth = $reader->depth();

do {
    $object = $reader->value();
    if ($object["parentCategory"] === "2570") {
        var_dump($object);
    }
} while ($reader->next() && $reader->depth() === $depth);

Note that because numbers get returned as strings (this will likely become optional in a future version) and you didn't get the opportunity to inspect their type, you'll lose their type information this way.

If you need to retain that and you know ahead of time what should be a number it's easy to fix them up:

$reader->read(); //Outer array
$reader->read(); //First object
$depth = $reader->depth();

do {
    $object = $reader->value();
    if ($object["parentCategory"] === "2570") {
        $object["id"] = +$object["id"];
        $object["parentCategory"] = +$object["parentCategory"];
        var_dump($object);
    }
} while ($reader->next() && $reader->depth() === $depth);

(The unary + will cast to int or float as appropriate automagically.)
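As a small illustration of the unary + behaviour described above, here is a sketch that only converts values which are actually numeric. The `toNumber()` helper and the sample values are hypothetical and not part of JsonReader; `is_numeric()` is used as a guard so that non-numeric strings such as "sv" are left untouched.

```php
<?php
/**
 * Convert a numeric string to int or float; return anything else unchanged.
 * Hypothetical helper for illustration only.
 */
function toNumber($value)
{
    // Unary + yields an int for "2584" and a float for "1.5".
    return is_numeric($value) ? +$value : $value;
}

// Sample values shaped like the strings the reader returns for numbers.
$object = ["id" => "2584", "parentCategory" => "2570", "isoCode" => "sv"];

$object["id"] = toNumber($object["id"]);
$object["parentCategory"] = toNumber($object["parentCategory"]);
$object["isoCode"] = toNumber($object["isoCode"]); // stays the string "sv"

var_dump($object);
```

After this runs, "id" and "parentCategory" hold ints, so a strict comparison such as `=== 2570` works on them.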

Let me know if this doesn't work out for whatever reason. There is always another way to do things, it just might be a bit more cumbersome.

Wow, that's exactly what I was looking for! And yes, I will probably know beforehand which values are numbers, so I should be able to fix things up :)

Thank you so much for your help and for your great work, such a versatile tool!

Hi again @pcrov ,
wanted to follow up on this and ask you about the following:

I'm importing a rather large JSON file and I'm trying to stream it from the API server using fopen. I'm having a bit of a problem making it efficient, though: it feels like the parsing takes a really long time.

Right now my code looks like this:

$fp = fopen($filename, 'r'); // open the file for reading ('rw' is not a valid fopen mode)

$reader = new JsonReader();
$reader->stream($fp);
$reader->read(); //Outer array
$reader->read(); //First object
$depth = $reader->depth(); //Check depth
$object_array = array(); //Set up empty array
do {
    $object = $reader->value(); //Store object before check
    if ($object["category"] === $category_id) { //Do the check
        $object_array[] = $object; //Store object in array
    }
    unset($object); // free memory?
} while ($reader->next() && $reader->depth() === $depth);
$json_object = json_encode($object_array, JSON_PRETTY_PRINT); //Convert array to nice Json
echo $json_object; //Output Json
$reader->close();
fclose($fp);
unlink($filename); // delete file

As you can see, I've added the line:

unset($object);

in an attempt to free memory, but I'm not sure whether it has any effect. Does this look like a good solution to you?

Thanks!

// Jens.

pcrov commented

If you haven't already done so, upgrade to the latest release, 0.7.0, as it's significantly faster than prior versions. There are still more speed improvements in the works, but nothing quite like the jump 0.7.0 made.

Make sure xdebug isn't loaded at all. Even when not enabled, the extension has a massive performance impact.
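A quick way to check this from within the script is PHP's built-in `extension_loaded()`, which reports whether an extension is loaded at all, regardless of which of its features are switched on. The warning text below is only an example:

```php
<?php
// True if the xdebug extension is loaded into this PHP process.
$xdebugLoaded = extension_loaded('xdebug');

if ($xdebugLoaded) {
    fwrite(STDERR, "Warning: xdebug is loaded; expect slower parsing.\n");
}
```

Running `php -m` on the command line lists the loaded modules as well, though the CLI and the web server may use different configurations.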

Parsing a stream from a remote API directly, while supported, won't be as quick as parsing a local file, though from the code you've posted it looks like you're dealing with a local file already.

I wouldn't expect unset to do much useful there as $object is being immediately overwritten on the next iteration anyway. Besides, worrying about memory consumption when your problem is speed only makes sense if you're hitting swap (or garbage collection issues, but that shouldn't be a problem here), and you're going to be bound on the memory front by the growing $object_array.
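To illustrate the point about the growing $object_array: if memory ever does become the limit, one option is to write each match out as soon as it is found instead of collecting them all first. The following is only a sketch under assumptions. `writeMatchesAsJson()` and `eachTopLevelObject()` are hypothetical helpers, not part of JsonReader; the reader calls (`read()`, `next()`, `depth()`, `value()`) are the ones already used in this thread, and the generator assumes a non-empty top-level array, as the earlier examples do.

```php
<?php
/**
 * Yield each object of the top-level array, one at a time.
 * $reader is expected to be a JsonReader that already has its input set.
 */
function eachTopLevelObject($reader): Generator
{
    $reader->read(); // Outer array
    $reader->read(); // First object
    $depth = $reader->depth();
    do {
        yield $reader->value();
    } while ($reader->next() && $reader->depth() === $depth);
}

/**
 * Write every object whose $key matches $wanted to the stream $out as a
 * JSON array, one element at a time. Returns the number of matches.
 */
function writeMatchesAsJson(iterable $objects, string $key, $wanted, $out): int
{
    $count = 0;
    fwrite($out, "[");
    foreach ($objects as $object) {
        // Loose comparison on purpose: numbers may arrive as strings.
        if (!isset($object[$key]) || $object[$key] != $wanted) {
            continue;
        }
        fwrite($out, ($count > 0 ? "," : "") . "\n" . json_encode($object));
        $count++;
    }
    fwrite($out, ($count > 0 ? "\n" : "") . "]\n");
    return $count;
}
```

With a reader set up as in the code above, usage might look like `writeMatchesAsJson(eachTopLevelObject($reader), "category", $category_id, fopen("php://output", "w"));`. Only one decoded object is held at a time, at the cost of giving up JSON_PRETTY_PRINT across the array as a whole. This does not make parsing any faster.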

At the end of the day parsing massive files in PHP can only be so fast, and the low memory consumption you get from a streaming parser will always come at the expense of speed. It's the kind of thing best suited to running in a background task and checking the result later.