mnunberg/jsonsl

Matching multiple jprs does not work?

Closed this issue · 3 comments

I am not sure whether this is a bug, an unsupported use, or something I am doing wrong.

I am attempting to match multiple JPRs at once (I am not sure of the right term: match, extract, or search). This appears to be supported, since jsonsl_jpr_match_state_init takes an array of JPRs:

void jsonsl_jpr_match_state_init(jsonsl_t jsn,
                                 jsonsl_jpr_t *jprs,
                                 size_t njprs);

with the caveat

 * Note that currently the first JPR is the quickest and comes
 * pre-allocated with the state structure. Further JPR objects
 * are chained.

In the example below (an adaptation of jpr_test.c, the only example of JPR use I could find), I am attempting to match both "/foo/bar/^" and "/foo/bar/^/id" in one pass. I have also tried making the second JPR just "/id", in case that is what is meant by "chaining".

However, the program's output shows that only the first JPR is matched; the second never is. Each JPR works when it is the only one being matched; they just don't appear to work together.

Output of the program below. Lines prefixed with *** are the ones I am trying to match but that are currently not matched:

Got match result: 0 for {"foo": {"bar": [{"id": 10 },{"id": 20 }],"inner object": {"baz":"qux"}}}
Got key..foo
Got match result: 0 for {"bar": [{"id": 10 },{"id": 20 }],"inner object": {"baz":"qux"}}}
Got key..bar
Got match result: 0 for [{"id": 10 },{"id": 20 }],"inner object": {"baz":"qux"}}}
Got match result: 1 for {"id": 10 },{"id": 20 }],"inner object": {"baz":"qux"}}}
orig: /foo/bar/^ 
Got key..id
***Got match result: -1 for 10 },{"id": 20 }],"inner object": {"baz":"qux"}}}
Got match result: 1 for {"id": 20 }],"inner object": {"baz":"qux"}}}
orig: /foo/bar/^ 
Got key..id
***Got match result: -1 for 20 }],"inner object": {"baz":"qux"}}}
Got key..inner object
Got match result: -1 for {"baz":"qux"}}}
Got key..baz
Got match result: -1 for "qux"}}}

Program below:

#include "jsonsl/jsonsl.h"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define _JSTR(e) \
    "\"" #e "\""

const char SampleJSON[] =
        "{"
            _JSTR(foo) ": {"
                _JSTR(bar) ": ["
//                     _JSTR(element0) ","
//                     _JSTR(element1)
                        "{\"id\": 10 },"
                        "{\"id\": 20 }"
                    "],"
               _JSTR(inner object) ": {"
                   _JSTR(baz) ":" _JSTR(qux)
               "}"
           "}"
        "}";

struct lexer_global_st {
    const char *hkey;
    size_t nhkey;
};

static void push_callback(jsonsl_t jsn,
                          jsonsl_action_t action,
                          struct jsonsl_state_st *state,
                          const jsonsl_char_t *at)
{
    struct lexer_global_st *global = (struct lexer_global_st*)jsn->data;
    jsonsl_jpr_match_t matchres;
    jsonsl_jpr_t matchjpr;
    if (state->type == JSONSL_T_HKEY) {
        return;
    }
    matchjpr = jsonsl_jpr_match_state(jsn, state,
                                      global->hkey,
                                      global->nhkey,
                                      &matchres);
    printf("Got match result: %d for %s\n", matchres, at);
    if (matchjpr != NULL) {
        printf("orig: %.*s \n", (int)matchjpr->norig, matchjpr->orig);
    }
}

static void pop_callback(jsonsl_t jsn,
                         jsonsl_action_t action,
                         struct jsonsl_state_st *state,
                         const jsonsl_char_t *at)
{
    struct lexer_global_st *global = (struct lexer_global_st*)jsn->data;
    if (state->type == JSONSL_T_HKEY) {
        global->hkey = at - (state->pos_cur - state->pos_begin);
        global->hkey++;
        global->nhkey = (state->pos_cur - state->pos_begin)-1;
        printf("Got key..");
        fwrite(global->hkey, 1,  global->nhkey, stdout);
        printf("\n");
    }
}

static int error_callback(jsonsl_t jsn,
                          jsonsl_error_t error,
                          struct jsonsl_state_st *state,
                          jsonsl_char_t *at)
{
    /* jsn->pos is a size_t; cast it so that it matches the %lu conversion */
    fprintf(stderr, "Got error %s at pos %lu. Remaining: %s\n",
            jsonsl_strerror(error), (unsigned long)jsn->pos, at);
    abort();
    return 0;
}


static void lexjpr(void)
{
    struct lexer_global_st global;
    jsonsl_t jsn;
    jsonsl_jpr_t jprs[2];
    global.hkey = "";
    global.nhkey = 0;
    jprs[0] = jsonsl_jpr_new("/foo/bar/^", NULL);
    jprs[1] = jsonsl_jpr_new("/foo/bar/^/id", NULL);
    //jprs[1] = jsonsl_jpr_new("/id", NULL);
    assert(jprs[0]);
    assert(jprs[1]);
    jsn = jsonsl_new(24);
    assert(jsn);
    jsonsl_jpr_match_state_init(jsn, jprs, 2);
    jsn->error_callback = error_callback;
    jsn->action_callback_POP = pop_callback;
    jsn->action_callback_PUSH = push_callback;
    jsonsl_enable_all_callbacks(jsn);
    jsn->data = &global;

    jsonsl_feed(jsn, SampleJSON, sizeof(SampleJSON)-1);

    jsonsl_jpr_match_state_cleanup(jsn);
    jsonsl_jpr_destroy(jprs[0]);
    jsonsl_jpr_destroy(jprs[1]);
    jsonsl_destroy(jsn);
}

int main(int argc, char ** argv)
{
    lexjpr();
    return 0;
}

I'll try out your example later on, but /foo/bar/^ will match first (patterns are tried in the order they appear in the list, and it is listed first), and thus match_state() will not do anything more for this particular node.

The basic idea behind passing "multiple" search patterns was to allow an efficient way to match multiple, different patterns, for example /foo/^/id and /bar/^/id. In your case you have one pattern that is essentially a sub-pattern of another (/foo/bar/^ will always be true when /foo/bar/^/id is true). Thus, you can simply use a single pattern (/foo/bar/^/id) and do the following:

  • If the match is possible (JSONSL_MATCH_POSSIBLE), inspect the state->level value, which tells you the depth of the value. If it's 2, for example (I don't remember the exact offset, i.e. whether it's 0-based or 1-based), you can infer that the path is /foo/bar and thus you have your first match.
  • If the result is JSONSL_MATCH_COMPLETE then you have a /foo/bar/^/id field.
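
The two cases above can be sketched roughly as follows. This is a self-contained illustration, not jsonsl code: the toy_match enum, classify_node, and ELEMENT_LEVEL are hypothetical names. In a real callback the result would come from jsonsl_jpr_match_state() and the level from state->level. The value 3 for ELEMENT_LEVEL is an assumption that has to be checked against real output, since the offset is not confirmed above.

```c
/* Stand-ins for the match results. The values mirror the numbers printed
   by the program above (1 = complete, 0 = possible, -1 = no match). */
enum toy_match { TOY_NOMATCH = -1, TOY_POSSIBLE = 0, TOY_COMPLETE = 1 };

/* What a node turned out to be, given the single pattern /foo/bar/^/id. */
enum node_kind { NODE_OTHER, NODE_ELEMENT, NODE_ID };

/* ASSUMPTION: array elements under /foo/bar are reported at this level.
   Check it against real state->level values before relying on it. */
#define ELEMENT_LEVEL 3u

enum node_kind classify_node(enum toy_match result, unsigned level)
{
    if (result == TOY_COMPLETE)
        return NODE_ID;       /* a /foo/bar/^/id value */
    if (result == TOY_POSSIBLE && level == ELEMENT_LEVEL)
        return NODE_ELEMENT;  /* prefix still matches at element depth: /foo/bar/^ */
    return NODE_OTHER;        /* an ancestor such as /foo, or an unrelated node */
}
```

A possible match at any other level (for example at /foo) falls through to NODE_OTHER, so only the element depth is treated as the first pattern.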

Another example of the matching functionality is in the Perl JSON::SL module (also on my GitHub: https://github.com/mnunberg/perl-JSON-SL/blob/master/SL.xs).

The basic idea is that you keep track of the key/index you want to match against, which your code seems to be doing.

The term 'chaining' is rather misleading: the JPRs are not "chained" in any way. They are simply iterated in a for loop, so the first pattern gets evaluated first, and so on.

Thanks for the fast reply - you can close this issue if you'd like.

This jpr functionality probably won't work for my use case (multiple dynamically configured extraction paths) without parsing the json multiple times.

I just thought I'd drop a comment on the very nice design of your library while I'm here - it has the best streaming support I've found with no copying nor contiguous memory required.

You can always use the simple jsonsl_jpr_match() function. match_state() is a bit fancier: it does away with some of the grunt work you'd otherwise be doing when handling multiple paths, namely making sure you aren't re-checking patterns that were already eliminated in previous runs.
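
As a rough sketch of what checking every pattern independently for each node amounts to, here is a self-contained toy version. It does not use jsonsl: toy_match_path and match_all are hypothetical helpers, paths are plain strings such as "/foo/bar/0", and "^" is treated as a wildcard for one component. It only illustrates the control flow, and it re-tests every pattern on every node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, not jsonsl's own type. */
enum toy_match { TOY_NOMATCH = -1, TOY_POSSIBLE = 0, TOY_COMPLETE = 1 };

/* Compare a pattern such as "/foo/bar/^/id" with a concrete node path such
   as "/foo/bar/0", one component at a time. */
enum toy_match toy_match_path(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
    while (*pattern == '/' && *path == '/') {
        size_t plen, clen;
        int wildcard;
        pattern++;
        path++;
        plen = strcspn(pattern, "/");   /* length of the pattern component */
        clen = strcspn(path, "/");      /* length of the path component */
        wildcard = (plen == 1 && pattern[0] == '^');
        if (!wildcard && (plen != clen || memcmp(pattern, path, plen) != 0))
            return TOY_NOMATCH;
        pattern += plen;
        path += clen;
    }
    if (*path != '\0')
        return TOY_NOMATCH;             /* node is deeper than the pattern */
    return *pattern == '\0' ? TOY_COMPLETE : TOY_POSSIBLE;
}

/* Test every pattern against one node; no pattern hides another.
   Stores one result per pattern and returns how many matched completely. */
size_t match_all(const char *const *patterns, size_t npatterns,
                 const char *path, enum toy_match *results)
{
    size_t i, ncomplete = 0;
    for (i = 0; i < npatterns; i++) {
        results[i] = toy_match_path(patterns[i], path);
        if (results[i] == TOY_COMPLETE)
            ncomplete++;
    }
    return ncomplete;
}
```

With the two patterns from this issue, the node path "/foo/bar/0" gives a complete match for the first pattern and a possible match for the second, while "/foo/bar/0/id" completes only the second.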

You've actually caught me at a very opportune moment, as I'm now dealing with this library again, mainly in performing matches. In this case I only have a single pattern, but the match syntax is slightly different (i.e. foo.bar.baz rather than /foo/bar/baz), with no wildcards and no need for URI escaping, etc.

I'd probably recommend you look into the implementation of the JPR subsystem and implement/modify accordingly. The key here is to cleanly keep track of match status within a tree (with the added caveat that the "Nodes" are not unique - only a single state pointer is used per level :) ).
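
One possible way to keep that bookkeeping is sketched below. It is not how jsonsl implements it, and every name (toy_pattern, toy_tracker, tracker_init, tracker_push) is hypothetical. The assumptions are that patterns are pre-split into components, that levels are 1-based with level 0 as the root, and that array indices arrive as text keys. Each level owns one slot that successive siblings overwrite, which loosely mirrors the one-state-per-level caveat, and a pattern eliminated at a parent level is never tested again below it.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEVELS 16

/* A pattern pre-split into components; "^" matches any one component. */
struct toy_pattern {
    const char *const *comps;
    unsigned ncomps;
};

struct toy_tracker {
    const struct toy_pattern *patterns;
    unsigned npatterns;
    /* alive[L]: bit i is set while pattern i still matches the node that is
       currently open at level L. One slot per level, reused by siblings. */
    unsigned long alive[MAX_LEVELS + 1];
};

void tracker_init(struct toy_tracker *t,
                  const struct toy_pattern *patterns, unsigned npatterns)
{
    assert(npatterns < 32);            /* keeps the shifts below well defined */
    t->patterns = patterns;
    t->npatterns = npatterns;
    memset(t->alive, 0, sizeof t->alive);
    t->alive[0] = (1UL << npatterns) - 1;   /* at the root, all are alive */
}

/* Call when a node with the given key opens at `level` (1-based).
   Returns a bitmask of the patterns that this node matches completely. */
unsigned long tracker_push(struct toy_tracker *t, unsigned level,
                           const char *key)
{
    unsigned long parent, alive = 0, complete = 0;
    unsigned i;
    assert(level >= 1 && level <= MAX_LEVELS);
    parent = t->alive[level - 1];
    for (i = 0; i < t->npatterns; i++) {
        const struct toy_pattern *p = &t->patterns[i];
        const char *want;
        if (!(parent & (1UL << i)) || level > p->ncomps)
            continue;   /* eliminated higher up, or node deeper than pattern */
        want = p->comps[level - 1];
        if (strcmp(want, "^") != 0 && strcmp(want, key) != 0)
            continue;   /* this component does not match */
        alive |= 1UL << i;
        if (level == p->ncomps)
            complete |= 1UL << i;
    }
    t->alive[level] = alive;   /* overwrites the previous sibling's slot */
    return complete;
}
```

No pop handler is needed in this sketch, because a level's slot is rewritten the next time a node opens at that level, while the parent's slot stays valid for as long as its children are being visited.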

And thanks for the kind words!