pandastrike/jsck

check into themis

Closed this issue · 16 comments

@MUSCULA tried to do a PR using the validator library https://github.com/playlyfe/themis with our benchmarks, but ran into trouble.

## Benchmarks for Draft 4

Schema: 'Event - Valid document'.  A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

  JSCK
  Warming up: ................................
  Iterations: ................................................................

  tv4
  Warming up: ................................
  Iterations: ................................................................

  jayschema
  Warming up: ................................
  Iterations: ................................................................

  is-my-json-valid
  Warming up: ................................
  Iterations: ................................................................

  themis
  Warming up: ................................
  Iterations: ................................................................


  JSCK: validations/millisecond
  median: 116.848    max: 135.217    min: 68.582 

  tv4: validations/millisecond
  median: 59.104    max: 67.56    min: 43.675 

  jayschema: validations/millisecond
  median: 0.857    max: 1.068    min: 0.713 

  is-my-json-valid: validations/millisecond
  median: 1208.26    max: 1488.372    min: 594.312 

  themis: validations/millisecond
  median: 488.2    max: 607.715    min: 365.193 

Relative speeds:
is-my-json-valid : 1.000
themis : 2.475
JSCK : 10.340
tv4 : 20.443
jayschema : 1409.692


Schema: 'Configuration'.  A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

  JSCK
  Warming up: ................................
  Iterations: ................................................................

  tv4
  Warming up: ................................
  Iterations: ................................................................

  jayschema
  Warming up: ................................
  Iterations: ................................................................

  is-my-json-valid
  Warming up: ................................
  Iterations: ................................................................

  themis
  Warming up: ................................
  Iterations: ................................................................


  JSCK: validations/millisecond
  median: 57.612    max: 71.369    min: 38.924 

  tv4: validations/millisecond
  median: 20.741    max: 24.056    min: 15.737 

  jayschema: validations/millisecond
  median: 0.46    max: 0.481    min: 0.423 

  is-my-json-valid: validations/millisecond
  median: 606.635    max: 812.698    min: 179.902 

  themis: validations/millisecond
  median: 72.48    max: 86.341    min: 44.329 

Relative speeds:
is-my-json-valid : 1.000
themis : 8.370
JSCK : 10.530
tv4 : 29.249
jayschema : 1317.964


Schema: 'Transaction'.  
Sample size: 64
Validations per sample: 64

  JSCK
  Warming up: ................................
  Iterations: ................................................................

  tv4
  Warming up: ................................
  Iterations: ................................................................

  jayschema
  Warming up: ................................
  Iterations: ................................................................

  is-my-json-valid
  Warming up: ................................
  Iterations: ................................................................

  themis
  Warming up: TypeError: Object #<Object> has no method '0#transaction'
  at Object.validators.0#/items (<anonymous>:72:43)
  at Object.validators.(anonymous function) [as 0] (<anonymous>:59:48)
  at eval (<anonymous>:3826:37)
  at /Users/allanebdrup/Dropbox/Muscula-jsck/node_modules/themis/src/themis.js:2631:22
  at module.exports.themis.validate (/Users/allanebdrup/Dropbox/Muscula-jsck/benchmarks/draft4/validators.coffee:57:8)
  at Benchmark._measure (/Users/allanebdrup/Dropbox/Muscula-jsck/benchmarks/runner.coffee:31:15)
  at Benchmark.module.exports.Benchmark.run (/Users/allanebdrup/Dropbox/Muscula-jsck

This is the code I used to add themis.

  "themis":
     setup: (schema) ->
       require("themis").validator(schema)
     validate: ({validator, schema, document}) ->
       validator(document, '0')
     error: (result) ->
       if result.valid == true
         false
       else
         result.errors

I used similar code, and encountered similar errors.

The errors seem to come from an expectation in themis that every schema will specify an id attribute at its top level. This is not required in the spec.

I also checked themis against the official JSON Schema Test Suite using a Node.js harness which Panda Strike created last week. themis had a hard time with these tests, because the vast majority of the schemas in these tests do not have id attributes.

In fact, of 88 tests, only one had an id attribute at the top level.

I'm not closing the ticket, but I am going to de-prioritize testing themis for now, because I have a busy week next week, and especially because of this id attribute situation. I don't know how to make themis pass the official JSON Schema tests, and it might not be possible.

If it doesn't pass the official tests, then the question is by how much. JSCK decided not to implement a few small pieces of the spec, and many JSON Schema validators have made similar decisions. If this id thing is the only quirk with themis, we should probably figure out a way to test it, presumably by adding an id attribute. But if there are a ton of other places where themis deviates from the spec, it might not be worth the effort.

So, temporary wontfix, at least not until Feb 9th or so.

+1 wontfix.

I had no problem getting themis to pass the official test suite. If you use

require("themis").validator(schema)(document, '0')

It works without id's just fine. It's the '0' that does the trick. (also in the example code I shared above)

For reference here is the result I get when running themis against the official testsuite:
https://github.com/Muscula/json-schema-benchmark/blob/master/reports/themis.md

But we can agree that is a pretty weird implementation themis has for this.

well, go ahead and post a gist.

Well there is one test failing for themis:

change resolution scope, changed scope ref invalid

`Expected result: false but validator returned: "Object # has no method '0'"``

This is probably what disqualifies themis from your benchmarks.

The latest version of themis 1.1.5 should pass the jsck benchmarks. I've created a pull request for the same #82

However I believe the benchmarks require some improvement. I got widely varying results sometimes while running them. I would suggest using benchmarkjs. It provide much more accurate statistics.

To illustrate my point here's a sample of 5 runs of the benchmark with jsck, imjv and themis

Run1

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 2.278
Themis : 3.750
JSCK : 11.732

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 6.494
Themis : 7.073
JSCK : 11.017

Schema: 'Transaction'.
Sample size: 64
Validations per sample: 64

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 2.652
Themis : 3.458
JSCK : 4.178

Run 2

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 2.126
Themis : 3.162
JSCK : 10.432

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 7.976
Themis[minimal] : 9.040
JSCK : 22.875

Schema: 'Transaction'.
Sample size: 64
Validations per sample: 64

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 3.281
Themis[minimal] : 4.964
JSCK : 6.159

Run 3

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 1.774
Themis[minimal] : 2.021
JSCK : 14.632

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 3.844
Themis : 4.944
JSCK : 7.367

Schema: 'Transaction'.
Sample size: 64
Validations per sample: 64

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 3.099
Themis[minimal] : 4.867
JSCK : 7.621

Run 4

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 2.985
Themis[minimal] : 4.255
JSCK : 19.160

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 8.446
JSCK : 10.161
Themis : 13.009

Schema: 'Transaction'.
Sample size: 64
Validations per sample: 64

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 3.611
Themis[minimal] : 4.344
JSCK : 7.837

Run 5

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 1.485
Themis : 2.029
JSCK : 8.936

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis[minimal] : 3.636
Themis : 4.441
JSCK : 6.794

Schema: 'Transaction'.
Sample size: 64
Validations per sample: 64

JSCK
Warming up: ................................
Iterations: ................................................................

Themis[minimal]
Warming up: ................................
Iterations: ................................................................

Themis
Warming up: ................................
Iterations: ................................................................

is-my-json-valid
Warming up: ................................
Iterations: ................................................................

Relative speeds:
is-my-json-valid : 1.000
Themis : 2.367
Themis[minimal] : 2.370
JSCK : 2.848

we already have an issue to use benchmark, pull requests welcome: #29

anyway, congrats. this looks nice and fast. although this result happened for me just now:

Themis : 8.699
JSCK : 9.476
Themis[minimal] : 10.532

obviously this is another good reason to look at making #29 happen! it could just be a fluke. but if it happens consistently and isn't just a fluke, it might be worth looking into, since I'm pretty sure minimal Themis is supposed to be faster than regular Themis.

a few more comments on your PR, #82.

also, did you read this comment?

#81 (comment)

I do want to be sure Themis actually validates everything. I asked @MUSCULA for a gist, because he said he got Themis running the JSON Schema official tests. We have a test harness which should be pretty easy to set up.

https://www.npmjs.com/package/json-schema-tests

The bug that was causing themis to not complete in your benchmarks:
change resolution scope, changed scope ref invalid

`Expected result: false but validator returned: "Object # has no method '0'"``

Was fixed.

When I instantiate themis with a schema I use a try catch, and fail all the tests with that schema if it throws. So basically like in #82, but with a try-catch.

Be sure to npm update, to get the latest version with the fix.

@MUSCULA the reason for the 0 weirdness is because the compiled validator that themis generates is not a validator for a single schema but possibly for multiple schemas which were all compiled together. Its sorta inspired from JSV validator. In this scenario we need some way to specify the id of the schema to validate against.
In most actual use cases your schemas would carry an id to identify it some way and then this weirdness wouldn't happen, but in other scenarios we just basically use integers to identify each schema.

However besides this, themis is not supposed to be failing any tests. If you find any bug please do let me know and I'll have it fixed right away.

@gilesbowkett yes I also got quite a few runs when themis minimal was actually slower than themis default. I even got a run where IMJV was 2 times slower than themis minimal lol. However the benchmarks I use for themis itself uses benchmarkjs and over there the performance is pretty consistent.

You can find it here

@atrniv those look nice and thorough! I might dig into that code to see how you're using benchmarkjs.

@gilesbowkett I actually borrowed it from the benchmarks by ZSchema 3 by @zaggino. The credit goes to him. The only thing lacking in it is more practical benchmarks. Right now the only useful benchmarks are the basicObject and the AdvancedObject benchmarks. I'll probably be adding in the benchmarks from jsck into it as well, since they help portray a more rounded perspective of how the validators would perform with actual real world schemas.

@atrniv thanks! I didn't look at your implementation in detail yet, in fact it could be a week before I can dig into this with any real seriousness. but, fwiw, our benchmarks use five different schemas – two for draft 3 and three for draft 4. combine that with better statistics from benchmark and we could have something pretty comprehensive.

@MUSCULA @atrniv so I'm still seeing errors all over the place.

I've tried it a few different ways:

  • validator(data)
  • validator(data, '#');
  • validator(data, '#/');
  • validator(data, '0');
  • validator(data, 0);

obviously I'm missing something about how to work with Themis. help me out here.

(to be clear, this is when running the JSON Schema test suite, and/or validating against a schema with no top-level id.)

ok, got this working. will now handle #82. may have to create new branch for it, though, commits to master make me OCD.