SIMD support?
Yes, I am actively looking into these things and want to publish a design document to get feedback on the implementation. It is also something of a roadmap to 1.0 and will cover at least the following areas:
- Keywords layout. As described in #212. I started a complete rewrite in a separate repo and just this change yields ~50% validation time reduction in some benchmarks + simplifies the code dramatically (it also uses `enum_dispatch`). Though it is incomplete but unlocks a lot - for example, there will be no need for `RwLock` in `$ref` as it will be possible to evaluate them at the compilation phase.
- Custom input types. It seems like the way to support the crates you mentioned + other external types (like Python ones). Not sure what would be the best way to do so :( my attempts to wrap `serde_json::Value` without sacrificing too much were not successful.
- Real error iterator. Now there are tons of unnecessary allocations on each `validate` call + all the `flat_map` calls are responsible for long compile times (according to `llvm-lines`). I'd like to have some tree iterator that doesn't allocate intermediate vectors - not sure about the right way to suspend/resume such a process. Maybe a separate state machine transitions table would work for this.
- Avoid extra costs of `SchemaNode` - it is not needed for `is_valid` and `validate` calls, but adds extra overhead.
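To illustrate the enum-based keyword layout mentioned in the first item, here is a minimal sketch. It is not the code from the rewrite: the `Value`, `Minimum`, `MaxLength`, and `Keyword` types are hypothetical stand-ins, and the `match` is written by hand to show roughly the kind of delegation the `enum_dispatch` crate generates.

```rust
// Toy input type; an assumption for illustration only.
// The real crate validates serde_json::Value.
enum Value {
    Number(f64),
    String(String),
}

// Hypothetical keyword validators.
struct Minimum {
    limit: f64,
}

struct MaxLength {
    limit: usize,
}

impl Minimum {
    fn is_valid(&self, value: &Value) -> bool {
        match value {
            Value::Number(n) => *n >= self.limit,
            // The keyword constrains numbers only; other types pass.
            _ => true,
        }
    }
}

impl MaxLength {
    fn is_valid(&self, value: &Value) -> bool {
        match value {
            Value::String(s) => s.chars().count() <= self.limit,
            // The keyword constrains strings only; other types pass.
            _ => true,
        }
    }
}

// One variant per keyword. Dispatch is a `match` on the variant
// instead of a call through a `Box<dyn Trait>` vtable.
enum Keyword {
    Minimum(Minimum),
    MaxLength(MaxLength),
}

impl Keyword {
    fn is_valid(&self, value: &Value) -> bool {
        match self {
            Keyword::Minimum(k) => k.is_valid(value),
            Keyword::MaxLength(k) => k.is_valid(value),
        }
    }
}

// A value is valid when every keyword accepts it.
fn is_valid(keywords: &[Keyword], value: &Value) -> bool {
    keywords.iter().all(|k| k.is_valid(value))
}
```

With this layout the keywords can live inline in a `Vec<Keyword>`, so there is no separate boxed allocation per keyword; whether that accounts for the speedup quoted above is not something this sketch can show.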
I expect to have it in a few days, and it is roughly my roadmap for this lib :) I'd appreciate it if you could share your thoughts on this or share your use case for integrating the crates you mentioned.
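One possible shape for the "real error iterator" item above is a depth-first iterator that keeps its position on an explicit stack, so it can stop after each error and resume on the next call. This is only a sketch under assumptions: the `Node` and `ErrorIter` types are hypothetical and do not come from the crate, and it uses a plain stack rather than the state machine transitions table floated above.

```rust
// Hypothetical validation tree: a node may carry an error message
// and may have child nodes.
struct Node {
    error: Option<&'static str>,
    children: Vec<Node>,
}

// Lazy pre-order iterator over the errors in a tree.
// The only allocation is the stack itself, which is reused
// across calls to `next`; no per-node vectors are built.
struct ErrorIter<'a> {
    stack: Vec<&'a Node>,
}

impl<'a> ErrorIter<'a> {
    fn new(root: &'a Node) -> Self {
        ErrorIter { stack: vec![root] }
    }
}

impl<'a> Iterator for ErrorIter<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(node) = self.stack.pop() {
            // Push children in reverse so they are visited left to right.
            self.stack.extend(node.children.iter().rev());
            if let Some(message) = node.error {
                // Suspend here; the stack remembers where to resume.
                return Some(message);
            }
        }
        None
    }
}
```

A caller that only needs the first error can call `next()` once and drop the iterator, leaving the rest of the tree unvisited.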
Sorry I didn't get back earlier, but thanks for your thorough response!
I don't know enough about your implementation to comment cogently on your points, but the details I can tease out indicate that there's a lot of headroom the library can exploit to squeeze out more performance.
I'm looking forward to the design document!
What I can contribute are my use-cases.
Currently, I'm using jsonschema-rs to validate CSV files (and that's why I originally asked about #339), and after using rayon, the performance is already quite impressive.
But as the flamegraph shows, any incremental performance gain from jsonschema will further accelerate qsv's `validate` cmd.
I plan to leverage the qsv validate command in another project - https://github.com/dathere/datapusher-plus to validate CSV files before they are uploaded to CKAN.
> #212. I started a complete rewrite in a separate repo and just this change yields ~50% validation time reduction in some benchmarks + simplifies the code dramatically (it also uses `enum_dispatch`). Though it is incomplete but unlocks a lot - for example, there will be no need for `RwLock` in `$ref` as it will be possible to evaluate them at the compilation phase.
That sounds really interesting! Is that repo publicly available?
@manuschillerdev I added it as a separate crate here - #373 :) It is a prototype, but ref resolving is more or less ready
Btw, @jqnatividad thanks for sharing your use case! I hope that soon we all can benefit from faster validation! :)
The changes, though, are quite large, and I'd appreciate any help there :)
@Stranger6667 I'll start testing the jsonschema-csr prototype and will let you know my findings!
I need to update qsv's benchmarks soonish and I'll be sure to include the prototype in it when I do.
And once I grok the internals, you can be sure I'll try to help as best as I can.
@jqnatividad Thank you! The currently submitted version is not working yet, but I am slowly working on it :)