qri-io/jsonschema

Implementation is not thread safe

Arqu opened this issue ยท 10 comments

Arqu commented

The current implementation is not thread safe (mostly due to the global schema registry) and should be improved.

Known reports: #76 (comment)

Do you know what type of fix you would accept as a PR? I had to workaround this, and ended up making the keyword registry a struct so that after it's constructed, you aren't modifying a package variable anymore. The solution does use locks though in order to preserve the original API.

I can submit a PR if that solution seems okay to you.

bmfs commented

The schema registry might suffer from the same problem.

Would it be viable to make both schemaRegistry and keywordRegistry scoped to the jsonschema.Schema instance?

Was looking into @carolynvs fix, and seems to be a step in the right direction, although the keywordRegistry still is globally scoped.

Yeah sorry my fix wasn't everything that is necessary to address the problem. It was just enough to fix the code paths that we use for our project.

bmfs commented

@Arqu Is @carolynvs implementation what you were thinking as a solution to this ticket?

I'm in need of a solution to this, and willing to work on it.
My plan is to continue @carolynvs work and extend it to include the schema registry ( and to the loader registry I've added in the #101 ).

Any concerns or suggestions that help on such PR being accepted are more than welcome.

Arqu commented

I'm not too sure of all the details we will have to deal with on this front, however the shortest path there that I see would be to wrap the initialization of a schema (rs := &jsonschema.Schema{} from the readme) into a higher level instance and initialize with a function call for a default initialization and a call for user supplied registries.
In either case the parent, let's call it SchemaInstance, would be the holder of those registries and all calls bellow would source their structs from there. This will probably require a bit of refactoring work to get there fully, but am happy to assist on it and see it through.

Thank you for looking into this!

I would like to highlight that some other data structures in the jsonschema package are also not thread safe: we have recently encountered an issue with jsonschema.RegisterValidator method (for context: we use cnab-go stack and this method is called when unmarshaling cnab-go/bundle/definition.Schema, and it breaks during concurrent bundle loading).
A small example exposing concurrent map writes:

import (
	"github.com/qri-io/jsonschema"
	"sync"
	"testing"
)

func BenchmarkRegisterValidator(b *testing.B) {
	var wg sync.WaitGroup
	wg.Add(100)
	for i := 0; i < 100; i++ {
		go func() {
			jsonschema.RegisterValidator("name", jsonschema.NewType)
			wg.Done()
		}()
	}
	wg.Wait()
}

A higher-level cnab-go example triggering the same issue:

import (
	"github.com/cnabio/cnab-go/bundle/loader"
	"sync"
	"testing"
)

func BenchmarkLoadData(b *testing.B) {
	var wg sync.WaitGroup
	wg.Add(100)
	for i := 0; i < 100; i++ {
		go func() {
			loader.NewLoader().LoadData([]byte(`{"definitions":{"string":{"type":"string"}}}`))
			wg.Done()
		}()
	}
	wg.Wait()
}

@dataoleg Porter is using a fork of this library with a fix for that. Here is the open PR to cnab-go and the fork.

cnab-go was reluctant to use my patch because it won't be accepted upstream in this repo. If there is another person needing the patch besides Porter, maybe that is enough support to get it merged into cnab-go until a better fix is available.

I have submitted a PR with the patch we have been using in cnab and porter. #103 I'm not sure this is what you were looking for, but maybe it can get us thinking about a solution.

#103 has been merged. It doesn't completely fix all thread-safety concerns but it does make calls to Validate and RegisterKeyword safe.

Unfortunately Validate is not completly thread-safe yet. I found at least one more problem calling ValidateBytes.

==================
WARNING: DATA RACE
Read at 0x00c0005903d8 by goroutine 79:
  github.com/qri-io/jsonschema.(*Schema).Register()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:60 +0x5c
  github.com/qri-io/jsonschema.(*Schema).ValidateKeyword()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:280 +0xa8
  github.com/qri-io/jsonschema.(*Schema).Validate()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:260 +0x4ec
  github.com/qri-io/jsonschema.(*Schema).ValidateBytes()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:330 +0x17c
 
 
 
Previous write at 0x00c0005903d8 by goroutine 78:
  github.com/qri-io/jsonschema.(*Schema).Register()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:63 +0x74
  github.com/qri-io/jsonschema.(*Schema).ValidateKeyword()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:280 +0xa8
  github.com/qri-io/jsonschema.(*Schema).Validate()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:260 +0x4ec
  github.com/qri-io/jsonschema.(*Schema).ValidateBytes()
      /Users/xx/go/pkg/mod/github.com/qri-io/jsonschema@v0.2.1/schema.go:330 +0x17c