scalar-labs/scalar-jepsen

All I see is `status code:500 error message:can't get the public key from storage`

craigpastro opened this issue · 7 comments

I tried running a DL test with `lein run test --test cas --ssh-private-key ~/.ssh/id_rsa` in Azure, and basically all I see is:

INFO [2019-10-10 05:40:12,665] jepsen worker 2 - jepsen.util 2	:invoke	:cas	[1 0]
INFO [2019-10-10 05:40:12,666] jepsen worker 3 - jepsen.util 3	:fail	:cas	[4 0]	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,666] jepsen worker 3 - jepsen.util 3	:invoke	:write	2
INFO [2019-10-10 05:40:12,669] jepsen worker 1 - jepsen.util 1	:fail	:write	1	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,669] jepsen worker 1 - jepsen.util 1	:invoke	:cas	[1 3]
INFO [2019-10-10 05:40:12,672] jepsen worker 4 - jepsen.util 4	:fail	:write	4	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,672] jepsen worker 4 - jepsen.util 4	:invoke	:cas	[3 0]
INFO [2019-10-10 05:40:12,673] jepsen worker 2 - jepsen.util 2	:fail	:cas	[1 0]	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,673] jepsen worker 2 - jepsen.util 2	:invoke	:cas	[3 2]
INFO [2019-10-10 05:40:12,674] jepsen worker 0 - jepsen.util 0	:fail	:write	2	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,674] jepsen worker 0 - jepsen.util 0	:invoke	:cas	[4 2]
INFO [2019-10-10 05:40:12,677] jepsen worker 3 - jepsen.util 3	:fail	:write	2	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,677] jepsen worker 3 - jepsen.util 3	:invoke	:write	0
INFO [2019-10-10 05:40:12,680] jepsen worker 4 - jepsen.util 4	:fail	:cas	[3 0]	status code:500 error message:can't get the public key from storage
INFO [2019-10-10 05:40:12,681] jepsen worker 4 - jepsen.util 4	:invoke	:write	1
...

Then the test ends with `java.util.concurrent.ExecutionException: java.io.IOException: No such file or directory`.

@siyopao That message is logged when a certificate is not registered. Do you see a `registerCertificate` failure?
We changed the port for `registerCertificate`, so that might be related.

I can't tell. The logs go:

2019-10-10 05:40:02,932{GMT}	INFO	[jepsen worker 3] scalardl.cas: register a certificate and contracts
2019-10-10 05:40:03,593{GMT}	INFO	[jepsen worker 0] jepsen.core: Running worker 0
2019-10-10 05:40:03,594{GMT}	INFO	[jepsen nemesis] jepsen.core: Running nemesis
2019-10-10 05:40:03,595{GMT}	INFO	[jepsen worker 1] jepsen.core: Running worker 1
...

and the tests continue.

OK, hmm, maybe it's better to remove the DL test for now. Can you do that?

It is not running yet. I was just testing for https://github.com/scalar-labs/scalar/pull/277.

Oh, OK...

I guess that `registerCertificate` fails and the caller doesn't check its status.

@feeblefakie Sorry for the late response.
You are right. The caller doesn't check the status of the registration.

(.registerCertificate @client-service)
(doseq [c CONTRACTS]
  (.registerContract @client-service (:name c) (:class c) (:path c) (Optional/empty))))))

I will check the port for `registerCertificate` and add status checks for the certificate and contract registrations.

The default port is used, and it works well when registration doesn't fail.
The status check has been added in PR #17.
A test now stops immediately if registration fails.
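
For anyone reading later, here is a minimal sketch of the "check the status and stop" idea. It is not the code from the PR. `check-registration!`, `register-fn`, and `ok-status` are hypothetical names made up for illustration, and the sketch assumes a registration call can be wrapped in a zero-argument function that returns a numeric status code, which may not match what the real client returns.

```clojure
;; Hypothetical helper, not part of scalar-jepsen or the Scalar DL client.
;; `register-fn` is assumed to be a zero-argument function that performs one
;; registration and returns a numeric status code; `ok-status` is the code
;; that means success.
(defn check-registration!
  "Calls register-fn and returns its status. Throws ex-info, carrying the
  status in its data map, when the status differs from ok-status."
  [what register-fn ok-status]
  (let [status (register-fn)]
    (when-not (= ok-status status)
      (throw (ex-info (str "Failed to register " what)
                      {:what what :status status})))
    status))

;; Usage with stand-in functions instead of a real client:
(check-registration! "certificate" (fn [] 200) 200)

(try
  (check-registration! "certificate" (fn [] 500) 200)
  (catch clojure.lang.ExceptionInfo e
    (println (.getMessage e) (ex-data e))))
```

Throwing during setup, rather than logging and continuing, means a failed registration surfaces once at the start instead of as a `:fail` on every operation.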