EU-EDPS/website-evidence-collector

JSON output format incompatible with GCP Firestore

msokolov-roche opened this issue · 1 comment

Hi,

I would like to store the JSON output of website-evidence-collector in Google Cloud Firestore. However, the output JSON format is not fully compatible with it.
Storing the output for https://google.com works, but when I try to store the output for https://roche.com I get an exception from the Google Cloud Python API:

google.api_core.exceptions.InvalidArgument
400 Cannot convert an array value in an array value.
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python3.9/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python3.9/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python3.9/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/app.py", line 61, in json_hec_route
    store_result(target, result)
  File "/app.py", line 71, in store_result
    send_to_firestore(data)
  File "/app.py", line 99, in send_to_firestore
    db.collection(u'website-evidence-collector').document(domain.group(1)).set(data)
  File "/usr/lib/python3.9/site-packages/google/cloud/firestore_v1/document.py", line 167, in set
    write_results = batch.commit(**kwargs)
  File "/usr/lib/python3.9/site-packages/google/cloud/firestore_v1/batch.py", line 59, in commit
    commit_response = self._client._firestore_api.commit(
  File "/usr/lib/python3.9/site-packages/google/cloud/firestore_v1/services/firestore/client.py", line 1125, in commit
    response = rpc(
  File "/usr/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
    return retry_target(
  File "/usr/lib/python3.9/site-packages/google/api_core/retry.py", line 191, in retry_target
    return target()
  File "/usr/lib/python3.9/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 Cannot convert an array value in an array value.

This doesn't look like something that can be changed in the Firestore configuration; see the Stack Overflow answer about the same exception.
Could you suggest a fix or a workaround? Thank you!
The JSON output for https://google.com and https://roche.com is too long to include in this issue, but I am using the latest version of website-evidence-collector, so you should be able to reproduce the same output.
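
The error message suggests that the document contains an array placed directly inside another array, which Firestore does not accept as a field value. A minimal sketch for locating such values in the parsed output follows. The helper name find_nested_arrays and the sample data are hypothetical and are not taken from website-evidence-collector or its real output.

```python
import json


def find_nested_arrays(value, path="$"):
    """Yield the paths of arrays that sit directly inside another array."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_nested_arrays(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_path = f"{path}[{index}]"
            if isinstance(child, list):
                # An array whose direct parent is an array: the shape Firestore rejects.
                yield child_path
            yield from find_nested_arrays(child, child_path)


# Hypothetical sample, NOT real collector output.
sample = json.loads('{"hosts": ["a.example"], "pairs": [["k", "v"], {"deep": [[1]]}]}')
nested_paths = list(find_nested_arrays(sample))
print(nested_paths)
```

To check a real run, the sample could be replaced with the result of json.load() on the inspection.json file; an empty result would mean the problem lies elsewhere.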

Hi!

I ran website-evidence-collector http://google.com and pasted the resulting inspection.json file into this validator: https://jsonlint.com

According to the validator, the output is valid JSON.

I have no access to Google Cloud Firestore. Have you tried posting the file to your API endpoint with a tool such as curl, to rule out that Python alters the JSON?
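
Valid JSON can still be rejected here, because Firestore does not allow an array to contain another array directly, while an array of maps is accepted. One possible workaround, sketched below under that assumption, is to wrap each nested array in a map before calling set(). The function name firestore_safe and the wrapper key "values" are hypothetical choices, not part of website-evidence-collector or the Firestore client library.

```python
def firestore_safe(value, inside_array=False):
    """Return a copy of value in which every array nested directly
    inside another array is wrapped in a map: [..] -> {"values": [..]}."""
    if isinstance(value, dict):
        # Map values may be arrays, so the flag is reset for children of a map.
        return {key: firestore_safe(child) for key, child in value.items()}
    if isinstance(value, list):
        items = [firestore_safe(child, inside_array=True) for child in value]
        # "values" is an arbitrary, hypothetical wrapper key.
        return {"values": items} if inside_array else items
    return value


# Hypothetical sample, NOT real collector output.
sample = {"pairs": [["k", "v"], ["x"]], "hosts": ["a.example"]}
converted = firestore_safe(sample)
print(converted)
```

The converted structure could then be passed to set() in place of the original data. Note that this changes the stored shape, so any code reading the document back would need to unwrap the "values" maps again.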