Special Unicode escape sequences (like \u0000) in JSON cause PostgreSQL ingestion failure
Closed this issue · 3 comments
Description:
If the JSON dataset being ingested contains the \u0000 Unicode escape sequence, the ingestion process fails with a PostgreSQL error. I encountered this when trying to upload data collected with the SharpHound "collectallproperties" flag enabled.
Component(s) Affected:
- UI
- API
- Neo4j
- PostgreSQL
- Data Collector (SharpHound, AzureHound)
- Other (tooling, documentation, etc.)
Steps to Reproduce:
- Collect data with SharpHound "collectallproperties" (meaning collect all LDAP properties) enabled.
- Upload it to BloodHound CE (I used the UI to upload the JSON files).
- The ingestion job ends in "fail" status.
- The app-db-1 container logs a PostgreSQL error (visible in the Docker logs).
Expected Behavior:
Data should upload and ingest successfully even if it contains special Unicode characters.
Actual Behavior:
Currently, the \u0000 Unicode escape sequence causes ingestion to fail.
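For context: \u0000 is a legal escape in JSON (RFC 8259), so parsers happily accept it, but PostgreSQL's text type cannot store a NUL byte, which is why the jsonb cast fails at insert time. A minimal Python sketch of the mismatch:

```python
import json

# JSON parsers accept the \u0000 escape without complaint, producing a
# string that contains a literal NUL byte.
doc = json.loads('{"auditingpolicy": "\\u0000\\u0001"}')
print(repr(doc["auditingpolicy"]))  # '\x00\x01'

# PostgreSQL, however, rejects that same document when converting it to
# jsonb/text, raising "unsupported Unicode escape sequence" as seen in
# the logs above.
```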
Screenshots/Code Snippets/Sample Files:
Here is the error message in the logs:
app-db-1 | 2023-12-14 00:50:23.632 UTC [91] ERROR: unsupported Unicode escape sequence
app-db-1 | 2023-12-14 00:50:23.632 UTC [91] DETAIL: \u0000 cannot be converted to text.
app-db-1 | 2023-12-14 00:50:23.632 UTC [91] CONTEXT: JSON data, line 1: {"auditingpolicy":...
app-db-1 | 2023-12-14 00:50:23.632 UTC [91] STATEMENT: INSERT INTO "asset_group_collection_entries"
app-db-1 | ("asset_group_collection_id","object_id","node_label","properties","created_at","updated_at")
app-db-1 | (SELECT * FROM unnest($1::bigint[], $2::text[], $3::text[], $4::jsonb[], $5::timestamp[], $5::timestamp[]));
The referenced JSON data (from *_domains.json): ... "auditingpolicy":"\u0000\u0001" ...
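As a workaround until the collector side is fixed, affected files could be pre-processed to drop NUL characters before upload. This is a hypothetical helper, not part of BloodHound or SharpHound:

```python
import json

def strip_nul(value):
    """Recursively remove NUL characters from every string in parsed JSON.
    Hypothetical pre-processing sketch; adjust to taste."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_nul(v) for k, v in value.items()}
    return value

raw = '{"auditingpolicy": "\\u0000\\u0001"}'
clean = json.dumps(strip_nul(json.loads(raw)))
print(clean)  # {"auditingpolicy": "\u0001"}
```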
Environment Information:
BloodHound:
- bloodhound-ce-bloodhound-1: specterops/bloodhound:latest (~v5.3.1)
- bloodhound-ce-app-db-1: postgres:13.2
- bloodhound-ce-graph-db-1: neo4j:4.4
Collector:
- SharpHound v2.3.0
OS:
- ArchLinux for BloodHound
- Windows 10 (22H2) for SharpHound
Database (if persistence related): Neo4j 4.4 / PostgreSQL 13.2
Docker (if using Docker):
- Docker 24.0.7
- Docker-Compose 2.23.3
Contributor Checklist:
- I have searched the issue tracker to ensure this bug hasn't been reported before or is not already being addressed.
- I have provided clear steps to reproduce the issue.
- I have included relevant environment information details.
- I have attached necessary supporting documents.
- I have checked that any JSON files I am attempting to upload to BloodHound are valid.
I've run into this same issue under similar circumstances using SharpHound with --collectallproperties.
Relevant app-db log:
UTC [99] DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ...010�]j˟�3�PP.�"],"msmqsigncertificates
Apologies, this was resolved in SpecterOps/SharpHoundCommon#141. It was handled on the SharpHound side, so please be sure to update SharpHound.