ministryofjustice/analytical-platform

🐞🧑‍🚀 Ingestion Scan: File names including spaces not handled correctly by S3 Event Notifications


Describe the bug.

This bug was raised by @pricemg (thanks!).

Given two files, file1.txt and file 2.txt:

  • file1.txt - is picked up and scanned as expected.
  • file 2.txt - is not picked up and is not scanned.

From Matthew:

Found out yesterday that S3 event notifications don't pass through file names with spaces in them as-is and instead convert the spaces to +'s. I then found this was confusing my lambdas, as they'd be trying to copy a file called e.g. my+file.txt based on the event notification being fed in, but the file in S3 is called my file.txt, and so things were getting stuck. Don't know if it's something you want/need to handle in your AP ingestion code.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-content-structure.html
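For illustration, a minimal sketch of how a Lambda could recover the real object key from a notification record. The linked AWS page states that the object key in the notification is URL-encoded, which is why the space arrives as a +. The function name `object_key_from_record` and the trimmed-down record are hypothetical and are not taken from the ingestion scan code.

```python
from urllib.parse import unquote_plus


def object_key_from_record(record: dict) -> str:
    """Return the real S3 object key from one S3 event notification record.

    Hypothetical helper for illustration; not part of handler.py.
    """
    # The key in the notification is URL-encoded, so "my file.txt"
    # arrives as "my+file.txt". unquote_plus turns "+" back into a space
    # and decodes any %XX escapes.
    encoded_key = record["s3"]["object"]["key"]
    return unquote_plus(encoded_key)


# Trimmed-down example record; real notifications carry many more fields.
record = {
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "my+file.txt"},
    }
}

print(object_key_from_record(record))  # prints "my file.txt"
```

The decoded key is the one to use for any follow-up S3 call (copy, get, tag), since that is the name the object actually has in the bucket.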

Initial attempts to solve this were reverted (and the release we tested with was deleted) because files with spaces in their names, e.g. file 2.txt, were scanned but were marked as infected when they were not.

To Reproduce

While briefly looking at this, I (Gary) realised that there is currently no documented way of testing the handler.py function in analytical-platform-ingestion-scan.

If I recall correctly, when this was initially created we tested by creating release candidates and running the full process. Part of solving this bug should be to create a self-contained method in the analytical-platform-ingestion-scan repository for testing the handler.py function, and to document it in the README.
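As a starting point for discussion, a rough sketch of what a self-contained test could look like, using only the standard library. Everything here is an assumption: `make_s3_event` builds a simplified event whose key encoding only approximates what S3 sends, and `keys_from_event` is a hypothetical stand-in for whatever handler.py actually does with the event, since its real interface is not described in this issue.

```python
import unittest
from typing import List
from urllib.parse import quote_plus, unquote_plus


def make_s3_event(bucket: str, key: str) -> dict:
    """Build a minimal fake S3 event (hypothetical helper).

    quote_plus is used to approximate the URL-encoding S3 applies to
    object keys in notifications (a space becomes "+").
    """
    return {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": quote_plus(key)},
                },
            }
        ]
    }


def keys_from_event(event: dict) -> List[str]:
    """Hypothetical stand-in for the key handling inside handler.py."""
    return [unquote_plus(r["s3"]["object"]["key"]) for r in event["Records"]]


class KeyHandlingTest(unittest.TestCase):
    def test_key_without_spaces(self):
        event = make_s3_event("example-bucket", "file1.txt")
        self.assertEqual(keys_from_event(event), ["file1.txt"])

    def test_key_with_spaces(self):
        event = make_s3_event("example-bucket", "file 2.txt")
        # The raw key in the event is encoded, as in a real notification.
        self.assertEqual(event["Records"][0]["s3"]["object"]["key"], "file+2.txt")
        # The handler should work with the decoded key.
        self.assertEqual(keys_from_event(event), ["file 2.txt"])
```

Saved as a test module, this could be run with `python -m unittest`. In the real repository the stand-in would be replaced by a call into handler.py, with the S3 and scanning calls stubbed out.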

Expected Behaviour

Files with spaces are handled correctly.

Additional context

No response

We ought to discuss whether this service is something we want to take advantage of and, if so, create a ticket and work on it.