๐๐งโ๐ Ingestion Scan: File names including spaces not handled correctly by S3 Event Notifications
Opened this issue ยท 1 comments
Describe the bug.
This bug was raised by @pricemg (thanks!).
Given two files.
file1.txt
and file 2.txt
file1.txt
- the scan etc performs as expected.file 2.txt
- is not picked up and scanned.
From Matthew:
Found out yesterday that s3 event notifications don't handle file names with spaces in and instead converts them to + 's which then I found was confusing my lambdas as they'd be trying to copy a file called e.g. my+file.txt based on the event notification being fed in, but the file in s3 is called my file.txt and so things we're getting stuck. Don't know if something you want/need to handle in your ap ingestion code
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-content-structure.html
Initial attempts to solve this were reverted. We reverted them (and deleted the release we tested with) because files with spaces in them e.g. file 2.txt
was scanned but was marked as infected when it was not.
To Reproduce
As part of briefly looking at this I (Gary) realised that there is currently no documented manner of testing the handler.py
function in analytical-platform-integestion-scan.
If I recall when this was initially created we tested by creating release candidates and testing the full process, part of solving this bug should be to create a self-contained method in the analytical-platform-integestion-scan repository to test the handler.py
function and document this in the README.
Expected Behaviour
Files with spaces are handled correctly.
Additional context
No response