ReconInfoSec/velociraptor-to-timesketch

Feature Request for Convenience and a Bug


I just got done installing your lovely solution for timelining and I love this workflow! However, I see some points of improvement:

  • Installer
    • Make uploading the complete .plaso file back to the S3 bucket optional via a switch in deploy.sh that simply does not activate the script watch-plaso-to-s3.sh (see the deploy.sh sketch after the code block below)
    • Docker continuously throws an error that it should not be run as root. This can be checked using the command sudo docker logs --tail 50 --follow --timestamps timesketch_timesketch-worker_1 | less
    • Create an option for deleting all raw data after Plaso processing, for cases where you are strapped for storage.
    • Create an option for the Timesketch instance not being run in AWS. I have it running on an on-prem hypervisor with Velo running in AWS. Further down the road, one might also consider shipping the data via SFTP instead of S3 to allow for a fully on-prem solution
  • watch-s3-to-timesketch.py
    • If the same hunt is executed twice for some reason, the object key in the S3 bucket will remain the same. It might be interesting to add a unique ID to each item in the S3 bucket to identify them. I have no good solution for this as of yet; maybe AWS has something already built in (see the versioning sketch below). These IDs for every item would then be added to a list/database on the Timesketch instance to be checked prior to downloading
    • Currently there is a while True loop that sends requests at a very high frequency, which can quickly inflate your AWS bill: after running my pipeline for roughly 30 hours I had a 30€ bill despite almost no data being transferred. Having the script poll every 10 seconds or so would drastically decrease the number of requests without slowing the pipeline significantly (see the polling sketch below).
    • The AWS credentials need to be put into the source code. I think following AWS best practices with a dedicated file at ~/.aws/credentials would be better; see the AWS documentation on credential files for reference (and the sketch below)
  • watch-to-timesketch.sh
    • The name of the service being installed (data-to-timesketch) is different from the name of the script, which is not the case for the Python downloader or the other bash script. It confused me for a moment; I would align the two, e.g. both being named watch-data-to-timesketch
    • There is a bug that causes all data from the unzipped KAPE .zip to be deleted instead of only the unimportant bits. This is due to the file path, which at least in my installation is [...]$SYSTEM/fs/fs/clients/[...] instead of [...]$SYSTEM/fs/clients/[...]: the mv glob below then matches nothing, so the uploads are never moved out of fs/, and the subsequent rm -r deletes them along with the scaffolding. Check the following code for reference (a possible fix is sketched right after it)
# Remove from subdir
mv $PARENT_DATA_DIR/$SYSTEM/fs/clients/*/collections/*/uploads/* $PARENT_DATA_DIR/$SYSTEM/
# Delete unnecessary collection data
rm -r $PARENT_DATA_DIR/$SYSTEM/fs $PARENT_DATA_DIR/$SYSTEM/UploadFlow.json $PARENT_DATA_DIR/$SYSTEM/UploadFlow 
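
A possible fix, as a hedged sketch of my own rather than the repo's code: move the uploads from wherever they actually sit (fs/clients or fs/fs/clients) and only delete the scaffolding afterwards.

# Move the uploads regardless of whether the layout is fs/clients or fs/fs/clients
for UPLOADS in $PARENT_DATA_DIR/$SYSTEM/fs/clients/*/collections/*/uploads \
               $PARENT_DATA_DIR/$SYSTEM/fs/fs/clients/*/collections/*/uploads; do
    [ -d "$UPLOADS" ] && mv "$UPLOADS"/* "$PARENT_DATA_DIR/$SYSTEM/"
done
# Delete the collection scaffolding only after the uploads have been moved out
rm -r $PARENT_DATA_DIR/$SYSTEM/fs $PARENT_DATA_DIR/$SYSTEM/UploadFlow.json $PARENT_DATA_DIR/$SYSTEM/UploadFlow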
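
On the installer side, the optional .plaso re-upload could be a simple flag in deploy.sh. A minimal sketch, assuming deploy.sh enables the watcher as a systemd service named watch-plaso-to-s3 (both the flag name and the service name are my assumptions):

# Hypothetical --skip-plaso-upload flag for deploy.sh
SKIP_PLASO_UPLOAD=false
for ARG in "$@"; do
    case "$ARG" in
        --skip-plaso-upload) SKIP_PLASO_UPLOAD=true ;;
    esac
done

if [ "$SKIP_PLASO_UPLOAD" = false ]; then
    # Only activate the re-upload watcher when it is actually wanted
    systemctl enable --now watch-plaso-to-s3.service
fi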
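
Regarding unique IDs for repeated hunts: AWS does have something built in here, namely S3 object versioning, which gives every upload of the same key its own VersionId. A sketch, with $BUCKET as a placeholder for the pipeline's bucket:

# Enable versioning so re-running a hunt never silently overwrites an object
aws s3api put-bucket-versioning \
    --bucket "$BUCKET" \
    --versioning-configuration Status=Enabled

# Each version of a key now carries a unique VersionId, which the downloader
# could record in its processed list before fetching
aws s3api list-object-versions --bucket "$BUCKET"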
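
For the request frequency, the fix boils down to sleeping between polls; in the Python downloader that would be a time.sleep(10) inside the while True loop. The same idea as a shell sketch:

# Throttled polling: one listing every 10 seconds instead of a hot loop
while true; do
    aws s3 ls "s3://$BUCKET/"
    sleep 10
done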
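
And for the credentials: both boto3 and the AWS CLI pick up ~/.aws/credentials automatically, so nothing has to live in the source code. The file can be written with the stock tooling (the key values here are placeholders):

# Write ~/.aws/credentials instead of hardcoding keys in the scripts
aws configure set aws_access_key_id     AKIAEXAMPLEKEYID
aws configure set aws_secret_access_key exampleSecretAccessKey

# The resulting ~/.aws/credentials looks like:
# [default]
# aws_access_key_id = AKIAEXAMPLEKEYID
# aws_secret_access_key = exampleSecretAccessKey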

Ideas

I think a central config file might solve some of the issues I faced, but I am not sure whether this is the best way to go about it (a rough sketch is below). I will try to create a pull request that offers a solution for the topics I mentioned. Furthermore, creating an SFTP-based solution in parallel to the AWS-based one would allow hosting your setup fully locally. I will see if I get around to that as well.
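
As a sketch of what I have in mind, a single file sourced by deploy.sh and the watcher scripts could cover most of the points above (all names and values are hypothetical):

# config.env - hypothetical central configuration, sourced by the scripts
S3_BUCKET="my-velo-bucket"      # placeholder bucket name
POLL_INTERVAL=10                # seconds between S3 polls
UPLOAD_PLASO_TO_S3=true         # false skips watch-plaso-to-s3.sh
DELETE_RAW_AFTER_PLASO=false    # true frees storage after Plaso processing
TRANSPORT="s3"                  # "s3" or "sftp" for a fully on-prem setup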