File-Sorter
Golang/Python/Bash script to sort big data files, such as data breaches, so that a line can be grepped/found much more quickly.
What does it do?
File-Sorter sorts a big data file, such as a data breach, into many smaller files so that grep can find a line much more quickly.
Okay but... How?
It takes the input file (which has to be located at inputbreach/breach.txt) and splits it evenly into smaller files, which are then organised neatly so that they can be found quickly. This way, the system doesn't have to grep a huge file to find an occurrence; it only needs to scan the right file, which is much smaller, resulting in faster lookups.
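The idea above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the project's actual code: the bucketing rule (first two characters of each line), the `data/<char>/<char>.txt` layout, and the names `bucket_path` and `split_into_buckets` are all hypothetical.

```python
import os


def bucket_path(data_dir, line, depth=2):
    """Map a line to a bucket file based on its first `depth` characters.

    Assumed layout (hypothetical): data/<first char>/<second char>.txt
    """
    key = line.strip().lower()[:depth]
    # Replace characters that are unsafe in file names with "_".
    parts = [c if c.isalnum() else "_" for c in key] or ["_"]
    return os.path.join(data_dir, *parts[:-1], parts[-1] + ".txt")


def split_into_buckets(input_path, data_dir):
    """Append every non-empty line of input_path to its bucket file."""
    with open(input_path, encoding="utf-8", errors="replace") as src:
        for line in src:
            if not line.strip():
                continue  # Skip blank lines.
            path = bucket_path(data_dir, line)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            # Re-opening the bucket per line is slow but keeps the sketch simple.
            with open(path, "a", encoding="utf-8") as out:
                out.write(line if line.endswith("\n") else line + "\n")
```

With this layout, a lookup for a line starting with "al" only has to read data/a/l.txt instead of the whole input file.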
How to install it?
Easy! In a one-liner:
git clone https://github.com/bastien8060/file-sorter
How to import data?
Golang
cd file-sorter
cd ./golang
./addbreach
Note: The data file has to be in
golang/inputbreach/breach.txt
Options:
- -D: delete the source file once the import has completed.
Python
cd file-sorter
cd ./python
./addbreach.py
Note: The data file has to be in
python/inputbreach/breach.txt
Options:
- -D: delete the source file once the import has completed.
Bash
cd file-sorter
cd ./bash
./sorter.sh
Note: The data file has to be in
bash/inputbreach/breach.txt
Options:
- -D: delete the source file once the import has completed.
How to query data?
After you have finished importing the data file, you can query it. For example:
./query.sh name@example.com
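For illustration, a lookup of this kind might work roughly as follows. This is a hedged Python sketch, not the contents of query.sh: the function name `query` and the assumed layout (a line is stored under `<data folder>/<first char>/<second char>.txt`) are hypothetical.

```python
import os


def query(data_dir, term):
    """Return lines starting with `term`, reading only the term's bucket file."""
    # Assumed (hypothetical) layout: <data_dir>/<first char>/<second char>.txt
    key = term.strip().lower()[:2]
    parts = [c if c.isalnum() else "_" for c in key] or ["_"]
    path = os.path.join(data_dir, *parts[:-1], parts[-1] + ".txt")
    if not os.path.exists(path):
        return []  # No bucket means no line with this prefix was imported.
    with open(path, encoding="utf-8", errors="replace") as bucket:
        # Case-insensitive prefix match against each line in the bucket.
        return [ln.rstrip("\n") for ln in bucket
                if ln.lower().startswith(term.lower())]
```

Calling `query("data", "name@example.com")` would then scan a single small file rather than the original input.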
Information:
In terms of speed, the Golang version is by far the fastest, taking only a couple of seconds to import 50 MB.
The Python 3 version comes second, taking nearly a minute.
I would avoid relying on the Bash version, as it is old and hasn't been maintained in over a year and a half. For reference, however, the Bash version takes a few minutes to import a 50 MB file.
The list of imported files is kept in the "imported.log" file. The script records each imported file in this log along with its SHA sum to prevent a file from being added twice. Each platform works independently and does not interfere with the others; therefore, duplicates are not checked across platforms (e.g. from Python to Golang).
All data is stored in the "data" folder.
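The duplicate check described above could look something like the following Python sketch. It rests on assumptions: the README only says "SHA sums", so SHA-256 is a guess, and the log line format (`<checksum>  <file name>`) and the function names are hypothetical.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inputs never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_imported(path, log_path="imported.log"):
    """Return True if the file's checksum is already listed in the log."""
    if not os.path.exists(log_path):
        return False
    checksum = file_sha256(path)
    with open(log_path, encoding="utf-8") as log:
        # Assumed log format: "<checksum>  <file name>" on each line.
        return any(line.split(maxsplit=1)[:1] == [checksum] for line in log)


def record_import(path, log_path="imported.log"):
    """Append the file's checksum and name to the log after a successful import."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{file_sha256(path)}  {os.path.basename(path)}\n")
```

Because the check is keyed on the content hash rather than the file name, renaming a file that was already imported would still be detected as a duplicate.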
Use only for educational and penetration testing purposes.