Sorts big files so you can grep them very quickly

File-Sorter

Golang/Python/Bash scripts that sort big data files, such as data breaches, so that you can grep/find a line much more quickly.

    What does it do?

    It is a set of Golang/Python/Bash scripts that sort big data files, such as data breaches, so that a line can be found with grep much more quickly.

    Okay but... How?

    It takes the input file (which has to be located at inputbreach/breach.txt) and splits it equally into files, which are then organised neatly so that they can be found quickly. This way the system doesn't have to grep a huge file to find an occurrence; it only needs to scan the right file, which is much smaller, so lookups are faster.
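The README does not spell out how the split files are laid out, so the following is only a minimal sketch of the general idea, assuming each line is filed under a bucket named after its first two characters. The names `bucket_path` and `import_file`, the `depth` parameter, and the `data/<c1>/<c2>.txt` layout are all hypothetical and are not taken from the project's scripts.

```python
from pathlib import Path


def bucket_path(data_dir: Path, line: str, depth: int = 2) -> Path:
    """Map a line to a bucket file based on its first `depth` characters.

    Assumed scheme for illustration only: data/<c1>/<c2>.txt
    """
    key = line.strip().lower()[:depth].ljust(depth, "_")
    # Replace anything that is not a letter or digit so the name is path-safe.
    parts = [c if c.isalnum() else "_" for c in key]
    return data_dir.joinpath(*parts[:-1]) / (parts[-1] + ".txt")


def import_file(source: Path, data_dir: Path, depth: int = 2) -> None:
    """Append every non-empty line of `source` to its bucket file."""
    with source.open(encoding="utf-8", errors="replace") as src:
        for line in src:
            if not line.strip():
                continue
            target = bucket_path(data_dir, line, depth)
            target.parent.mkdir(parents=True, exist_ok=True)
            # Re-opening the bucket for each line is slow but keeps the sketch short.
            with target.open("a", encoding="utf-8") as out:
                out.write(line if line.endswith("\n") else line + "\n")
```

With this assumed layout, a line starting with `name@example.com` would end up in `data/n/a.txt`, so a later lookup only has to read that one small file.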

    How to install it?

    Easy! In a one-liner:

    git clone https://github.com/bastien8060/file-sorter

    How to import data?

  • Golang

    cd file-sorter
    cd ./golang
    ./addbreach

    Note: The data file has to be in
    golang/inputbreach/breach.txt

    Options:

    • -D: delete the source file once the import has completed.
  • Python

    cd file-sorter
    cd ./python
    ./addbreach.py

    Note: The data file has to be in
    python/inputbreach/breach.txt

    Options:

    • -D: delete the source file once the import has completed.
  • Bash

    cd file-sorter
    cd ./bash
    ./sorter.sh

    Note: The data file has to be in
    bash/inputbreach/breach.txt

    Options:

    • -D: delete the source file once the import has completed.

    How to query data?

    After you have finished importing the data file, you can query the imported data by running, for example:

     ./query.sh name@example.com
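How `query.sh` locates the right file is not described here. As a rough illustration of why a lookup against split files is fast, the sketch below assumes the data was filed under `data/<c1>/<c2>.txt` by the first two characters of each line; `bucket_path`, `query`, and that layout are hypothetical and not the project's actual implementation.

```python
from pathlib import Path
from typing import List


def bucket_path(data_dir: Path, term: str, depth: int = 2) -> Path:
    """Return the bucket file a term would live in (assumed layout)."""
    key = term.strip().lower()[:depth].ljust(depth, "_")
    parts = [c if c.isalnum() else "_" for c in key]
    return data_dir.joinpath(*parts[:-1]) / (parts[-1] + ".txt")


def query(term: str, data_dir: Path = Path("data")) -> List[str]:
    """Scan only the one bucket that can contain lines starting with `term`."""
    target = bucket_path(data_dir, term)
    if not target.exists():
        return []
    needle = term.strip().lower()
    with target.open(encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if line.lower().startswith(needle)]
```

Calling `query("name@example.com")` would then read a single small file instead of the whole import. Note that this sketch only matches lines that begin with the search term.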

  • Information:

    • In terms of speed, the Golang version is by far the fastest, taking only a couple of seconds to import 50 MB.

      In second place comes the Python3 version, which takes nearly a minute.

      I would avoid relying on the Bash version, as it is old and hasn't been maintained in over a year and a half. For reference, the Bash version takes a few minutes to import a 50 MB file.

    • The list of imported files is kept in the "imported.log" file. The script records each imported file in this log along with its SHA sum, to prevent a file from being added twice. Each platform works differently and they do not interfere with one another; therefore, duplicates are not checked across platforms (e.g. from Python to Golang).

    • All data is in the "data" folder.

    • Use only for educational and penetration testing purposes.
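The duplicate check based on "imported.log" mentioned in the notes above could look roughly like the sketch below. The README only says "SHA sums", so the choice of SHA-256, the `<checksum>  <filename>` log format, and the helper names `sha256_of`, `already_imported`, and `record_import` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_imported(source: Path, log: Path) -> bool:
    """Return True if the checksum of `source` is already listed in the log."""
    if not log.exists():
        return False
    checksum = sha256_of(source)
    with log.open(encoding="utf-8") as fh:
        # Assumed log format: "<checksum>  <filename>" per line.
        return any(line.split()[:1] == [checksum] for line in fh)


def record_import(source: Path, log: Path) -> None:
    """Append the checksum and name of `source` to the log."""
    with log.open("a", encoding="utf-8") as fh:
        fh.write(f"{sha256_of(source)}  {source.name}\n")
```

An importer would call `already_imported` before processing a file and `record_import` once it has finished. Because each platform keeps its own log, a check like this would not catch a file imported by a different version.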