/Colonoscopy

Assist in the parsing of the Collection# dumps

Primary LanguageRubyApache License 2.0Apache-2.0

     lXMMXk:.                                                   ,Kx,
       .ckNMMNx:.                                               :0MMK;
           'lOWMMKd,                                              '0MMx
               ,oKMMM0l'                                    .'.     xMMd
                  .;dKMMWOl'                      ..,.......:MMo      0M.\
                       .;xXMMWOc.                  <:C;0:|:0;n:O$:C:0;P:Y.\
                         ,xWMM0c.               ,:'.'.'..   :MM0.,:cccoxxoo:.
                      'dNMMKl.                            lWMX.       odooddd:
                  'dNMMKl.                            .cKMMk        kMMc .odok:
               ,dXMM0l.                     .:c;;;:lxKMMNd.       ;XMWl     ldlk.
            ,xNMM0c.                     .c0MMMMMMMWKkl'      .;dNMMk.       .klk;
         'dNMMKl.                     .c0MMWx, ...   .lKMMNxxxxdo:'            klO,
       ,KMMKl.                      :OMMWk;       .lKMMNxxxxdo:'                O:0.
       dMM0'                     .cOWMNx;      .'dKMMXd'                         .0ck
     ;MMO                    .c0MMNx,       'dNMMXo.                              llO'
      dMM,                 .cOMMWk,       'oXMMXo.                                .Kcd
      lMMl                 dWMMK.      .oXMMXo.                                    k:K
      .XMW;                 .xWMWo   lXMMXo'                                       :ok'
       .kMM0;                 .dWMWd  dWMNl                                        .0cl
         .kMMK;                  oWMWx. oWMWo                                      0:0
          .kMMK;                  oWMWx. oWMWo                                     oc0.
            .kMMK;                  oWMWx. oWMWo                                   .0cd
              .xWMX:                  lNMWx. lNMWd.                                 dlO'
                .xWMNc                  cNMMk. cNMMk.                                Ock;
                  .xWMNc                  cNMMk. :KMMk.                               odddc,..
                    .xWMNc                  cNMMk' ;KMMk.                              .lldxxx.
                      .xWMNc                  cNMM0, ;KMMk.                                ..'
                        .dWMNl                  :KMM0; ;0MMO'
                           oWMWo                  ,0MMK, ,0MM0,
                             oWMWo                  ,0MMx ,0MM0,
                               oWMWo                  xMMo   ,0MM0.
                                 lNMWd.                WMK     ,XMW'
                                   cNMWx.             .WMK      'MM0            [CodeName]:     Colonoscopy
                                     cNMWx.          .0MM:       XMX            [Produced]:     illMob - Q1/2019
                                       cNMWx'      .oNMW:       ;MMk            [Moto/PSA]:     Get yo shit checked!
                                          cXMMWKOO0NMMMk.       lWMX.
                                            .;ldxxxokWMWOo:;:lkNMWd.
                                                     ;d0NMMMWXk:
--------------------------------------------------------------------------------
What is Colonoscopy?
  An application that was produced to aid in the extraction of useful information from the publically released
  Collection dumps. These dumps were compiled in a way that makes any single query useless and missing vital
  information based on patterns or other techniques. Colonoscopy goes deep into the dumps and attempts to extract 
  potential and known Username:Password credential pairs.
  
  The application will not unzip any of the contents and will require the user to perform some actions (or script) that
  will generate a monolithic file from these .txt files. These files will contian different versions of these credential
  pairs which can be a daungting task to extract from common methods.

  The application will do its best to ensure these files are as true and uncontaminated as possible. Users may have to 
  ensure the filters have properly worked by examining the output for errors. These files will be contaioned in the 'evidence' 
  directory and a project name will be assertained from the filename.

How to use Colonoscopy?
  This is a OS independent Commandline application. This application should be used on the OS where the files are located. The
  Files can be attached VIA NAS or other technology, as long as Read/Writer permissions are granted to the executing user.

What is the theory being Colonoscopy?
  The application is designed to process large wordlist through sorting and further processing to identify useful patters 
  in the Collection# dumps. The theory is the user can input two different forms of input. a monolythic file which contains
  millions of lines of input. Or to process Gunzip files in a directory of your choosing. Either process used can be scripted.
   
  It should be metioned that the application in single filemode may perform better as there is no overhead or memory requirements.
  In single file mode, the application reads line by line reducing the overall time of computation. Additionally, the application
  will write the data to disk in blocks to reduce IO performance impacts.
   
  In Gunzip bulk mode, the application can be allowed to run in the background performing its actions over time. This can be useful
  for headless systems. This is intended to be used on the Collection dumps raw files. The benifit other than automation, 
  is accuracy to the data, and filenames. This allows the user to identify where the passwords were identified and contained within 
  which 'collection' examined.
   
  The application may take a performance hit when dealing with unique and sorting of larger datasets. Ruby is not ideal for such
  processes, however there are some great cuda applications which allows for the GPU to offload these processes for increaded 
  performance. Currently the application tries to reduce the amount of disk space consumed, this  may change in future releases.

Example of application options

  Usage: Colonoscopy [options]
      -t, --type TYPE                  Consume all files of type ["TXT","GZ"] in a directory
      -i, --input FILE/FOLDER          location of file(s) to process
      -e, --explode                    Explode contents of "tar.gz" files to disk
      -o, --output{=FOLDERNAME}        Location on disk to store contents generated
      -d, --debug                      Enables debugging information to be displayed
      -v, --verbose                    Adds additional information to STDOUT, performance hit increases

Example of single file execution?
  ex:  Colonoscopy.rb -i ./Somecollection.(txt|gz)

Example of bulk gz file execution?
  ex:  Colonoscopy.rb -i ./ -t gz

Example of bulk gz file execution to given folder?
  ex:  Colonoscopy.rb -i ./ -t gz -o ./mydump/Somecollection

Example of bulk txt file execution with debugging?
  ex:  Colonoscopy.rb -i ./ -t txt -d

Example of expected output?

      ./evidence/
            |
            | --./Somecollection/
            |       |
            |       | --./cleaned.txt
            |       | --./colon.txt
            |       | --./semi.txt
            |       | --./unknown.txt
            |       | --./blaw.etc
            |
            | --./SomeOthercollection/
            |       | --.blaw.etc

:[ATTENTION]:
    [!] The application uses the input name to set a project folder in the 'evidence' directory
          [-] If you use the same name for the input folder as an existing project output, it will be overwritten
    [!] Diagnostic output will be given if using the argument -d and to add verbosity, use -v
          [-] This will decrease the applicaitons performance dramtically, be forewarned.
:[ATTENTION]:
 
:[PSA]: Everyone should get their Colons checked by a professional, to many people seem butt hurt..., avoid amateurs!