huge-file-processor

I need to read a huge file (say, 10 GB), find which strings are duplicated (each line holds one string, which could be a word, a sequence of characters, a sentence, or anything else), and show the 5 most frequent strings along with how many times each one is repeated.
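A minimal sketch of the counting step, assuming Node.js and a UTF-8 file with one string per line: stream the file with the built-in `readline` module and keep only a `Map` from each distinct string to its count, so memory grows with the number of distinct strings rather than with the file size. The function names `countLines` and `topK` are hypothetical and not part of this repository.

```javascript
const fs = require('node:fs');
const readline = require('node:readline');

// Count how often each line occurs. The file is streamed line by line, so
// memory grows with the number of *distinct* lines, not with the file size.
async function countLines(file) {
  const counts = new Map();
  const rl = readline.createInterface({
    input: fs.createReadStream(file, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat "\r\n" as a single line break
  });
  for await (const line of rl) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return counts;
}

// Return the k most frequent repeated lines (count > 1), highest first.
// Keeps a small array sorted by count instead of sorting every entry.
function topK(counts, k) {
  if (k <= 0) return [];
  const top = [];
  for (const [value, count] of counts) {
    if (count < 2) continue; // only strings that actually repeat
    if (top.length === k && count <= top[k - 1].count) continue;
    let i = top.length;
    while (i > 0 && top[i - 1].count < count) i--; // find insertion point
    top.splice(i, 0, { value, count });
    if (top.length > k) top.pop(); // drop the smallest to stay at k entries
  }
  return top;
}
```

For example, `countLines('huge.txt').then((counts) => console.log(topK(counts, 5)));`. This works as long as the distinct strings fit in memory; if most lines are unique, the `Map` still grows with the file.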

The first approach I tried does not work for huge files: it breaks with files of 300 MB and up, because it stores everything in memory.
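One common way to bound memory, sketched here as an assumption rather than as what this project does, is hash partitioning. A first pass writes each line to one of N temporary bucket files chosen by a hash of the line, so every copy of a given string lands in the same bucket; a second pass counts each bucket on its own with a `Map` and keeps a running top k. The bucket count, flush size, FNV-1a-style hash and the names `partition` and `topKPartitioned` are hypothetical choices for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const readline = require('node:readline');

// Hypothetical tuning values, not taken from this project.
const BUCKETS = 64;
const FLUSH_CHARS = 1 << 18; // flush a bucket's buffer after ~256K characters

// 32-bit FNV-1a over UTF-16 code units. Identical lines always get the same
// hash, so every copy of a string ends up in the same bucket file.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // unsigned
}

function readLines(file) {
  return readline.createInterface({
    input: fs.createReadStream(file, { encoding: 'utf8' }),
    crlfDelay: Infinity,
  });
}

// Pass 1: split the input into `buckets` temporary files by line hash.
async function partition(file, dir, buckets) {
  const paths = [];
  const pending = [];
  const pendingChars = [];
  for (let b = 0; b < buckets; b++) {
    paths.push(path.join(dir, `bucket-${b}.txt`));
    fs.writeFileSync(paths[b], ''); // create an empty bucket file
    pending.push([]);
    pendingChars.push(0);
  }
  const flush = (b) => {
    if (pending[b].length === 0) return;
    fs.appendFileSync(paths[b], pending[b].join(''));
    pending[b] = [];
    pendingChars[b] = 0;
  };
  for await (const line of readLines(file)) {
    const b = fnv1a(line) % buckets;
    pending[b].push(line + '\n');
    pendingChars[b] += line.length + 1;
    if (pendingChars[b] >= FLUSH_CHARS) flush(b);
  }
  for (let b = 0; b < buckets; b++) flush(b);
  return paths;
}

// Pass 2: count each bucket on its own and keep a running top k.
async function topKPartitioned(file, k, buckets = BUCKETS) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hfp-'));
  try {
    let top = [];
    for (const bucketFile of await partition(file, dir, buckets)) {
      const counts = new Map(); // only this bucket's distinct lines
      for await (const line of readLines(bucketFile)) {
        counts.set(line, (counts.get(line) || 0) + 1);
      }
      for (const [value, count] of counts) {
        if (count > 1) top.push({ value, count }); // repeated strings only
      }
      top = top.sort((a, b) => b.count - a.count).slice(0, k);
    }
    return top;
  } finally {
    fs.rmSync(dir, { recursive: true, force: true }); // delete temp buckets
  }
}
```

For example, `topKPartitioned('huge.txt', 5).then(console.log);`. The trade-off is two passes over the data and roughly the input's size in temporary disk space; in exchange, peak memory is about one bucket's distinct strings plus the write buffers.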

The second approach, using SQLite, works, but it is still too slow.

I'm still trying to figure out a faster way to solve this problem.

Installation

Inside the server folder, run:

$ npm install

Running

Inside the server folder, run:

$ node server

Contributing

Fork it!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request

License

MIT

galelis/huge-file-processor