/string-matching-hadoop

Extremely long string matching in Hadoop.

Primary LanguageJava

verylong-string-matching-in-hadoop

Extremely long string matching in Hadoop.

Given an extremely long string without '\n' or any delimiter containing only 4 characters, including a,b,c, and d in multiple 1 Gigabyte files.

The goal is to find the "Exact" match of the given pattern in multiple 1 Gigabyte files using Hadoop.

Work are separated in 3 tasks.

  1. Brute-force using single computer.
  2. Brute-force using multiple computer using Hadoop.
  3. Make it better from the previous two task using Hadoop.

Progress

  1. DONE, use code from #2 but only use 1 node.
  2. Successfully implemented. Support 1GB file already. ( Extremely Slow )
  3. Successfully implemented. Support 1GB file already. ( A lot Faster than #2 )

._.