This repo contains multiple implementation of suffix array construction algorithms, including internal and external memory methods.
- Add naive construction algorithm
- Load text files
- Add binary search exact match
- Implement Skew 2003
- Implement SA-IS 2009
- Implement eSAIS
- Suffix Arrays: A New Method for On-Line String Searches (1990)
- Simple Linear Work Suffix Array Construction (2003) - DC3/skew
- Fast Lightweight Suffix Array Construction and Checking (2003)
- Replacing Suffix Trees with Enhanced Suffix Arrays (2004)
- Linear Work Suffix Array Construction (2006) - DC3/skew
- A Taxonomy of Suffix Array Construction Algorithms (2007)
- Better External Memory Suffix Array Construction (2008)
- Scalable Parallel Suffix Array Construction (2008)
- Linear Suffix Array Construction by Almost Pure Induced-Sorting (2009) - SA-IS
- Optimal In-Place Suffix Sorting (2016)
- Deduplicating Training Data Makes Language Models Better (2022)
- Based on SA-IS, modified to use external memory but original text must fit in memory.
- QSufSort (based on 1999 Larsson-Sadakane algorithm)
- Yuta Mori's DivSufSort "fastest known suffix algo in main memory" as of 2017
- Ilya Grebnov's even faster implementation
- Look at google research's SA-IS extern memory implementation
- Simple Linear Work Suffix Array Construction (2003)
- Linear Work Suffix Array Construction (2006)
- A Taxonomy of Suffix Array Construction Algorithms (2007)
- Mark Ormesher Notes
- Sebastian Wild Video