Building a search engine

Tokenization - Phase 1
Term Weighting - Phase 2
Index - Phase 3
Retrieval* - Phase 4
Document Clustering* - Phase 5
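
To make the first four phases concrete, here is a minimal sketch in Python of how they might fit together. The outline does not specify any techniques, so everything below is an assumption chosen for illustration: a regex-based tokenizer, TF-IDF as the weighting scheme, a dictionary-based inverted index, and ranking by summed weights. The function names (`tokenize`, `build_index`, `search`) are hypothetical. Document clustering (Phase 5) is left out.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Phase 1 (assumed approach): lowercase, then split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Phases 2 and 3 (assumed approach): TF-IDF weights stored in an inverted index.

    `docs` maps doc_id -> text. Returns a mapping term -> {doc_id: weight}.
    """
    # Raw term frequencies per document.
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            # A term that occurs in every document gets idf = log(1) = 0.
            idf = math.log(n_docs / doc_freq[term])
            index[term][doc_id] = tf * idf
    return index


def search(index, query):
    """Phase 4 (assumed approach): rank documents by summed weights of matching terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to exercise the sketch.
    docs = {
        "d1": "the cat sat on the mat",
        "d2": "the dog chased the cat",
        "d3": "dogs and cats are pets",
    }
    index = build_index(docs)
    print(search(index, "dog"))
```

With this tokenizer, "dog" and "dogs" are distinct terms, so the query "dog" matches only `d2`. A real system would likely add stemming or normalization in Phase 1 to merge such variants.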