This is a living literature review (dynamically updated) on tokenization in language modeling.
This project is developed and maintained by Avijit Thawani, with help from a powerful open template by Eshaan Aggarwal.
It also wouldn't have been possible without the Semantic Scholar API, Notion API and GitHub Actions!