Tokenization Principle

Why are some tokenizations better than others?

Read the paper here, watch a 3-minute video of the paper or take a look at the poster.

Cite as:

@inproceedings{tokenization_noiseless, 
    title={Tokenization and the Noiseless Channel},
    author={Zouhar, Vilém and Meister, Clara and Gastaldi, Juan Luis and Sachan, Mrinmaya and Cotterell, Ryan},
    booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},
    year={2023},
    url={https://aclanthology.org/2023.acl-long.284/},
}

Paper video presentation