A diverse tweet corpus of English varieties

Primary language: Python

diverse_english_corpus

This repository contains a diverse tweet corpus of English varieties. To maintain anonymity during the review process, some files have been removed from the repository.

The corpus is intended to be a resource for researchers and developers working on natural language processing tasks in English, particularly those interested in addressing the challenges posed by non-standard and non-Western English varieties. By providing a diverse set of tweet samples, we hope to encourage the development of language models that are more robust and inclusive.

Please note that the corpus is not intended to be a comprehensive or representative sample of all English varieties. The data should be used with appropriate caution and with sensitivity to cultural and linguistic differences.

We welcome contributions and feedback from the community to help improve and expand this resource.