Welcome to the Turkmen Sentence Dataset project! This dataset is designed to support natural language processing work and linguistic analysis for the Turkmen language. It is divided into three levels, each drawn from a different kind of source, to serve a range of needs and applications.
- Level A (news and online content): This level contains sentences sourced from contemporary news outlets, online platforms, and articles. It is well suited to analyzing current language usage and trends.
  - Sources: salamnews.tm, turkmenportal.com
- Level B (literature): Sentences in this level are drawn from classic and modern Turkmen literature, with the aim of preserving and studying the rich literary tradition of the Turkmen language.
  - Sources: tmLang-NLP
- Level C (new sources): This level will include sentences from newly created or emerging Turkmen-language sources. Stay tuned for updates!
  - Sources: To be added as new sources are incorporated.
This project is licensed under the Collaborative Development License. This license allows for collaborative development and contributions but prohibits the use of this dataset in production environments.
We believe that the development of this project will benefit immensely from the collective efforts of the community. Whether you're a linguist, developer, or enthusiast, your contributions are invaluable. Here's how you can help:
- Add New Sentences: Contribute by adding new sentences to any of the levels.
- Improve Data Quality: Help in cleaning and verifying the existing data.
- Expand Level C: As this is a new and evolving area, contributions here are especially welcome.
Every bit of help counts, and together, we can make this dataset a comprehensive resource for everyone interested in the Turkmen language.
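For the data-quality task, a minimal cleanup sketch is shown below. It assumes, hypothetically, that each level is stored as a plain UTF-8 text file with one sentence per line; this README does not describe the actual file layout, and `clean_sentences` is an illustrative helper, not a script that exists in the repository.

```shell
# clean_sentences: hypothetical helper, assuming one sentence per line.
# 1) collapse every run of whitespace (including a stray CR) into one space
# 2) trim a leading and a trailing space
# 3) drop blank lines and exact duplicates, keeping the first occurrence
clean_sentences() {
  sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' "$1" | awk 'NF && !seen[$0]++'
}
```

For example, `clean_sentences level_a.txt > level_a.clean.txt` (file names hypothetical). The sketch only removes exact duplicates after whitespace normalization; near-duplicates, spelling errors, and mislabeled sentences still need manual review.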
- Fork this repository.
- Create a new branch (`git checkout -b feature-branch`).
- Stage and commit your changes (`git add .` then `git commit -m 'Add new sentences'`). Note that `git commit -a` alone does not include newly created files.
- Push the branch to your fork (`git push origin feature-branch`).
- Open a new Pull Request.
For any questions or suggestions, feel free to open an issue or contact us directly at 31mb41@gmail.com. You can also reach us on Telegram at t.me/gragamelix.
Let's work together to build a robust and diverse dataset for the Turkmen language! 🌟