A Chinese-Uighur parallel translation corpus of 4.72 million sentence pairs, stored as TXT documents. The data has been cleaned, desensitized, and quality-checked, and it can serve as a base corpus for text data analysis and for applications such as machine translation. For more details, please refer to the link: https://www.nexdata.ai/datasets/nlu/1185?source=Github
TXT
Chinese-Uighur Parallel Corpus Data
4.72 million Chinese-Uighur parallel sentence pairs. The Chinese sentences contain 22 characters on average.
Chinese, Uighur
machine translation
90%
Commercial License
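
The average Chinese sentence length quoted above can be sanity-checked on a local copy of the data. The sketch below is only illustrative: the listing does not describe the internal layout of the TXT files, so it assumes one sentence pair per line with the Chinese sentence first and a tab separating it from the Uighur sentence. The function name `average_source_length` and the file name in the usage comment are hypothetical.

```python
from statistics import mean


def average_source_length(lines, sep="\t"):
    """Mean character count of the first field over well-formed pairs.

    Assumption (not confirmed by the dataset listing): each line holds
    "<Chinese sentence><sep><Uighur sentence>".
    """
    lengths = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split(sep, 1)
        if len(parts) != 2:
            continue  # skip lines that do not contain exactly one pair
        chinese, _uighur = parts
        lengths.append(len(chinese.strip()))
    return mean(lengths) if lengths else 0.0


# Hypothetical usage; replace the path with the actual corpus file:
# with open("chinese_uighur_corpus.txt", encoding="utf-8") as f:
#     print(average_source_length(f))
```

If the files use a different layout (for example, separate files per language aligned by line number), only the parsing step needs to change; the length calculation stays the same.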