salsama/Arabic-Information-Extraction-Corpus

A linguistically analyzed Arabic corpus that includes dependency relations. The input is text collected from the web in five domains: sport, religion, weather, news, and biomedicine. The output is a file in CoNLL universal lattices (CoNLL-UL) format. A review of prior work showed that much of the research presents corpora covering various linguistic features and elements without including dependency relations. The corpus is built with an index of all sentences and their linguistic metadata to enable quick mining and research across the corpus. The dependency relations in this corpus have seventeen characteristics and eight word categories.
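As a rough illustration of how a sentence index over such a file might be built, here is a minimal Python sketch. It assumes the files are tab-separated, with blank lines between sentences and `#` comment lines, and it assumes a nine-column layout (`from`, `to`, `form`, `lemma`, `upostag`, `xpostag`, `feats`, `misc`, `anchors`); the actual column order in this repository may differ and should be checked against the data. The function names `read_sentences` and `build_index` and the sample rows are hypothetical, not taken from the corpus.

```python
from collections import defaultdict

# Assumed column order (not confirmed by the repository); adjust to the real files.
COLUMNS = ["from", "to", "form", "lemma", "upostag",
           "xpostag", "feats", "misc", "anchors"]


def read_sentences(text):
    """Split blank-line-delimited, tab-separated text into lists of row dicts."""
    sentences, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if rows:
                sentences.append(rows)
                rows = []
            continue
        if line.startswith("#"):
            # Skip comment / metadata lines.
            continue
        rows.append(dict(zip(COLUMNS, line.split("\t"))))
    if rows:
        sentences.append(rows)
    return sentences


def build_index(sentences, key="lemma"):
    """Map each value of `key` to the set of sentence positions containing it."""
    index = defaultdict(set)
    for position, rows in enumerate(sentences):
        for row in rows:
            value = row.get(key)
            if value and value != "_":
                index[value].add(position)
    return index


# Made-up sample rows for illustration only; not taken from the corpus.
SAMPLE = "\n".join([
    "# sent_id = 1",
    "\t".join(["0", "1", "كتاب", "كتاب", "NOUN", "_", "_", "_", "_"]),
    "",
    "# sent_id = 2",
    "\t".join(["0", "1", "قرأ", "قرأ", "VERB", "_", "_", "_", "_"]),
    "\t".join(["1", "2", "كتاب", "كتاب", "NOUN", "_", "_", "_", "_"]),
])

sentences = read_sentences(SAMPLE)
lemma_index = build_index(sentences)
pos_index = build_index(sentences, key="upostag")
```

With an index like this, looking up a lemma or a part-of-speech tag returns the positions of the matching sentences without rescanning the whole file, which is the kind of quick mining the description refers to.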
