Apache License 2.0

vishal

The dataset will be hosted on 🤗 Datasets here

| Dataset | Processing | Type | Language | Owner | Citation |
| --- | --- | --- | --- | --- | --- |
| AI4Bharat IndicCorp | In Process | Original Scraped | en-in, hi, as, bn, gu, kn, ml, mr, or, pa, ta, te | HC | citation |
| CC-100 Corpus | In Process | Original, Romanized | as, bn, bn_rom, gu, hi, hi_rom, kn, ml, mr, ne, or, pa, sa, si, sd, ta, ta_rom, te, te_rom, ur, ur_rom | HC | citation |
| WMT NEWS Crawl | Available to pick up | Original Scraped | bn, gu, hi, kn, ml, mr, or, pa, ta, te | | citation |
| Charles University Hindi Monolingual Corpus | Available to pick up | Parallel Corpora | hi, en | | |
| IIT Bombay Hindi Monolingual Corpus | Available to pick up | Parallel Corpora, Monolingual | hi, en | | citation |