Implement an interface for external datasets.
Opened this issue · 2 comments
rajarshd commented
Currently, it is hard to run experiments with new datasets because there is no interface for creating a multi_corpus object. This makes getting started with the code very difficult.
Originally reported by @manzilzaheer.
shehzaadzd commented
Use protocol buffers. Suggested by Manzil Zaheer (http://www.manzil.ml/), who asked me to comment here.
yeliu918 commented
How do you generate correct_paras.json? Do you process the whole Wikipedia dataset? Could you provide the data preprocessing code?