- XML files with parallel data
- "source" and "target" tags inside the XML files, each containing text. If your files use different tag names, change them in the code.
The code assumes your XML files contain "source" and "target" tags, holding the source-side sentences and their target-side translations respectively. It extracts these into a dataframe of parallel sentences, which can then be used to build a parallel corpus.
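
To illustrate the kind of extraction described above, here is a minimal sketch using only the Python standard library. It is not the code in this repository: the function name `extract_pairs`, the sample XML layout, and the pairing rule (a "source" and a "target" that share the same parent element) are all assumptions made for illustration. It also assumes the tags carry no XML namespace.

```python
import xml.etree.ElementTree as ET


def extract_pairs(xml_text, source_tag="source", target_tag="target"):
    """Return a list of {"source": ..., "target": ...} dicts.

    Hypothetical helper: pairs a source element with a target element
    only when both are direct children of the same parent.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for parent in root.iter():
        src = parent.find(source_tag)
        tgt = parent.find(target_tag)
        if src is None or tgt is None:
            continue  # skip units that lack one side of the pair
        pairs.append({
            # itertext() also collects text nested inside inline child tags
            "source": "".join(src.itertext()).strip(),
            "target": "".join(tgt.itertext()).strip(),
        })
    return pairs


# Made-up sample data, only to show the expected input shape.
SAMPLE = """<doc>
  <unit><source>Hello</source><target>Bonjour</target></unit>
  <unit><source>Thank you</source><target>Merci</target></unit>
  <unit><source>Untranslated</source></unit>
</doc>"""

pairs = extract_pairs(SAMPLE)
for pair in pairs:
    print(pair["source"], "\t", pair["target"])
```

If pandas is installed, `pandas.DataFrame(pairs)` turns the resulting list into a two-column dataframe. To handle files with different tag names, pass them as `source_tag` and `target_tag`.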