
Extracting parallel sentences from the XML files at CFILT

Primary language: Jupyter Notebook

XML2Parallel

Requirements

  • XML files with parallel data
  • "source" and "target" tags in the XML files, each containing text (if your files use different tag names, replace them in the code)
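Before running the notebook, it can help to check that a file meets these requirements. The sketch below uses only Python's standard library. It counts the two tags and rejects empty ones. The helper name check_tags and the sample markup are hypothetical illustrations and are not part of this repository.

```python
import xml.etree.ElementTree as ET


def check_tags(xml_text, source_tag="source", target_tag="target"):
    """Count <source> and <target> elements and make sure none is empty.

    Hypothetical helper, not part of XML2Parallel. Change source_tag /
    target_tag if your files use other tag names.
    """
    root = ET.fromstring(xml_text)
    counts = []
    for tag in (source_tag, target_tag):
        elements = list(root.iter(tag))
        # itertext() gathers text from the element and any nested children
        empty = sum(1 for e in elements if not "".join(e.itertext()).strip())
        if empty:
            raise ValueError(f"{empty} <{tag}> element(s) contain no text")
        counts.append(len(elements))
    return tuple(counts)


sample = """<corpus>
  <seg><source>Good morning.</source><target>Bonjour.</target></seg>
  <seg><source>Thank you.</source><target>Merci.</target></seg>
</corpus>"""
print(check_tags(sample))  # (2, 2)
```

Unequal counts are a hint that the source and target sides may not line up one-to-one.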

Description

The code assumes XML files in which "source" tags hold the source-side sentences and "target" tags hold their target-side translations. It aims to collect these into a dataframe of parallel sentence pairs, which can then be used to generate a parallel corpus.
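The extraction step can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the notebook's actual code. It assumes the i-th "source" element in a file pairs with the i-th "target" element in document order. The function names extract_pairs and build_corpus and the .src/.tgt output layout (one sentence per line, line i of each file aligned) are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _clean(element):
    # Join all nested text and collapse whitespace/newlines to single
    # spaces, so each sentence fits on one line of the output files.
    return " ".join("".join(element.itertext()).split())


def extract_pairs(xml_path, source_tag="source", target_tag="target"):
    """Return (source, target) pairs from one XML file.

    Assumption: the i-th <source> element is aligned with the i-th
    <target> element in document order.
    """
    root = ET.parse(xml_path).getroot()
    sources = [_clean(e) for e in root.iter(source_tag)]
    targets = [_clean(e) for e in root.iter(target_tag)]
    if len(sources) != len(targets):
        raise ValueError(
            f"{xml_path}: {len(sources)} <{source_tag}> vs "
            f"{len(targets)} <{target_tag}> elements"
        )
    return list(zip(sources, targets))


def build_corpus(xml_dir, out_prefix):
    """Gather pairs from every *.xml file in xml_dir (hypothetical layout).

    Writes <out_prefix>.src and <out_prefix>.tgt, one sentence per line,
    and returns the list of pairs that were written.
    """
    pairs = []
    for path in sorted(Path(xml_dir).glob("*.xml")):
        pairs.extend(extract_pairs(path))
    # Drop pairs where either side is empty; they add no parallel data.
    pairs = [(s, t) for s, t in pairs if s and t]
    with open(f"{out_prefix}.src", "w", encoding="utf-8") as src, \
         open(f"{out_prefix}.tgt", "w", encoding="utf-8") as tgt:
        for s, t in pairs:
            src.write(s + "\n")
            tgt.write(t + "\n")
    return pairs
```

If a dataframe is preferred, the returned list converts directly with pandas: pd.DataFrame(pairs, columns=["source", "target"]). Pairing by position is the simplest assumption. If your files wrap each pair in a common parent element, iterating over those parents would be more robust against missing tags.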