Europarl-Catalan

Aligned Catalan-German and Catalan-English Europarl v7 corpus. The Catalan sentences were produced by machine-translating the Spanish side of Europarl with Apertium, a rule-based machine translation (RBMT) system.

The original Spanish Europarl v7 corpus was corrected to fix spelling mistakes and other errors, which improves the quality of the Catalan translation. The file europarl.es-en.es.xz contains this corrected Spanish corpus, which is the one we used to produce the Catalan corpus.

The Catalan-German alignment was derived from the German-English (de-en) and Catalan-English (ca-en) corpora, using English as the pivot, with this alignment finder.

  • Catalan-English: 1 965 735 segments.
  • Catalan-German: 1 734 644 segments.
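The README does not describe how the alignment finder works. As a hedged illustration of pivoting through English, the following Python sketch pairs Catalan and German segments whose English sides are identical. It is not the tool that was actually used, and the function names index_by_english and pivot_align are hypothetical. One assumption worth stating: English segments that occur more than once are skipped, since they cannot be matched unambiguously.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def index_by_english(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Map each English segment to its counterpart in the other language.

    Hypothetical helper. English segments that occur more than once are
    mapped to None, because they cannot pair sentences unambiguously.
    """
    index: Dict[str, Optional[str]] = {}
    for other, en in pairs:
        key = en.strip()
        # A repeated English key becomes ambiguous and is marked with None.
        index[key] = None if key in index else other
    return index


def pivot_align(
    ca_en: Iterable[Tuple[str, str]],
    de_en: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (Catalan, German) pairs whose English sides are identical."""
    ca_index = index_by_english(ca_en)
    de_index = index_by_english(de_en)
    aligned: List[Tuple[str, str]] = []
    # Walk the Catalan index in input order and keep only English keys that
    # appear exactly once in both corpora.
    for en, ca in ca_index.items():
        de = de_index.get(en)
        if ca is not None and de is not None:
            aligned.append((ca, de))
    return aligned
```

With toy input, pivot_align([("Gràcies.", "Thank you.")], [("Danke.", "Thank you.")]) yields [("Gràcies.", "Danke.")]. Dropping repeated English segments trades recall for precision: frequent short segments such as "Thank you." are discarded rather than risk being mis-paired.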

Note: files with the .xz extension must be decompressed with xz, e.g. xz -d europarl.es-en.es.xz.
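Alternatively, the compressed files can be read without decompressing them first. The sketch below uses Python's standard lzma module and assumes, as in the standard Europarl v7 releases, that each language side is a plain-text UTF-8 file with one segment per line, so that line N of one file aligns with line N of the other. The README does not state the file layout, and read_parallel is a hypothetical helper.

```python
import lzma
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segment pairs from two line-aligned .xz files.

    Assumes one segment per line in each file (hypothetical layout).
    """
    with lzma.open(src_path, "rt", encoding="utf-8") as src, \
         lzma.open(tgt_path, "rt", encoding="utf-8") as tgt:
        # zip_longest (rather than zip) lets us detect files of unequal length
        # instead of silently truncating the longer one.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

A check like this is a cheap way to confirm that a pair of files still holds the expected number of aligned segments after download.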

License

CC BY 4.0