lilt/alignment-scripts

"help" More details needed in Usage Instructions

Closed this issue · 2 comments

Hi
Thank you for your work. I followed the usage instructions but can't get the scripts to work correctly, because I lack the corresponding dataset. I tried to understand your code, but the dataset files have some unfamiliar suffixes, which makes it hard for me to find the right data for the preprocessing.

Thank you for your kind assistance.

Hi @PlayDeep ,
Thanks for taking a look at this repository. Can you provide a bit more detail on what exactly you did, where you are stuck, and what you are trying to do next?
In general you should be able to take a look at the resulting files using less or vim.
All files are text files in UTF-8 encoding. Here's a short summary of the meaning of the file endings:

  • src: text in the source language
  • tgt: text in the target language
  • talp: alignment points
  • bpe: text split into SentencePiece/BPE units
  • lc: lowercase text
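
For a quick sanity check, the endings above can be decoded programmatically. This is only a rough sketch: it assumes the endings are chained with dots, and the file name example.lc.src.bpe as well as the helper names describe_suffixes and preview are made up for illustration, not taken from the repository.

```python
from pathlib import Path

# Meanings copied from the list of file endings above.
SUFFIX_MEANINGS = {
    "src": "text in the source language",
    "tgt": "text in the target language",
    "talp": "alignment points",
    "bpe": "text split into SentencePiece/BPE units",
    "lc": "lowercase text",
}


def describe_suffixes(filename):
    """Return (ending, meaning) pairs for every known dot-separated ending."""
    # Assumption: endings are chained with dots, e.g. "example.lc.src.bpe".
    parts = Path(filename).name.split(".")[1:]
    return [(p, SUFFIX_MEANINGS[p]) for p in parts if p in SUFFIX_MEANINGS]


def preview(path, n=3):
    """Return the first n lines of a UTF-8 text file (like a quick look with less)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for _, line in zip(range(n), f)]


if __name__ == "__main__":
    # Hypothetical file name, used only to show the decoding.
    for ending, meaning in describe_suffixes("example.lc.src.bpe"):
        print(f"{ending}: {meaning}")
```

Calling preview on one of the generated files returns its first few lines, which is usually enough to tell which processing step produced it.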

Let me know if this already helps or if you have more questions.

@thomasZen Thank you for your notes and your scripts. I closed this issue after fixing the problem, which was caused by running the scripts with sh.

Finally, thank you for your kind reply.