facebookresearch/TransCoder

How to get a parallel dataset from the already-shared raw tokenized data?

himanshu034 opened this issue · 1 comment

Hi, I downloaded the raw tokenized parallel data (in .tok format) from https://dl.fbaipublicfiles.com/transcoder/TransCoder_tokenized_test_set_functions.zip and looked through it. It seems the same methods are written in all three languages: C++, Python, and Java. I would like to know how the binarized .pth files such as "python_sa-cpp_sa-python_sa" and "cpp_sa-python_sa-cpp_sa" are generated.
Any help would be much appreciated.

This repo is now deprecated. Please refer to our new repository, https://github.com/facebookresearch/CodeGen.