danieljf24/dual_encoding

Issue about dataset format


Hello,
As we were trying to re-implement the model on other datasets, we got stuck at generating the feature.bin file. Your team has mentioned that we could use txt2bin.py to convert the feature files from txt into binary format, but I'm not sure what the feature files should look like in .txt form.
Could you provide a few example lines of a txt feature file? Some example files for reference would also be a great help.
Thank you for your help!

Please refer to the example here. Each line consists of an id followed by a feature vector.
P.S. We have already released our feature extraction code.
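The reply above states only that each line holds an id followed by a feature vector. The following is a minimal Python sketch of writing and reading such lines. It assumes that values are separated by single spaces and that ids contain no whitespace; neither detail is confirmed in this thread. The helper names (`format_feature_line`, `parse_feature_line`, `read_feature_txt`) and the example ids are hypothetical and are not part of txt2bin.py. The sketch does not reproduce txt2bin.py's binary output.

```python
import io
from typing import List, Optional, Sequence, TextIO, Tuple


def format_feature_line(feature_id: str, vector: Sequence[float]) -> str:
    """Return one text line: the id, then the vector values, all space-separated.

    Assumption: ids must not contain whitespace, since whitespace separates fields.
    """
    if not feature_id or any(ch.isspace() for ch in feature_id):
        raise ValueError("id must be non-empty and contain no whitespace")
    # repr() gives the shortest string that round-trips back to the same float
    return " ".join([feature_id] + [repr(float(v)) for v in vector])


def parse_feature_line(line: str) -> Tuple[str, List[float]]:
    """Split one text line back into its id and its list of float values."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected an id followed by values, got: {line!r}")
    return parts[0], [float(x) for x in parts[1:]]


def read_feature_txt(stream: TextIO) -> Tuple[List[str], List[List[float]]]:
    """Read all non-empty lines and check that every vector has the same dimension."""
    ids: List[str] = []
    vectors: List[List[float]] = []
    dim: Optional[int] = None
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        feature_id, vector = parse_feature_line(line)
        if dim is None:
            dim = len(vector)
        elif len(vector) != dim:
            raise ValueError(f"line {lineno}: expected {dim} values, got {len(vector)}")
        ids.append(feature_id)
        vectors.append(vector)
    return ids, vectors


# Hypothetical ids and 4-dimensional vectors, purely for illustration.
features = {
    "video1_0": [0.1, 0.0, 2.5, 0.3],
    "video1_1": [0.2, 1.0, 0.0, 0.7],
}
buf = io.StringIO()
for fid, vec in features.items():
    buf.write(format_feature_line(fid, vec) + "\n")
print(buf.getvalue(), end="")
# video1_0 0.1 0.0 2.5 0.3
# video1_1 0.2 1.0 0.0 0.7

buf.seek(0)
ids, vectors = read_feature_txt(buf)
```

Checking that every row has the same dimension before conversion catches truncated or malformed lines early. This matters because a binary feature file is typically read back as one fixed-width row per id.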

Thank you for the example! It will be very helpful for us.