collab-uniba/Senti4SD

input file and output file row count doesn't match

nasifimtiazohi opened this issue · 4 comments

testinput.xlsx
testoutput.xlsx

I used the CSV versions of these files. The input file has 1826 rows, but the output file has 1829 rows, and I have no way to tell which row is which. I just followed the procedure explained in the documentation. Can you please tell me what the problem is?

Furthermore, I don't think it's good design that the output file doesn't include the associated comments or other information alongside the labels. The t0, t1 identifiers don't help me at all; I am not sure what they mean.

Could you address this problem quickly? I am trying to use this impressive tool in my research and I need to run it on over 1 million texts. I am short on time. If this problem persists, I cannot proceed.
@fedemaiorano

Hi @nasifimtiazohi,
I ran classificationTask.sh on a CSV version of testinput.xlsx (testinput.csv.zip contains the CSV file I used; I saved it with all the text cells quoted). With this CSV, the output file of classificationTask.sh has 1827 rows (the first row is the header).

The tool processes the input texts sequentially, so t0 is the first text of the input file, t1 is the second, and so on.
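Since the identifiers follow input order, the predictions can be joined back to the original texts by position. The following is a minimal sketch, not part of Senti4SD: the function name `attach_predictions` is hypothetical, and the output column names `Row` and `Predicted` are assumptions that should be checked against the header of the actual output file.

```python
def attach_predictions(texts, output_rows):
    """Pair each input text with the prediction at the same position.

    texts: input strings, in the order they were given to the classifier.
    output_rows: dicts with ASSUMED keys "Row" (e.g. "t0") and "Predicted".
    """
    # A length mismatch means the CSV was split into the wrong number of rows,
    # so a positional join would silently pair texts with the wrong labels.
    if len(texts) != len(output_rows):
        raise ValueError(
            f"{len(texts)} input texts but {len(output_rows)} predictions"
        )
    merged = []
    for row in output_rows:
        index = int(row["Row"][1:])  # "t0" -> 0, "t12" -> 12
        merged.append(
            {"Row": row["Row"], "Text": texts[index], "Predicted": row["Predicted"]}
        )
    return merged
```

The output rows could be loaded with `csv.DictReader` from the standard library, which yields one dict per row keyed by the header.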

Hi @fedemaiorano, thanks for the quick response.

Can you tell me how you quote-delimited the CSV files? Also, I need to batch-convert a lot of .xlsx files into such CSV files (comma-separated and quote-delimited, right?). Do you have any quick suggestions on how to do that?

I quoted the cells directly from my spreadsheet when saving the .xlsx in CSV format.
I have never needed to convert a lot of .xlsx files into CSV, so I don't know how to do that. Maybe you can write your own script.
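One possible shape for such a script is sketched below. It is an illustration, not something from the Senti4SD repository: the helper names `write_quoted_csv` and `convert_folder` are hypothetical, the third-party `openpyxl` package is assumed to be installed, and only the active sheet of each workbook is exported.

```python
import csv
from pathlib import Path


def write_quoted_csv(rows, csv_path):
    """Write rows to csv_path with every field wrapped in double quotes."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        for row in rows:
            # Empty spreadsheet cells arrive as None; write them as "".
            writer.writerow(["" if cell is None else cell for cell in row])


def convert_folder(xlsx_dir, csv_dir):
    """Convert the active sheet of every .xlsx in xlsx_dir to a quoted CSV."""
    # Imported here so the CSV helper works without openpyxl installed.
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    out_dir = Path(csv_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for xlsx_path in sorted(Path(xlsx_dir).glob("*.xlsx")):
        workbook = load_workbook(xlsx_path, read_only=True, data_only=True)
        sheet = workbook.active
        write_quoted_csv(
            sheet.iter_rows(values_only=True),
            out_dir / (xlsx_path.stem + ".csv"),
        )
        workbook.close()
```

Calling `convert_folder("xlsx_files", "csv_files")` (both directory names are placeholders) would write one `.csv` per `.xlsx`, keeping the file stem.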

The problem was solved by quote-delimiting the texts in the CSV file.
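The thread does not say which cells caused the extra rows. One plausible explanation, assumed here, is that some texts contained line breaks, which an unquoted file splits into additional rows. The sketch below uses made-up sample texts to show how quoting keeps such a text in a single record.

```python
import csv
import io

# Made-up sample data; the second text contains a comma and a line break.
texts = ["works fine", "crashes on save,\nthen hangs", "thanks!"]

# Unquoted: one text per line, so the embedded line break starts a new row.
unquoted = "\n".join(texts)
naive_rows = unquoted.split("\n")

# Quoted: csv.QUOTE_ALL wraps each text, so the break stays inside one record.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows([t] for t in texts)
parsed_rows = list(csv.reader(io.StringIO(buffer.getvalue())))

print(len(texts), len(naive_rows), len(parsed_rows))  # prints: 3 4 3
```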