LUMIA-Group/rasat

Why is table alignment necessary?

Closed this issue · 2 comments

align_tables.py seems to swap column names for certain databases.

For example, in the Spider dataset, the column names for the store_1 database are swapped. I understand that this is because of annotation issues in the original Spider dataset. However, I don't understand why simply swapping them solves the issue.

We swap those column names so that the original column names and the clean column names are in the same order. We use the clean version to generate the relations between the schema and the input, while using the original column names in the input itself.

Thus we make sure they are in the same order for us to generate the relations conveniently.
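To make the ordering requirement concrete, here is a minimal sketch (not the repository's code) of how one might detect positions where the clean and original table names disagree. It assumes the Spider `tables.json` fields `table_names` and `table_names_original`; the helper names, the toy schema, and the normalization rule (lowercase, underscores to spaces) are hypothetical, and real clean names may differ from the originals in ways this simple rule would flag incorrectly.

```python
def normalize(name):
    """Rough normalization (an assumption): lowercase and treat '_' as a space."""
    return name.lower().replace("_", " ")


def misaligned_positions(schema):
    """Return the indices where the clean and original table names disagree.

    Relations are looked up by index, so any index listed here would pair
    a clean name with the wrong original name.
    """
    pairs = zip(schema["table_names_original"], schema["table_names"])
    return [i for i, (orig, clean) in enumerate(pairs)
            if normalize(orig) != normalize(clean)]


# Toy schema (hypothetical, not the real store_1 entry): the first two
# clean names are listed in the opposite order from the original names.
toy_schema = {
    "table_names_original": ["order_items", "customers", "products"],
    "table_names": ["customers", "order items", "products"],
}

bad = misaligned_positions(toy_schema)  # indices 0 and 1 disagree
```

If a list like `bad` is non-empty, any relation computed from the clean names at those indices would be attached to the wrong original name in the input.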

Oh I see. For the Spider dataset, there is exactly one database (store_1) where the elements in table_names and table_names_original do not have a 1:1 correspondence for some reason. Thus, we have to manually swap the table names and their corresponding column names.

Likewise, there are similar issues with other databases in CoSQL and SParC that have to be adjusted manually.
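For anyone reading later, the manual swap described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual contents of align_tables.py: it assumes the Spider `tables.json` layout where `column_names` is a list of `[table_index, name]` pairs grouped by table, and that the columns within each table are already in matching order. The function name `swap_clean_tables` and the toy schema are hypothetical.

```python
import copy


def swap_clean_tables(schema, i, j):
    """Return a copy of `schema` with clean tables i and j exchanged.

    The clean column entries are relabeled and regrouped so they follow
    the new table order; the original names are left untouched.
    """
    fixed = copy.deepcopy(schema)
    names = fixed["table_names"]
    names[i], names[j] = names[j], names[i]

    # Point every clean column at its table's new index.
    remap = {i: j, j: i}
    relabeled = [[remap.get(t, t), col] for t, col in fixed["column_names"]]

    # sorted() is stable, so columns keep their order within each table,
    # and the [-1, "*"] entry stays first.
    fixed["column_names"] = sorted(relabeled, key=lambda c: c[0])
    return fixed


# Toy schema (hypothetical): the clean tables are listed in the opposite
# order from the original ones.
toy_schema = {
    "table_names_original": ["order_items", "customers"],
    "table_names": ["customers", "order items"],
    "column_names_original": [[-1, "*"], [0, "item_id"], [0, "qty"], [1, "cust_id"]],
    "column_names": [[-1, "*"], [0, "cust id"], [1, "item id"], [1, "qty"]],
}

fixed = swap_clean_tables(toy_schema, 0, 1)
```

After the swap, index k in the clean lists refers to the same table or column as index k in the original lists, which is the property the relation generation relies on.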