lupantech/PromptPG

Many tables do not have the correct column headers

UtkarshGarg-UG opened this issue · 1 comment

Hi, thank you for the great dataset. Some tables have incorrect column headers: the columns are labeled "stem" and "leaf", which is not right. Some examples are:

30971
30972
30978

Thank you for your kind words and valuable feedback on the dataset!

In our approach, we employed regular expressions alongside heuristic rules to convert the raw text of tables into structured formats. Despite thorough reviews and continuous improvements to our scripts, some tables remain imperfectly parsed.
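To illustrate how this kind of parsing can mislabel headers, here is a minimal sketch of a regex-plus-heuristic table parser. It is not the actual PromptPG script: the function name `parse_table`, the pipe-delimited input layout, and the "first row is a header unless it contains a number" rule are all assumptions made for the example.

```python
import re

# Hypothetical pattern for cells that look numeric, e.g. "12", "$3.50", "1,200", "45%".
NUMERIC_CELL = re.compile(r"^[-+$]?\d[\d,]*(\.\d+)?%?$")


def parse_table(raw_text):
    """Convert raw table text into a dict with a header list and data rows.

    Assumes one table row per line and cells separated by '|'.
    """
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    # Regular expression step: split each line on '|' and drop surrounding spaces.
    rows = [re.split(r"\s*\|\s*", line) for line in lines]
    if not rows:
        return {"header": [], "rows": []}

    first = rows[0]
    # Heuristic step: a first row with any numeric cell is treated as data,
    # so placeholder column names are generated instead.
    if any(NUMERIC_CELL.match(cell) for cell in first):
        header = [f"col_{i}" for i in range(len(first))]
        body = rows
    else:
        header, body = first, rows[1:]
    return {"header": header, "rows": body}


parsed = parse_table("Stem | Leaf\n1 | 2, 5\n2 | 0")
print(parsed["header"])
print(parsed["rows"])
```

A rule like this only looks at the shape of the first row, so it cannot tell whether the labels it keeps are the intended ones. That is the kind of limitation that leaves some tables imperfectly parsed.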

We have observed that these less accurately parsed tables, especially those with incorrect headers, do not significantly affect the models' comprehension. Moreover, as they represent a minor fraction of the total dataset, we have chosen to retain them in the current annotations. This approach aims to maintain a consistent evaluation framework for both previously tested models and those anticipated in the near future.