bcicen/wikitables

Tables with repeated rows are parsed incorrectly

Closed this issue · 2 comments

Example:

https://en.wikipedia.org/wiki/Greek_letters_used_in_mathematics,_science,_and_engineering

The Greek Letters table has five repeating groups of rows. wikitables only returns the last group of rows, so the other four row groups are inaccessible.

These are specially formatted tables, and their schema is hard to determine.

I don't think there is currently any way to automatically extract specially formatted tables without writing code specific to them.
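For illustration, here is a minimal sketch of what such table-specific code could look like. It assumes the table has already been reduced to a list of rows (each a list of cell strings) and that every row group starts with a repeat of the first header row. Neither assumption is confirmed for the Greek Letters table; the function name `split_row_groups` and the sample rows are made up for this example and are not part of wikitables.

```python
def split_row_groups(rows):
    """Split rows into groups, starting a new group at each repeat of the header.

    Assumption (hypothetical): rows[0] is the header, and the same header
    row reappears at the start of every later row group.
    """
    if not rows:
        return []
    header = rows[0]
    groups = []
    for row in rows:
        if row == header:
            # A header row (the first one or a repeat) opens a new group.
            groups.append([])
        else:
            # Map each cell to its column name from the header.
            groups[-1].append(dict(zip(header, row)))
    return groups


# Made-up sample data with one repeated header row.
rows = [
    ["Name", "Upper", "Lower"],
    ["Alpha", "Α", "α"],
    ["Beta", "Β", "β"],
    ["Name", "Upper", "Lower"],
    ["Gamma", "Γ", "γ"],
]
groups = split_row_groups(rows)
```

With this approach every group stays accessible by index (`groups[0]`, `groups[1]`, ...) instead of later groups replacing earlier ones. A table whose groups are laid out differently would need different splitting logic.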

OK, I've used pandas for that table instead.