bcicen/wikitables

Dealing with Templates within Tags


To reproduce, use: https://en.wikipedia.org/w/index.php?title=Ia_Ora_'O_Tahiti_Nui

<poem>{{lang|ty|ʻUa rahu te atua i tōʻu ʻāiʻa Hono noʻanoʻa o te motu rau Heihei i te pua riʻi au ē E firi nape mōrohi ʻore ʻO tāʻu īa e faʻateniteni nei Tē tūoro nei te reo here O te huia ʻA hiʻi i tō aroha ʻIa ora ʻo Tahiti Nui ē}}</poem>

The node is identified as a tag, but when its contents are read, the template nested inside it is not identified correctly.

The `_read_parts` logic used by the `Field` class should be reworked, possibly to recurse into nested nodes.
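A minimal sketch of the recursive approach, assuming hypothetical stand-in node classes (`Text`, `Wikicode`, `Tag`, `Template`) that only mimic the `contents.nodes` shape seen in the traceback; they are not the parser's real API. The rule that a template renders as the text of its last parameter is also an assumption, chosen so that `{{lang|ty|...}}` yields the Tahitian text.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


# Hypothetical stand-ins for parser nodes (not a real library's classes).
@dataclass
class Text:
    value: str


@dataclass
class Wikicode:
    nodes: List["Node"]


@dataclass
class Template:
    name: str
    params: List[Wikicode]


@dataclass
class Tag:
    tag: str
    contents: Wikicode


Node = Union[Text, Template, Tag]


def read_parts(node: Node) -> Iterator[str]:
    """Recursively yield the text of a node and of everything nested in it."""
    if isinstance(node, Text):
        yield node.value
    elif isinstance(node, Tag):
        # Descend into the tag body instead of treating the tag as opaque.
        for sub in node.contents.nodes:
            yield from read_parts(sub)
    elif isinstance(node, Template):
        # Assumed rule: a template renders as its last parameter's text.
        if node.params:
            for sub in node.params[-1].nodes:
                yield from read_parts(sub)


# <poem>{{lang|ty|...}}</poem>, shortened to the first words of the text.
poem = Tag("poem", Wikicode([
    Template("lang", [Wikicode([Text("ty")]),
                      Wikicode([Text("ʻUa rahu te atua")])]),
]))
print(" ".join(read_parts(poem)))  # prints: ʻUa rahu te atua
```

Because the walk recurses on every node type that has children, a template inside a tag (or a tag inside a template) is reached no matter how deeply it is nested.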

Pushed 4bb3ce7, which allows all nodes (including nested templates) to be discovered when a field is read.

This still fails for some tables. To reproduce, try the article 'Ab-Soul discography'.

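More checking must be done: as the traceback shows, the recursive `_read_parts` call reaches a node whose `contents` is `None`.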

t = import_tables('Ab-Soul discography')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "wikitables/__init__.py", line 23, in import_tables
    return list(_table_gen())
  File "wikitables/__init__.py", line 21, in _table_gen
    yield WikiTable(name, table)
  File "wikitables/models.py", line 127, in __init__
    self._read(raw_table)
  File "wikitables/models.py", line 160, in _read
    row = Row(self._head, tr)
  File "wikitables/models.py", line 96, in __init__
    super(Row, self).__init__(self._read(head, self.raw))
  File "wikitables/models.py", line 111, in _read
    return zip(head, [ Field(c) for c in cols ])
  File "wikitables/models.py", line 22, in __init__
    self.value = self._read(self.raw)
  File "wikitables/models.py", line 43, in _read
    joined = ' '.join(list(_read_parts(node)))
  File "wikitables/models.py", line 37, in _read_parts
    for x in _read_parts(subnode):
  File "wikitables/models.py", line 36, in _read_parts
    for subnode in n.contents.nodes:
AttributeError: 'NoneType' object has no attribute 'nodes'
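The traceback ends at `n.contents.nodes` with `n.contents` being `None`, so the recursion needs a guard before descending into a node. A minimal sketch using hypothetical stand-in classes rather than the real parser types; treating `contents=None` as a tag without a body (for example a self-closing `<br />`) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Union


# Hypothetical stand-ins for parser nodes (not a real library's classes).
@dataclass
class Text:
    value: str


@dataclass
class Wikicode:
    nodes: List["Node"]


@dataclass
class Tag:
    tag: str
    contents: Optional[Wikicode]  # None models a tag with no body


Node = Union[Text, Tag]


def read_parts(node: Node) -> Iterator[str]:
    """Yield text recursively, skipping tags that have no contents."""
    if isinstance(node, Text):
        yield node.value
    elif isinstance(node, Tag):
        # Guard against the AttributeError: only descend into a real body.
        if node.contents is None:
            return
        for sub in node.contents.nodes:
            yield from read_parts(sub)


cell = Tag("td", Wikicode([Text("first line"), Tag("br", None), Text("second line")]))
print(" ".join(read_parts(cell)))  # prints: first line second line
```

Skipping the empty tag keeps the surrounding text intact instead of aborting the whole table import.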