TREMA-UNH/trec-car-tools

flat_headings_list is not flat


soboroff$ ipython3
Python 3.6.6 (default, Jun 28 2018, 05:43:53) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from trec_car import read_data

In [2]: fp = open('/home/collections/news-track-2018-wikipedia/all-enwiki-20170820/all-enwiki-20170820.cbor', 'rb')

In [3]: a = read_data.iter_annotations(fp)

In [4]: page = a.__next__()

In [5]: page.flat_headings_list()
Out[5]: 
[[<trec_car.read_data.Section at 0x106b684e0>,
  <trec_car.read_data.Section at 0x106b6b278>],
 [<trec_car.read_data.Section at 0x106b68518>,
  <trec_car.read_data.Section at 0x106b6bf98>],
 [<trec_car.read_data.Section at 0x106b68518>,
  <trec_car.read_data.Section at 0x106b73160>],
 [<trec_car.read_data.Section at 0x106b68518>,
  <trec_car.read_data.Section at 0x106b73208>],
 [<trec_car.read_data.Section at 0x106b6bf28>],
 [<trec_car.read_data.Section at 0x106b73fd0>,
  <trec_car.read_data.Section at 0x106d780b8>],
 [<trec_car.read_data.Section at 0x106b73fd0>,
  <trec_car.read_data.Section at 0x106d7b160>],
 [<trec_car.read_data.Section at 0x106b73fd0>,
  <trec_car.read_data.Section at 0x106d80080>],
 [<trec_car.read_data.Section at 0x106d78080>],
 [<trec_car.read_data.Section at 0x106d803c8>],
 [<trec_car.read_data.Section at 0x106d80e10>],
 [<trec_car.read_data.Section at 0x106d80e48>],
 [<trec_car.read_data.Section at 0x106d80e80>],
 [<trec_car.read_data.Section at 0x106d83080>]]
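Judging from `Out[5]`, `flat_headings_list()` does not return a flat list of `Section` objects. Each inner list appears to be a heading path that runs from a top-level section down to a section with no subsections. The section at `0x106b68518` heads three different paths, and single-element lists such as `[0x106b6bf28]` look like top-level sections without subsections. The sketch below models that shape with a hypothetical stand-in: a tree of `(heading, children)` tuples in place of real `Section` objects. It is an assumed reconstruction of the observed output, not the library's implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a page's section tree: each node is
# (heading, list_of_child_nodes). Real trec_car Section objects are not used.
Node = Tuple[str, list]


def heading_paths(nodes: List[Node], prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Return one root-to-leaf heading path per leaf section.

    Models the list-of-lists shape seen in Out[5]; an assumed
    reconstruction, not the library's own code.
    """
    prefix = prefix if prefix is not None else []
    paths = []
    for heading, children in nodes:
        path = prefix + [heading]
        if children:
            # A parent only appears as the prefix of its children's paths,
            # so it is repeated once per child.
            paths.extend(heading_paths(children, path))
        else:
            # A leaf section ends a path.
            paths.append(path)
    return paths


tree = [
    ("History", [("Origins", [])]),
    ("Geography", [("Climate", []), ("Rivers", []), ("Lakes", [])]),
    ("Economy", []),
]
print(heading_paths(tree))
# [['History', 'Origins'], ['Geography', 'Climate'], ['Geography', 'Rivers'],
#  ['Geography', 'Lakes'], ['Economy']]
```

With this shape, a parent with three subsections shows up three times, just as `0x106b68518` and `0x106b73fd0` do above.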

In [6]: import itertools

In [7]: itertools.chain.from_iterable(page.flat_headings_list())
Out[7]: <itertools.chain at 0x106d9e0b8>

In [8]: list(itertools.chain.from_iterable(page.flat_headings_list()))
Out[8]: 
[<trec_car.read_data.Section at 0x106b684e0>,
 <trec_car.read_data.Section at 0x106b6b278>,
 <trec_car.read_data.Section at 0x106b68518>,
 <trec_car.read_data.Section at 0x106b6bf98>,
 <trec_car.read_data.Section at 0x106b68518>,
 <trec_car.read_data.Section at 0x106b73160>,
 <trec_car.read_data.Section at 0x106b68518>,
 <trec_car.read_data.Section at 0x106b73208>,
 <trec_car.read_data.Section at 0x106b6bf28>,
 <trec_car.read_data.Section at 0x106b73fd0>,
 <trec_car.read_data.Section at 0x106d780b8>,
 <trec_car.read_data.Section at 0x106b73fd0>,
 <trec_car.read_data.Section at 0x106d7b160>,
 <trec_car.read_data.Section at 0x106b73fd0>,
 <trec_car.read_data.Section at 0x106d80080>,
 <trec_car.read_data.Section at 0x106d78080>,
 <trec_car.read_data.Section at 0x106d803c8>,
 <trec_car.read_data.Section at 0x106d80e10>,
 <trec_car.read_data.Section at 0x106d80e48>,
 <trec_car.read_data.Section at 0x106d80e80>,
 <trec_car.read_data.Section at 0x106d83080>]
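Chaining the paths does give a flat list, but parent sections are repeated once per child path: `0x106b68518` and `0x106b73fd0` each appear three times in `Out[8]`. If the goal is every section exactly once, one option is to deduplicate by object identity while keeping first-seen order, which places each parent before its children. Taking the last element of every path gives only the leaf sections instead. Both helpers (`unique_sections`, `leaf_sections`) are hypothetical and written against the list-of-lists shape shown above; plain strings stand in for `Section` objects so the sketch runs without the library.

```python
import itertools
from typing import List, TypeVar

T = TypeVar("T")


def unique_sections(paths: List[List[T]]) -> List[T]:
    """Flatten heading paths, keeping each object once (first occurrence wins).

    Deduplicates by identity (id()), so it does not rely on Section
    defining __eq__ or __hash__. ids stay stable because `paths`
    keeps every object referenced while we iterate.
    """
    seen = set()
    result = []
    for section in itertools.chain.from_iterable(paths):
        if id(section) not in seen:
            seen.add(id(section))
            result.append(section)
    return result


def leaf_sections(paths: List[List[T]]) -> List[T]:
    """Return only the deepest section of each (non-empty) path."""
    return [path[-1] for path in paths if path]


# Stand-in data with the same shape as Out[5]: one shared parent with
# three children, plus a top-level section without subsections.
parent, a, b, c, lone = "Geography", "Climate", "Rivers", "Lakes", "Economy"
paths = [[parent, a], [parent, b], [parent, c], [lone]]

print(list(itertools.chain.from_iterable(paths)))  # parent repeated three times
print(unique_sections(paths))  # ['Geography', 'Climate', 'Rivers', 'Lakes', 'Economy']
print(leaf_sections(paths))    # ['Climate', 'Rivers', 'Lakes', 'Economy']
```

On the real page this would be called as `unique_sections(page.flat_headings_list())`.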