OpenRefine/CommonsExtension

New structure for category fetching

Closed this issue · 0 comments

To simplify our code, I would propose the following architecture for the category fetching, based on Java iterators. We would need the following classes:

  • A class (say FileRecord) which would represent the contents of a record in the project (not yet formatted as a list of rows). It would contain the following attributes:
    • a file name
    • its corresponding mid
    • the list of categories it belongs to
  • A class whose constructor takes a single category name as a parameter and which implements the Iterator<FileRecord> interface: it iterates over the files contained in that category. As a first step, the categories in each FileRecord would be left empty, so the only task of this class would be to make the HTTP requests to the Commons API with the appropriate paging.

  • A class which takes an Iterator<FileRecord> (an iterator over file names) as a parameter and implements Iterator<FileRecord> again: its task would be to fetch the categories each file belongs to and store them in each FileRecord.

  • A class which takes an Iterator<FileRecord> and implements TableDataReader. Its task would be to convert each FileRecord to one or more rows (by spreading the categories down over otherwise blank rows, as we are currently trying to do).

With all those building blocks in place, you could then chain them together in the importer.
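
To make the proposal more concrete, here is a rough sketch of how the pieces could fit together. Only FileRecord is a name from the description above; every other name (Page, CategoryMembersIterator, RelatedCategoriesIterator, FileRecordRows, demoRows) is hypothetical. The HTTP calls are replaced by functions passed in as parameters, so nothing here reflects the real Commons API responses. The last class only mimics the shape of TableDataReader with a getNextRowOfCells method rather than implementing the OpenRefine interface, and the three-column layout (file name, mid, category) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;
import java.util.function.Function;

class CategoryFetchSketch {

    /** One record: a file name, its mid, and the categories it belongs to. */
    static class FileRecord {
        final String fileName; final String mid; final List<String> categories;
        FileRecord(String fileName, String mid, List<String> categories) {
            this.fileName = fileName; this.mid = mid; this.categories = categories;
        }
    }

    /** Hypothetical: one page of results plus a continuation token (null on the last page). */
    static class Page {
        final List<FileRecord> records; final String continueToken;
        Page(List<FileRecord> records, String continueToken) {
            this.records = records; this.continueToken = continueToken;
        }
    }

    /** Step 1: iterates over the files of one category, fetching pages lazily. */
    static class CategoryMembersIterator implements Iterator<FileRecord> {
        private final String category;
        // (category, continueToken) -> Page; stands in for the HTTP request
        private final BiFunction<String, String, Page> fetchPage;
        private Iterator<FileRecord> current = Collections.emptyIterator();
        private String continueToken = null;
        private boolean exhausted = false;
        CategoryMembersIterator(String category, BiFunction<String, String, Page> fetchPage) {
            this.category = category; this.fetchPage = fetchPage;
        }
        @Override public boolean hasNext() {
            // Keep fetching until a non-empty page is found or there are no more pages.
            while (!current.hasNext() && !exhausted) {
                Page page = fetchPage.apply(category, continueToken);
                current = page.records.iterator();
                continueToken = page.continueToken;
                exhausted = (continueToken == null);
            }
            return current.hasNext();
        }
        @Override public FileRecord next() {
            if (!hasNext()) throw new NoSuchElementException();
            return current.next();
        }
    }

    /** Step 2: wraps an iterator and fills in the categories of each record. */
    static class RelatedCategoriesIterator implements Iterator<FileRecord> {
        private final Iterator<FileRecord> source;
        // file name -> categories; stands in for the HTTP request
        private final Function<String, List<String>> fetchCategories;
        RelatedCategoriesIterator(Iterator<FileRecord> source,
                                  Function<String, List<String>> fetchCategories) {
            this.source = source; this.fetchCategories = fetchCategories;
        }
        @Override public boolean hasNext() { return source.hasNext(); }
        @Override public FileRecord next() {
            FileRecord r = source.next();
            return new FileRecord(r.fileName, r.mid, fetchCategories.apply(r.fileName));
        }
    }

    /** Step 3: turns each record into one or more rows; returns null when done. */
    static class FileRecordRows {
        private final Iterator<FileRecord> records;
        private FileRecord current = null;
        private int categoryIndex = 0;
        FileRecordRows(Iterator<FileRecord> records) { this.records = records; }
        List<Object> getNextRowOfCells() {
            if (current == null || categoryIndex >= current.categories.size()) {
                if (!records.hasNext()) return null;
                current = records.next();
                categoryIndex = 1;
                String first = current.categories.isEmpty() ? null : current.categories.get(0);
                return Arrays.<Object>asList(current.fileName, current.mid, first);
            }
            // Remaining categories go on rows whose first two cells are blank.
            return Arrays.<Object>asList(null, null, current.categories.get(categoryIndex++));
        }
    }

    /** Chains the three steps using stubbed fetchers with made-up data. */
    static List<List<Object>> demoRows() {
        BiFunction<String, String, Page> fetchPage = (category, token) -> token == null
            ? new Page(Arrays.asList(new FileRecord("File:A.jpg", "M1", Collections.emptyList())), "page2")
            : new Page(Arrays.asList(new FileRecord("File:B.jpg", "M2", Collections.emptyList())), null);
        Function<String, List<String>> fetchCategories = name -> name.equals("File:A.jpg")
            ? Arrays.asList("Cats", "Animals") : Arrays.asList("Cats");
        FileRecordRows reader = new FileRecordRows(new RelatedCategoriesIterator(
            new CategoryMembersIterator("Cats", fetchPage), fetchCategories));
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> row = reader.getNextRowOfCells(); row != null; row = reader.getNextRowOfCells()) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        demoRows().forEach(System.out::println);
    }
}
```

Because each step only depends on Iterator<FileRecord>, each one can be tested on its own with stubbed fetchers like those in demoRows, and the paging logic stays confined to the first class.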