DistrictDataLabs/baleen

Add load from csv

Opened this issue · 3 comments

console/commands/load can handle OPML files.

I don't have OPML, and couldn't easily find an OPML editor. CSV is easy to compose, however.

Add support for loading feeds from CSV.

Great idea! As for an OPML editor, we actually used Feedly, which has an export-to-OPML feature. Even so, CSV support is a great feature to add!

It looks like the required fields to create a Feed are link and category, with an optional title.

Is that right?

Here's my understanding of a Feed:

from baleen.models:

class Feed(me.DynamicDocument):
    # my (optional) title for this feed
    title = me.StringField(max_length=256)

    # the link to get the RSS feed. FeedParser may update it during sync if it sees a different href.
    link = me.URLField(required=True, unique=True)

    # A dict of xmlURL, which is the link above, and an htmlURL, which is ...? the human-friendly version of the site?
    urls = me.DictField()

    # my name for the collection of documents - like a corpus name. One category per feed.
    category = me.StringField(required=True)

    # for Baleen - guessing the Job ignores inactive feeds
    active = me.BooleanField(default=True)

    # fields that the FeedParser package modifies
    version = me.StringField(choices=FEEDTYPES)
    etag = me.StringField()
    modified = me.StringField()
    fetched = me.DateTimeField(default=None)
    signature = me.StringField(max_length=64, min_length=64, unique=False)

    created = me.DateTimeField(default=datetime.now, required=True)
    updated = me.DateTimeField(default=datetime.now, required=True)

Am I heading in the right direction? This is simpler than I was expecting.

Yep, that's pretty much correct. The OPML file doesn't contain much information: title and link are by far the most important, with category and active being of secondary importance.
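For reference, here is a rough sketch of what the CSV parsing step might look like, based only on the fields discussed above (link and category required, title optional). The column names `title`, `link`, and `category` and the function name `read_feeds_csv` are assumptions for illustration, not part of Baleen's existing load command, and the sketch deliberately stops short of touching the database.

```python
import csv
import io

# Assumed column names, chosen to match the required Feed fields discussed above.
REQUIRED = ("link", "category")


def read_feeds_csv(handle):
    """Yield one dict per CSV row with link, category, and (if present) title.

    Hypothetical helper: it only parses and validates rows; saving them
    as Feed documents is left to the caller.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError("CSV header is missing columns: " + ", ".join(missing))

    for row in reader:
        # Short rows leave missing cells as None, so normalise to "".
        link = (row.get("link") or "").strip()
        category = (row.get("category") or "").strip()
        title = (row.get("title") or "").strip()
        if not link or not category:
            raise ValueError(
                "line %d: link and category are required" % reader.line_num
            )
        feed = {"link": link, "category": category}
        if title:
            feed["title"] = title  # title is optional, so only set it when given
        yield feed


# Example input: the second row omits the optional title.
sample = io.StringIO(
    "title,link,category\n"
    "Example Blog,http://example.com/rss,news\n"
    ",http://example.org/feed.xml,tech\n"
)
feeds = list(read_feeds_csv(sample))
```

Each resulting dict could then presumably be handed to the model as `Feed(**feed)`, though that step is untested here and would need to handle duplicate links, since `link` is declared unique.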