chop-dbhi/harvest

Refine the description of what Harvest is


This was prompted by some recent reading on distributed query engines that support multiple data sources, which is a natural direction for Harvest to go (whether we use existing technologies or not). I asked myself, "What is compelling about Harvest?", even compared to these technologies, which appear to be much more "impressive" on a technical level.

A statement that stood out when watching a presentation on Apache Drill is that there are very few contenders in the ad-hoc query space, whether due to its overall difficulty or a lack of interest until now. In some respects, a distributed ad-hoc query engine is being regarded as an alternative, or at least a supplement, to doing ETL across multiple systems (of course, this still assumes your data and formats aren't garbage).

With that in mind, I think putting more emphasis on the ad-hoc query aspect and drawing a clearer line between the components of Harvest would be more concrete than the vague description on the Harvest homepage. Something along these lines:

Harvest is a framework for building ad-hoc query applications for relational databases. Harvest is composed of:

  • An ad-hoc query engine built on top of Django's ORM called Avocado
  • A REST HTTP service for interacting with Avocado
  • An extensible single-page application (SPA) Web client for building and interacting with queries

That being said, we want to balance the technical terms here (SPA probably isn't needed) so we don't lose the less technical audience. A side effect of this more specific description is a cleaner separation of the libraries themselves. For example, defining Avocado as the query engine and metadata index means those pieces can be developed further or replaced as the project evolves.