amplab/succinct

Add support for JSON documents

anuragkh opened this issue · 1 comment

It would be nice to have support for JSON documents, perhaps through a SuccinctJsonRDD. Each JSON document would have an associated primary key, which could be a field named "id" within the document itself. The SuccinctJsonRDD should support the following operations:

// Return the JSON document associated with the given ID.
def get(id: Long): String

// Return an RDD of IDs for documents whose given field has the given value
def filter(field: String, value: String): RDD[Long]

// Return an RDD of IDs for documents that contain the given query term
def search(query: String): RDD[Long]
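To make the intended semantics of the three operations concrete, here is a minimal in-memory sketch in plain Scala. It is not Succinct's implementation and does not use its compressed index: the class name `JsonDocStore` and the helper `idsWhere` are hypothetical, `Seq[Long]` stands in for `RDD[Long]` so the sketch runs without Spark, and substring scans stand in for Succinct's search. It also assumes compact JSON (no whitespace between tokens) and takes the first numeric `"id"` field in each document as the primary key.

```scala
import scala.util.matching.Regex

// Hypothetical, in-memory model of the proposed SuccinctJsonRDD semantics.
// Assumptions (not from the issue): documents are compact JSON (no whitespace
// between tokens), the primary key is the first numeric "id" field in each
// document, and Seq[Long] stands in for RDD[Long] so the sketch runs without Spark.
final class JsonDocStore(rawDocs: Seq[String]) {

  private val idPattern: Regex = "\"id\":(\\d+)".r

  // Index from primary key to raw JSON text; a later duplicate ID replaces an earlier one.
  private val docs: Map[Long, String] = rawDocs.map { json =>
    val id = idPattern
      .findFirstMatchIn(json)
      .map(_.group(1).toLong)
      .getOrElse(throw new IllegalArgumentException(s"document has no numeric id field: $json"))
    id -> json
  }.toMap

  // Return the JSON document associated with the given ID.
  def get(id: Long): String =
    docs.getOrElse(id, throw new NoSuchElementException(s"no document with id $id"))

  // Return IDs of documents whose string field `field` equals `value`,
  // found by looking for the serialized pattern "field":"value".
  def filter(field: String, value: String): Seq[Long] = {
    val pattern = "\"" + field + "\":\"" + value + "\""
    idsWhere(_.contains(pattern))
  }

  // Return IDs of documents whose text contains the query term anywhere.
  def search(query: String): Seq[Long] = idsWhere(_.contains(query))

  // Sorted IDs of the documents whose raw text satisfies `p`.
  private def idsWhere(p: String => Boolean): Seq[Long] =
    docs.collect { case (id, json) if p(json) => id }.toSeq.sorted
}
```

Matching the serialized `"field":"value"` pattern keeps `filter` expressible as a plain text search, which fits a search-oriented store, but it only matches string-valued fields and would also match a same-named field inside a nested object; a fuller implementation would parse the documents or index field paths explicitly.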

Added in v0.1.6