Nested JSON from Solr to Spark
nelibla opened this issue · 1 comments
We are testing nested JSON documents in Solr and trying to analyze them in Spark with Python. We are using the data from this repository: https://github.com/alisatl/solr-revolution-2016-nested-demo/blob/master/data/example-data-solr.json
The JSON schema is as follows:
The code below:
sqlContext.read.format("solr").option("zkhost", config.zkserver).option("collection", config.solr_collection).option('child_doc_fieldname', '_childDocuments_').option("query", 'path:2.posts.comments AND sentiment:negative').option('fields', '*,[child parentFilter=path:"2.*"]').load()
This produces a Spark DataFrame with only one column, the id field.
The problem appears to be the "[child parentFilter=...]" part of the fields option, since the examples below work properly:
.option('fields', '*')
.option('fields', 'text, author')
We don't support that particular syntax for nested fields right now.
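
One possible workaround, sketched here only as an assumption and not as something the connector provides, is to fetch the nested documents from Solr by other means (for example its JSON query response) and flatten them in Python before handing them to Spark. The helper `flatten_docs` and the added `parent_id` column are hypothetical names introduced for this illustration; the only thing taken from the issue is the `_childDocuments_` field name. The sample record is made up for the illustration and is not taken from the demo data.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

# Field name used in the issue for nested child documents.
CHILD_KEY = "_childDocuments_"


def flatten_docs(
    docs: Iterable[Dict[str, Any]], parent_id: Optional[str] = None
) -> Iterator[Dict[str, Any]]:
    """Yield one flat dict per document, parents before their children.

    Hypothetical helper: each row keeps the document's own fields, drops
    the nested child list, and records the parent's id in "parent_id".
    """
    for doc in docs:
        row = {key: value for key, value in doc.items() if key != CHILD_KEY}
        row["parent_id"] = parent_id
        yield row
        # Recurse into children, passing this document's id down.
        yield from flatten_docs(doc.get(CHILD_KEY, []), parent_id=doc.get("id"))


# Made-up sample shaped like a nested Solr document.
sample = [
    {
        "id": "1",
        "path": "1.posts",
        CHILD_KEY: [
            {"id": "2", "path": "2.posts.comments", "sentiment": "negative"},
        ],
    }
]

rows = list(flatten_docs(sample))
```

The resulting list of flat dicts could then be turned into a DataFrame on the Spark side, most safely with an explicit schema, because parent and child rows do not share the same set of fields.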