databricks/spark-corenlp

publishing this to spark-packages repo

Closed this issue · 5 comments

@mengxr would you mind publishing this to the spark-packages repo now that CoreNLP 3.6.0 is available on Maven Central?

http://search.maven.org/#artifactdetails%7Cedu.stanford.nlp%7Cstanford-corenlp%7C3.6.0%7Cjar

thanks!

Done.

thanks, @mengxr!

Quick question that's slightly related: it appears that --packages does not support specifying a classifier in the coordinate, as in:

groupId:artifactId:version:classifier

The classifier is needed to pull in stanford-corenlp-3.6.0-models.jar, of course.

I found the spot in the Spark code base where --packages coordinates are parsed as just groupId:artifactId:version, with no classifier: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L495

Curious whether you worked around this in a cleverer way than downloading the jar and referencing it with --jars. Is this worth a Spark JIRA? I've searched quite a bit but can't find anything related.
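For what it's worth, the --jars workaround can be scripted by building the artifact URL yourself, since Maven Central follows the standard repository layout (groupPath/artifactId/version/artifactId-version-classifier.jar). A minimal sketch, assuming the repo1.maven.org mirror:

```shell
# Build the Maven Central URL for the classifier artifact
# (standard Maven repo layout; repo1.maven.org is assumed here).
GROUP_PATH="edu/stanford/nlp"
ARTIFACT="stanford-corenlp"
VERSION="3.6.0"
CLASSIFIER="models"
URL="https://repo1.maven.org/maven2/${GROUP_PATH}/${ARTIFACT}/${VERSION}/${ARTIFACT}-${VERSION}-${CLASSIFIER}.jar"
echo "$URL"

# Then fetch it and pass the main artifact via --packages and the
# models jar via --jars, e.g.:
#   curl -L -O "$URL"
#   spark-submit \
#     --packages edu.stanford.nlp:stanford-corenlp:3.6.0 \
#     --jars "${ARTIFACT}-${VERSION}-${CLASSIFIER}.jar" \
#     ...
```

Not as clean as classifier support in --packages itself, but it keeps the models jar out of the project repo.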

@mengxr

I still don't see the library:

http://dl.bintray.com/spark-packages/maven/databricks/

spark-avro/
spark-csv/
spark-redshift/

And spark-packages.org says it hasn't been released (not sure how that actually gets updated).

@mengxr is there a timeline for publishing to the repo? My team uses Spark + CoreNLP, so this would be quite a useful package to work with.