zhaozhang/SparkMontage

Odd madd-submit errors because compute-madd-classpath.sh error check is wrong.

nealmcb opened this issue · 4 comments

When I run bin/madd-submit with Spark 1.3.1 I get:

Exception in thread "main" java.net.URISyntaxException: Illegal character in path at index 6: Failed to find appassembler scripts in /srv/s/spark/SparkMontage/target/appassembler/bin
You need to build Madd before running this program
    at java.net.URI$Parser.fail(URI.java:2848)

I did use the mvn commands in the README.md to build and got BUILD SUCCESS. The indicated bin directory has an madd and a madd.bin file in it, but no "MADD" as the compute-madd-classpath.sh script seems to expect.

I did get some warnings, which I don't understand, but which seem unrelated:
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for
org.apache.hadoop:hadoop-client:jar with value '*' does not
match a valid id pattern. @ line 295, column 25

The confusing "Illegal character" error arises because the shell scripts have too little error handling: the failure message from compute-madd-classpath.sh is passed straight to spark-submit as the --jars argument:

spark-submit --class edu.berkeley.cs.amplab.madd.SparkMadd --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryo.registrator=edu.berkeley.cs.amplab.madd.serialization.MaddKryoRegistrator --conf spark.kryoserializer.buffer.mb=4 --conf spark.kryo.referenceTracking=true \
  --jars 'Failed to find appassembler scripts in /srv/s/spark/SparkMontage/target/appassembler/bin
  You need to build Madd before running this program' /srv/s/spark/SparkMontage/target/appassembler/repo/edu/berkeley/cs/amplab/madd/madd/0.0.1-SNAPSHOT/madd-0.0.1-SNAPSHOT.jar

Aha - now I see that it is a simple typo/change of name. I'll submit a pull request for SparkMontage/bin/compute-madd-classpath.sh to change if [ ! -f "$BASEDIR"/bin/MADD to if [ ! -f "$BASEDIR"/bin/madd ....
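
To make the failure mode above concrete, here is a minimal sketch of how the two scripts could guard against it. It is not the project's actual code: `compute_classpath` and `resolve_jars` are hypothetical names, and the jar paths are placeholders. The only detail taken from the thread is the `-f "$BASEDIR"/bin/madd` check. The idea is that the helper reports errors on stderr with a non-zero exit status, and the caller tests that status before using the captured output.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for compute-madd-classpath.sh.
# On success it prints a comma-separated jar list on stdout.
# On failure it prints the message on stderr and returns non-zero,
# so the message can never be captured as if it were a classpath.
compute_classpath() {
  local basedir="$1"
  if [ ! -f "$basedir/bin/madd" ]; then
    echo "Failed to find appassembler scripts in $basedir/bin" >&2
    echo "You need to build Madd before running this program" >&2
    return 1
  fi
  # Placeholder jar names, for illustration only.
  echo "$basedir/repo/a.jar,$basedir/repo/b.jar"
}

# Caller side (what a submit wrapper could do): capture stdout and
# stop early if the helper failed, instead of passing junk to --jars.
resolve_jars() {
  local basedir="$1"
  local jars
  if ! jars="$(compute_classpath "$basedir")"; then
    return 1
  fi
  printf '%s\n' "$jars"
}
```

A wrapper could then run something like `jars="$(resolve_jars "$BASEDIR")" || exit 1` before invoking spark-submit, so a missing build stops with the real message rather than a URISyntaxException.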


Now I just get a java.lang.OutOfMemoryError: GC overhead limit exceeded message. How much memory does it need?

Hi Neal,

I just gave it a try, and it seems this code does not work with spark-1.3.1.
The version I was using was spark-1.1.0, as can be seen in pom.xml. This
piece of code is out of date, and I do not plan to maintain it.

This is experimental code, a proof of concept for a paper. If you are
interested in Spark + astronomy, please take a look at
https://github.com/BIDS/Kira/tree/pyKira/scratch/pyspark/src/main/python.

Zhao

Thanks - I understand, and thanks for the link. If you remember what size system you used it on, I'll know how much memory to try it with.

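For reference, the paper, which seems great, is "Rethinking Data-Intensive Science Using Scalable Analytics Systems":
https://amplab.cs.berkeley.edu/publication/rethinking-data-intensive-science-using-scalable-analytics-systems/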

I usually run the old program with my default JAVA_OPTS. Try 2g for the
test workload.

Zhao

Thanks!
I ran SPARK_MEM=2G bin/madd-submit with Spark 1.3.1, and that seems to have worked. It took nearly half an hour on my laptop and ran 41 jobs or more.