Weird error when used with prismatic plumbing
aconbere opened this issue · 13 comments
When I try to use Prismatic plumbing, I run into a weird error: `java.lang.ClassNotFoundException: pig_graph_test.graph-record3363`. `graph-recordxxxx` is a macro-generated record exposed by the library, but my attempts to get pigpen to play nice with it have all failed.
I suspect it has something to do with the way `code/trap-*` operates? I've included a little test repository that you can just hit with `lein run` to reproduce; it includes the full stack trace.
Thanks for the detailed repro! From the error, it looks like the library has local vars with periods in their names, which doesn't play well with `edn/read-string`. I should be able to easily exclude those from the closure, but first I'll make sure they aren't required by it.
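Dropping such locals before the closure is built could look roughly like the sketch below. It assumes the captured scope is available as a map from local symbol to value; `dotted-local?` and `drop-dotted-locals` are hypothetical helpers, not pigpen functions.

```clojure
(defn dotted-local?
  "True when the local's name contains a period. Clojure would try to
  resolve such a symbol as a Java class name when the closure is re-read
  and eval'd."
  [sym]
  (boolean (re-find #"\." (name sym))))

(defn drop-dotted-locals
  "Hypothetical helper: removes dotted locals from a map of captured
  locals (symbol -> value) before it is serialized into a closure."
  [locals]
  (into {} (remove (comp dotted-local? key)) locals))

(drop-dotted-locals {'x 42, 'foo.bar 1})
;; => {x 42}
```

As the comment above notes, this is only safe once it's confirmed the user code never refers to the dropped locals.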
@mbossenbroek I'd love to know more about how you derived that (after staring at this code for an hour or two I'm thoroughly confused and suspect I might learn something).
Certainly! `pigpen.code/trap` rewrites your function like this:
```clojure
=> (let [x 42]
     (pigpen.code/trap
       (fn [y] (+ x y))))
(pigpen.pig/with-ns pigpen-demo.core (clojure.core/let [x (quote 42)] (fn [y] (+ x y))))
```
What this returns is an expression that, when evaluated, evaluates your function within your namespace, with all of the lexical scope that was present at script-generation time. Anything that's bound at that time ends up in the `let` that encloses your function. It's a way of freezing everything we know now and reviving it later on a Hadoop machine.
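The scope-freezing idea can be sketched with a macro that reads `&env`, the map of in-scope locals that Clojure passes implicitly to every macro. This hypothetical `freeze` macro is a simplified illustration, not pigpen's actual `trap` (which also wraps the result in `pigpen.pig/with-ns`); it assumes every local's value can be embedded as a quoted literal.

```clojure
(defmacro freeze
  "Hypothetical sketch of scope capture: returns, as data, a let form that
  rebinds every local in scope to its quoted current value, wrapped around
  the unevaluated form."
  [form]
  (let [locals (keys &env)]
    `(list 'clojure.core/let
           (vector ~@(mapcat (fn [sym] [`'~sym `(list 'quote ~sym)])
                             locals))
           '~form)))

;; Capture x at "script generation time"...
(def frozen
  (let [x 42]
    (freeze (fn [y] (+ x y)))))

;; ...frozen is now plain data:
;; (clojure.core/let [x (quote 42)] (fn [y] (+ x y)))

;; ...which can be evaluated later to get a working function back.
((eval frozen) 8)
;; => 50
```

Every local in `&env` is captured whether or not the function body uses it, which is how macro-expansion leftovers can end up in the closure.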
I've seen in the past that macro expansion will often leave a bunch of junk in there that's not actually required by the user code; `for` is a good example of that.
The `java.lang.ClassNotFoundException` is a classic example of Clojure interpreting any symbol with a period as a Java class and trying to load it:
```clojure
=> (eval '(prn x.y))
CompilerException java.lang.ClassNotFoundException: x.y, compiling:(/private/var/folders/54/cllx6y1d0nz92rmz915fgc4mmjkfgm/T/form-init3678987450896995460.clj:1:8)
```
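To detect this failure mode programmatically, here is a small sketch; `cause-chain` and `class-not-found-on-eval?` are hypothetical helpers. Walking the whole cause chain avoids assuming exactly where the `ClassNotFoundException` sits inside the `CompilerException`.

```clojure
(defn cause-chain
  "Returns the throwable followed by all of its nested causes, outermost first."
  [^Throwable t]
  (take-while some? (iterate #(.getCause ^Throwable %) t)))

(defn class-not-found-on-eval?
  "Evals form; returns true if it failed because some symbol was resolved
  as a Java class that could not be loaded, false if it evaluated cleanly."
  [form]
  (try
    (eval form)
    false
    (catch Exception e
      (boolean (some #(instance? ClassNotFoundException %) (cause-chain e))))))

(class-not-found-on-eval? '(prn x.y))
;; => true  (x.y is treated as a class name)
(class-not-found-on-eval? '(+ 1 2))
;; => false
```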
In pigpen, we take the result of `pigpen.code/trap` above, `pr-str` it, put it in the script, read it, eval it, and run it. If you're getting that error, that symbol is likely getting into the closure somehow and failing when we try to eval it.
At least that's my guess at this point :)
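A minimal sketch of that print/read/eval round trip, assuming the frozen form has already been produced (here it is written out by hand, without the `with-ns` wrapper). It uses `clojure.edn/read-string` for the read step; `revive` is a hypothetical name, and the exact reader pigpen uses is an assumption.

```clojure
(require '[clojure.edn :as edn])

;; A frozen closure of the shape trap produces, minus the with-ns wrapper.
(def frozen '(clojure.core/let [x (quote 42)] (fn [y] (+ x y))))

(defn revive
  "Print the form as text (as if embedding it in the Pig script), read it
  back, and eval it into a live function."
  [form]
  (-> form pr-str edn/read-string eval))

((revive frozen) 8)
;; => 50
```

Any symbol that survives into the printed text gets resolved again at `eval` time, so a dotted symbol that isn't bound as a local would fail exactly at that last step.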
Oof. Well, the only thing I can offer at this point is that this dotted name it can't find is probably needed. It's generated here https://github.com/Prismatic/plumbing/blob/master/src/plumbing/graph/positional.clj#L12-L30 and builds a record that is used in place of a map further on in the library, for performance.
My initial worry is that I've seen very strange behavior in Clojure with regard to records and file load ordering (which in that case was solved by AOT-compiling certain namespaces). Injecting a record into the namespace at run time seems like something that could very easily break with the pigpen approach.
@mbossenbroek One other question: can you think of any workarounds for this in the short term? I have some uses for pigpen that will be blocked until there's a fix, and a workaround would free me up to continue. Also, let me know if there's anything else I can do to help here.
None off the top of my head. Sorry I didn't get a chance to look at this yesterday - I'll have something for you today though.
HA! I have no expectation of you dropping everything and fixing my bugs ;-) I've already been very impressed with your responsiveness, and I'm mostly frustrated that I seem unable to fix this myself!
I'm trying to trace some of those `trap` calls to see if I can figure out exactly what is getting caught in there.
I found the problem & it wasn't what I thought it was. It's actually the serialization library that we use, nippy, that doesn't want to deserialize the record. What makes this even weirder is that I can only reproduce the problem if it's using the nippy jar that's AOT'ed into the pigpen jar.
I'll follow up with him & see what we can come up with.
Ooooooooh, so records are serializable and nippy is happily serializing them, but when it goes to deserialize, it can't find the reference and blows up?
Maybe because this is `gensym`'ed, so it probably doesn't result in a class file?
It seems to happen for normal records too - it looks like the immediate problem is that pigpen uses AOT. When I turn AOT off, it works locally. If all else fails, I can disable AOT for pigpen; I was just using it to generate 32 nearly identical copies of a java class to work around a pig limitation.
The gensym might be a problem down the road, though, since one machine will serialize the record and another will deserialize it. If the generated records have different ids on different machines, the receiving machine won't be able to deserialize the transported data. If those ids are locked in at jar compilation time (possibly via AOT), then this could work, but then we're back to the AOT problem.
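The concern about gensym'd names can be reproduced inside a single JVM. The hypothetical `define-gensym-record` below only stands in for the plumbing macro (it is not plumbing's actual code): calling it twice defines two structurally identical records but yields two differently named classes, because the numeric suffix comes from gensym's global counter. A deserializer on another machine that receives only a class name has no guarantee that a class with that name exists there.

```clojure
(defn define-gensym-record
  "Hypothetical stand-in for a macro-generated record: defines a record
  with fields a and b under a fresh gensym'd name and returns its class."
  []
  (let [rec-name (gensym "graph-record")]
    (eval `(defrecord ~rec-name [~'a ~'b]))))

(def rec-class-a (define-gensym-record))
(def rec-class-b (define-gensym-record))

;; Same fields, different class names. The suffix depends on how many
;; gensyms this JVM has produced so far, not on the record's shape.
(println (.getName ^Class rec-class-a))
(println (.getName ^Class rec-class-b))
```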
Do you know if there's a way to disable record generation in prismatic's graph? Or at least have it generate stable ids?
It's possible to disable the record generation, but the performance cost hurts (at least for us), since we're using this to process a very large stream of data.
That being said... we are AOT'ing our code before putting it on the cluster, so it's likely we'd see this problem if we went that route anyway.
Generating stable ids is interesting, but goes beyond my understanding of Clojure. From the way this is used, though, it looks like you could just hash the map that is used to generate the record and turn that into a symbol instead of using gensym.
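One way to read the hash suggestion, as a hypothetical sketch: derive the record name from the record's field keys instead of from gensym. This assumes the keys are keywords (whose hashes are computed from their names, not object identity) and that every machine runs the same Clojure version. A hash collision would give two different key sets the same name, so a real implementation would need to guard against that.

```clojure
(defn stable-record-name
  "Hypothetical alternative to gensym: derives a record name from the set
  of field keys, so the same keys give the same symbol regardless of order
  and regardless of which JVM computes it (for keyword keys)."
  [field-keys]
  (symbol (str "graph-record" (Math/abs (long (hash (set field-keys)))))))

(= (stable-record-name [:a :b]) (stable-record-name [:b :a]))
;; => true
```

Hashing the keys rather than the whole map avoids depending on the map's values, which may be functions whose hashes are identity-based and therefore differ between JVMs.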
IIRC, you said that disabling AOT resolved this, correct? Could I mark this as closed?
You may certainly mark it as closed, it is happily running now.
On Mon, May 11, 2015 at 10:39 AM, Matt Bossenbroek wrote:
> IIRC, you said that disabling AOT resolved this, correct? Could I mark this as closed?