nathanmarz/cascalog

Queries with Clojure Records

kul opened this issue · 6 comments

kul commented

There seems to be a problem with queries if clojure records are present in tuples

user=> (use 'cascalog.api)
nil
user=> (defrecord MyRec [a b])
user.MyRec
user=> (??<- [?r] ([(MyRec. 1 2)] ?r))
UnsupportedOperationException   user.MyRec (form-init9201996833299850058.clj:1)
user=> (??<- [?r] ([[(MyRec. 1 2)]] ?r))
UnsupportedOperationException   user.MyRec (form-init4990014382799884351.clj:1)

hi kul,

Could you add a full stack trace? You can get this by calling (pst) in your REPL directly after trying the query that fails. Thanks.

kul commented
user=> (pst)
UnsupportedOperationException 
        user.MyRec (form-init4990014382799884351.clj:1)
        com.esotericsoftware.kryo.serializers.MapSerializer.read (MapSerializer.java:137)
        com.esotericsoftware.kryo.serializers.MapSerializer.read (MapSerializer.java:17)
        com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:612)
        cascading.kryo.KryoDeserializer.deserialize (KryoDeserializer.java:37)
        cascading.tuple.hadoop.TupleSerialization$SerializationElementReader.read (TupleSerialization.java:628)
        cascading.tuple.hadoop.io.HadoopTupleInputStream.readType (HadoopTupleInputStream.java:105)
        cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement (HadoopTupleInputStream.java:52)
        cascading.tuple.io.TupleInputStream.readTuple (TupleInputStream.java:78)
        cascading.tuple.io.TupleInputStream.readTuple (TupleInputStream.java:67)
        cascading.tuple.hadoop.io.TupleDeserializer.deserialize (TupleDeserializer.java:38)
        cascading.tuple.hadoop.io.TupleDeserializer.deserialize (TupleDeserializer.java:28)

Great! Didn't know about pst.

thanks.

I recall this now. carbonite, the library that allows Clojure types to be serialized with Kryo, has a bug where records cannot be serialized. So this is actually a bug in carbonite and not in cascalog itself.

I will look into fixing the carbonite bug.

kul commented

That's great news (in the sense that cascalog doesn't need to be patched)!

Thanks

@kul kryo cannot serialize clojure records in a generic manner since records are concrete types in java.

So, your options are:

  1. write kryo serializers for your record types and register them with hadoop/cascalog
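  2. preprocess your records into plain maps, e.g. with (into {} rec), and just pass maps around inside cascalog. The builtin map->MyRec fn that defrecord generates goes the other way and turns a map back into a record when you need one.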
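
A minimal sketch of option 2, reusing the MyRec record from the report above. It relies only on core Clojure: `(into {} rec)` turns a record into a plain map, and the `map->MyRec` factory that `defrecord` generates turns a map back into a record. The helper name `rec->map` is made up for illustration and is not part of cascalog or carbonite.

```clojure
(defrecord MyRec [a b])

;; Hypothetical helper: copy a record's fields into a plain Clojure map,
;; so that no record instance ends up inside a tuple.
(defn rec->map [r]
  (into {} r))

;; Record -> map before the value goes into a query.
(def as-map (rec->map (MyRec. 1 2)))

;; Map -> record after the value comes back out, via the generated factory.
(def as-rec (map->MyRec as-map))

(println as-map)
(println (record? as-map))
(println (= as-rec (MyRec. 1 2)))
```

The idea is to call something like `rec->map` on each value before it enters a query and `map->MyRec` on the results, so only plain maps cross the serialization boundary. Whether plain maps serialize cleanly in a given setup depends on the carbonite version in use, so treat this as a starting point rather than a guaranteed fix.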

Closing this one.