anhnongdan/Spark1.6_Problems

NoneType() has no len() error in opt/cloudera


File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/types.py", line 1159, in <lambda>
    return lambda *a: dataType.fromInternal(a)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/types.py", line 565, in fromInternal
    values = [f.fromInternal(v) for f, v in zip(self.fields, obj)]
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/types.py", line 438, in fromInternal
    return self.dataType.fromInternal(obj)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/types.py", line 619, in fromInternal
    return self.deserialize(self._cachedSqlType().fromInternal(obj))
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/mllib/linalg/__init__.py", line 166, in deserialize
    assert len(datum) == 4, \
TypeError: ("object of type 'NoneType' has no len()", <function <lambda> at 0x7fba3e237578>, (None, None))

Rerunning immediately yields the same error.

This is caused by calculating the Euclidean distance between two vectors when the actual values of the vector columns are null (None) instead of vectors.
=> the `assert len(datum) == 4` indicates that a non-null value in these columns is expected to be a serialized vector of length 4.
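A minimal pure-Python sketch (no Spark needed) of why the assert blows up and the usual workaround. The column name `features` and the 4-tuple layout shown are illustrative assumptions based on the traceback, not taken from the original job:

```python
# Sketch of the failing check in pyspark/mllib/linalg/__init__.py:
# deserialize() expects the serialized vector as a 4-tuple; when the
# column is null, Python passes None instead, and len(None) raises.
def deserialize(datum):
    assert len(datum) == 4, "expected a 4-element serialized vector"
    return datum

try:
    deserialize(None)  # a null vector column arrives here as None
except TypeError as e:
    print(e)           # prints: object of type 'NoneType' has no len()

# Workaround: drop rows with null vectors before computing distances.
# In PySpark this would be something like (hypothetical column name):
#   df = df.filter(df.features.isNotNull())
rows = [(1, (1, 3, None, [0.0, 1.0, 2.0])),  # valid serialized vector
        (2, None)]                            # null column -> skip it
clean = [(rid, vec) for rid, vec in rows if vec is not None]
print(len(clean))  # 1
```

Filtering the nulls out (or imputing a zero vector) before the distance calculation avoids hitting the deserializer with None at all.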