anhnongdan/Spark1.6_Problems

All problems and errors encountered when working with Spark 1.6

Issues

[Data Monitoring] 'Series' object has no attribute '_data'
#50 opened 3 years ago
1
Declared column type does not match the schema found in file metadata
#49 opened 5 years ago
0
Spark and Pandas dataframe converting
#48 opened 5 years ago
0
Problem writing parquet file: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainFloatDictionary
#47 opened 5 years ago
3
[pandas] "Cannot concatenate 'str' and 'float' objects" error when plotting a string-type x-axis
#46 opened 5 years ago
0
How to prevent long Spark jobs in sparklyr from being aborted?
#45 opened 6 years ago
0
Spark error when reading DB: No suitable driver
#44 opened 6 years ago
0
getPredictionScore_modelSKlearn: Input contains NaN, infinity or a value too large for dtype('float32')
#43 opened 6 years ago
0
About join/aggregation on full population data (call graphs, imeis, etc...)
#42 opened 4 years ago
11
Handle 3B rows tables with aggregation and join
#41 opened 5 years ago
3
[MINEPY] driver hang when calculating score
#40 opened 6 years ago
0
cache() when joining 2 tables
#39 opened 3 years ago
2
Coalesce problem when filtering big parquet
#38 opened 6 years ago
1
How to read big chunk of Parquet
#37 opened 6 years ago
2
Using iterator when joining HUGGGEEE tables
#36 opened 6 years ago
3
Join Small df and big parquet chunk - contact_rde weekly Fraud analysis
#35 opened 6 years ago
3
Coalesce big table before - AND AFTER - join
#34 opened 6 years ago
7
[HDFS] Duplicated copy command caused duplicated partitions
#33 opened 7 years ago
0
[Practice] Improve performance of daily IMEI calculation
#32 opened 7 years ago
12
Found array with 0 feature(s) - RandomForest ScikitLearn
#31 opened 7 years ago
2
KeyError: 'SPARK_HOME' when calling udf in file
#30 opened 7 years ago
3
Spark UnionAll behavior
#29 opened 7 years ago
1
No module '_name_' found in Spark Context when writing file
#28 opened 7 years ago
1
Bigint problem in Cloudera Kernel
#27 opened 7 years ago
0
NoneType() has no len() error in opt/cloudera
#26 opened 7 years ago
2
RandomForestClassifier weird unexpected argument error
#25 opened 6 years ago
0
Future time out Error when reading big files.
#24 opened 6 years ago
1
Error when comparing 2 columns
#23 opened 6 years ago
2
Problem Writing (and Reading??) Big parquet file
#22 opened 7 years ago
2
Join and Union Big Tables
#21 opened 7 years ago
3
Understand Spark Physical Plan
#20 opened 7 years ago
2
PickleException
#19 opened 7 years ago
2
Benchmarking Kmeans Clustering - pyspark tiny/small
#18 opened 5 years ago
4
Error when getting distinct list
#17 opened 7 years ago
0
Memory limits exceeded
#16 opened 7 years ago
3
Use coalesce instead of repartition
#15 opened 7 years ago
5
Spark and Python tricks
#14 opened 3 years ago
0
Snowball sampling
#13 opened 7 years ago
1
pyspark-tiny vs pyspark-small kernel
#12 opened 7 years ago
2
Repartition with and without caching
#11 opened 5 years ago
6
Column object is not callable
#10 opened 7 years ago
1
Rerun script after long time disconnect
#9 opened 7 years ago
2
The beauty of repartition and cache
#8 opened 7 years ago
5
Out of memory when showing a table
#7 opened 6 years ago
4
Spark driver can't find shuffle index file
#6 opened 7 years ago
0
Working with long process
#5 opened 7 years ago
1
cache vs not cache when union tables
#4 opened 6 years ago
9
Performance of pySpark-tiny kernel
#3 opened 7 years ago
1
Kernel can't start
#2 opened 7 years ago
2
[Chicken] Can't write parquet file when joining large dataframe
#1 opened 7 years ago
12