The following are the blog posts I compiled from what I have learned about Spark:
- Where does Spark fit in the Hadoop ecosystem?
- How to Size Executors, Cores and Memory for a Spark application running in memory
- Deep dive into Spark Data Layout
- Evolution of the Second-Generation Tungsten Engine
- Task Memory Management in Apache Spark
- Spark as a cloud-based SQL Engine for Big Data via ThriftServer
- Building real-time interactive applications with Spark
- Spark as a Knowledge Browser and the impact of data schema on performance
- Rebroadcasting a Broadcast Variable
- How to weave periodically changing cached data into your streaming application?
- Spark-Scala Setup in Jupyter
- Troubles of using a filesystem (S3/HDFS) as a data source in Spark