Spark version: 3.3.0-SNAPSHOT
This guide explains how to set up the Apache Spark open-source code in the Eclipse IDE on a local Windows machine.
Steps:
- check out this repo on the local machine
- open Scala IDE for Eclipse, with the workspace set to this folder: /apache-spark-local-setup
- import the Spark code using "Existing Maven Projects" from the "Import" wizard
- let it configure the codebase and wait for the "Building workspace" progress bar to complete
- once it is done, select all the projects and run Maven > "Update Project"
- select the parent project, right-click and choose Run As > "Maven build"; to build faster, use -DskipTests=true (to avoid running test cases) as well as -Dmaven.test.skip=true (to avoid compiling test cases)
- select all the projects and refresh them
- go to these files:
  \apache-spark-local-setup\spark\sql\core\src\main\scala\org\apache\spark\sql\test\TestBatchSetup.scala
  \apache-spark-local-setup\spark\sql\core\src\main\scala\org\apache\spark\sql\test\TestMe.scala
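
As a quick check that the imported workspace builds and runs, a small local Spark program can be launched from Eclipse. The following is only a minimal sketch: the object name `LocalSetupCheck` and its logic are hypothetical and are not the contents of TestBatchSetup.scala or TestMe.scala. It assumes the spark-sql module is on the run classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical smoke test; not part of the Spark codebase or of this repo.
object LocalSetupCheck {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, using all available cores
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-setup-check")
      .getOrCreate()
    try {
      // Build a 10-row Dataset (ids 0 to 9) and count the even ids
      val evenCount = spark.range(0, 10).filter("id % 2 = 0").count()
      println(s"even ids: $evenCount")
    } finally {
      // Always release the local Spark context, even if the job fails
      spark.stop()
    }
  }
}
```

Putting a breakpoint on the `count()` line and stepping in is one way to start following a query through the Spark source.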
The above steps have been tested on Windows (8.1 and 10) with the following setup:
scala-ide (latest):
java: Java 8 (tested on jdk1.8.0_121, jdk1.8.0_201 and jdk1.8.0_301)
maven: 3.6.3 or later (I've tested specifically on 3.6.3)
(Note: I've used G1 as the garbage collector with a max heap size of 2.5 GB, though it will work even with 2 GB.
Building the Spark code is smoother with G1 than with other garbage collectors (like CMS).
It consumes less heap as well: mostly around 1 GB, though it sometimes touches 1.9 GB to 2.1 GB.)
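
For reference, JVM options for Eclipse go in the eclipse.ini file after the `-vmargs` line, one option per line. The fragment below is a sketch of the settings described in the note, assuming a 2.5 GB max heap (2560 MB); the rest of your eclipse.ini stays as it is.

```ini
-vmargs
-XX:+UseG1GC
-Xmx2560m
```

On Java 8 the G1 flag has to be set explicitly, since G1 is not the default collector there.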
Possible challenges (while building):
in case of a StackOverflowError, add -Xss512k under -vmargs in the eclipse.ini file
this -Xss value can be tuned step by step (for example from 256k to 512k to 768k)
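
The JVM expects the thread stack size as `-Xss` followed directly by the value, with a capital X and no space. A sketch of the relevant eclipse.ini lines, assuming 512 KB as the value to try; only the `-Xss` line is new, and any options already listed under `-vmargs` stay in place.

```ini
-vmargs
-Xss512k
```

Eclipse reads eclipse.ini only at startup, so restart the IDE after changing the value.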
Once the codebase is all set, take a walk through it, start debugging the source code, and make changes in the source for a deeper understanding of the Spark flow. Now Spark is all yours :D
I'm sure you'll enjoy it like I did :p :D
Happy Learning !!