Stratio/Spark-MongoDB

How to increase the performance.

swarooppallapothu opened this issue · 0 comments

I have 1 billion rows (50 GB for one column) in an RDBMS. I am analyzing that data with Spark DataFrames and persisting the result to MongoDB. The output DataFrame may be up to 5 times larger than the input.
-> Since one column holds 50 GB, the output may grow to 5 * 50 = 250 GB after analysis.
Persisting the data after analysis currently takes 10 hours.

Please suggest steps to improve performance. I need to save that DataFrame in less than 1 hour.
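
To put the goal in numbers, here is a rough sketch of the sustained write rate implied by the figures in this issue. It assumes the output really reaches 5 x 50 GB and that 1 GB = 1024 MB. The function name `required_throughput_mb_s` is only illustrative and is not part of Spark or the connector.

```python
# Rough throughput estimate based only on the sizes and times quoted in this issue.
# Assumptions: output is 5x the 50 GB input, and 1 GB = 1024 MB.

def required_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Sustained write rate (MB/s) needed to persist size_gb within the given hours."""
    return size_gb * 1024 / (hours * 3600)

input_gb = 50
expansion = 5
output_gb = input_gb * expansion  # estimated output size in GB

current = required_throughput_mb_s(output_gb, 10)  # rate at the current 10-hour run
target = required_throughput_mb_s(output_gb, 1)    # rate needed for a 1-hour run

print(f"output size:  {output_gb} GB")
print(f"current rate: {current:.1f} MB/s")
print(f"target rate:  {target:.1f} MB/s")
```

So the write currently averages about 7 MB/s, and finishing in under 1 hour would need about 71 MB/s sustained, a 10x increase.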

Thank you.