cognitedata/cdp-spark-datasource

Create a complete introductory example

halvard-cognite opened this issue · 4 comments

The current documentation takes a lot of knowledge for granted.

I would love it if this repo had a complete example, with instructions, that I could compile and run somewhere (Dataproc, for example).

It could build on, or adapt, this tutorial:
https://cloud.google.com/dataproc/docs/tutorials/spark-scala

Thanks for the feedback! You're right: the tutorials assume you are using the data source on a Spark cluster that is already set up with the library available.

Just so I understand correctly: are you asking for a more thorough step-by-step guide to building and deploying Spark with the data source available, or does this apply to the read/write examples as well?

The read/write examples are probably fine.

I'm guessing I won't be the last person with no Spark or Scala experience who shows up at this repo wanting to test something with data from CDF.

wjoel commented

As in a tutorial focused on setup (installing the data source on Dataproc or other clusters) rather than on usage, which is covered in https://github.com/cognitedata/notebook-examples/blob/master/spark/tutorials/Cognite%20Spark%20data%20source%20tutorial.ipynb ?

To be fair, I did not find the tutorials until after posting this issue and talking to Emil.
But yes, for me the big challenge was getting a minimal code sample running: dependency management in the ecosystem, etc.