MrPowers/spark-daria

.Net for Spark

Closed this issue · 3 comments

Is anyone working on corresponding features for C#/.NET? I am migrating some of my code to .NET, and I don't have a direct substitute for the features that I was using from this project. Any information would be appreciated.

@dbeavon - thanks for creating the issue. I didn't even know that Microsoft has a project to bring Spark language bindings to .NET. I don't know anything about the .NET Spark library ecosystem, so I don't have any valuable info for you, unfortunately.

@MrPowers One of the main challenges on the .NET side of things is that you don't find any results when you search Google: there aren't many samples, and there aren't many utilities or libraries yet to help people get started.

For about a year I've been developing code in Scala/Spark and relying on community projects in that ecosystem. But as I migrate my work to the .NET side of Spark, I'm discovering that I have to write custom code for some very basic purposes (e.g. DataFrame validation and the like). There aren't a lot of community projects that offer utilities or extension APIs.

The first thing I tried was a Google search for ".net" and "daria", but it turned up nothing.

Would it be a problem if someone were to create a project that reused "daria" in its name, for the sake of Google searches (maybe "daria-spark-net" or whatever)? It would be very helpful for anyone who has to migrate code from Scala that was built with some of the utilities in this GitHub project.

FYI, in terms of the technical architecture, .NET works like Python: it relies on the same integration features that Python uses. .NET Core is hosted out-of-process, in a way that behaves like a sidecar for a Spark worker node. Just like Python, it uses Apache Arrow. Despite the comparisons with Python, the C# language itself is very similar to Scala, and I think a good C# developer will feel relatively comfortable in Scala and vice versa. All this is to say that for .NET developers it will probably be more "natural" to use some equivalent of a Scala API, and not some equivalent of a Python library.

Please let me know if you have any strong opinions about potentially creating a .Net variation of "daria".

@dbeavon - I support building a similar project for .NET users, but I'd rather a different name be used, if that's alright. I used quinn for the PySpark version of this lib. I'm sure these helper functions would be useful for the .NET community. Thanks for checking, and I hope you're finding this lib useful!