NetTopologySuite/NetTopologySuite.IO.Esri

Repository structure

Closed this issue · 8 comments

Before I get started I would like clarify proposed repository structure.

  1. My NetTopologySuite/NetTopologySuite.IO.ShapeFile#69 pull request introduced two new VS projects: NetTopologySuite.IO.Esri.Core and NetTopologySuite.IO.Esri. The former provides shapefile readers and writers as vanilla .NET Standard 2.0 library without any dependencies. The latter is a wrapper around Core project which gives NTS geometries instead of bunch of collections of coordinates. This separation causes some issues. For example coordinates are first loaded into plain Array and then copied to NTS' CoordinateSequence. It would be more efficient to load them directly from binary into CoordinateSequence. Furthermore the Core project doesn't contain any geometry logic (line length, polygon area, ...). Because of that some issues will be very hard to fix directly in Core project. For example fixing NetTopologySuite/NetTopologySuite.IO.ShapeFile#70 will be much easier having NTS' LinearRing.IsCCW property. For that reasons I would like to combine those two projects into one project which is tightly integrated with NTS library.
  2. We can consider moving dBASE part into separate project without NTS dependencies. After doing so one would be able to read/write dBASE tables without referencing NTS dependencies. As my implementation uses "Esri flavored" dBASE format it fits to this repository very well.
  3. I'm not a fan of maintaining NetTopologySuite.IO.GDB anymore
  4. Project naming convention - shouldn't the project names correspond strictly to NetTopologySuite.IO.Esri?
    • NetTopologySuite.IO.Esri.Shapefile
    • NetTopologySuite.IO.Esri.GDB
    • [NetTopologySuite.IO.Esri.Dbase]
    • [NetTopologySuite.IO.Esri.EnterpriseGeodatabase]

What do you think, @FObermaier?

For example coordinates are first loaded into plain Array and then copied to NTS' CoordinateSequence. It would be more efficient to load them directly from binary into CoordinateSequence.

If this were the only concern, then perhaps you could give RawCoordinateSequenceFactory a try? CommunityToolkit.HighPerformance includes some helpers that should let you reinterpret any Memory<byte> as a Memory<double>, then you could create a CoordinateSequence wrapping it.

I expect a little bit of a performance penalty compared to some other sequence types when the Geometry is actually used, but it may well be worth it to avoid full copies...

Thank you @airbreather for your comment. Storing coordinates in efficient way is a big concern for me to. I'm not familiar with all different types of geometry factories, so I was going to require GeometryFactory in the constructor and use NtsGeometryServices.Instance.CreateGeometryFactory() if null provided.

public ShpStreamBase(ShapeType shapeType, GeometryFactory factory)
{
    ...  
    Factory = factory ?? NtsGeometryServices.Instance.CreateGeometryFactory();
    ...  
}

internal CoordinateSequence GetCoordinateSequence(int size)
{
    return Factory.CoordinateSequenceFactory.Create(...);
}

You mean I should always use RawCoordinateSequenceFactory? In my project I have referenced NetTopologySuite 2.0.0 and RawCoordinateSequenceFactory isn't visible within NetTopologySuite.Geometries.Implementation namespace. Should I upgrade NTS version? To which one?

A few notes:

  • I have no issue with merging NetTopologySuite.IO.Esri.Core in NetTopologySuite.IO.Esri.Shapefile
  • RawCoordinateSequence[Factory] comes with NTS v2.3 (NetTopologySuite/NetTopologySuite#482). The flatgeobuf c# implementation uses its own, specialized CoordinateSequence/CoordinateSequenceFactory pair. I'm not sure if that is an option here, too.
  • I support the naming convention proposed.
  • If it is not clear which kind of GDB is format is meant, we should not add it to this new project.
    If there are complaints, we can add it afterwards.

I'm not sure (and probably any new consumer of the library won't be) which GDB is actually implemented

Looking at the docs it "Writes features as ESRI GeoDatabase binary format in a SqlServer database", looking at the code it simply extends snts.io.shp ShapeReader/Writer and adds almost nothing.

So I can assume that ESRI binary format is exactly the shapefile geometry format, stored in a sql column. +1 to leave this code in the deprecated library, at least for now.

@KubaSzostak if you are already working on this topic I encourage you to let us take part in the process.

I have started working on merging and cleaning up the projects at the weekend. I will push the code within the next few days.