uscuni/sgeop

simplified index dtype as `float`?

Closed this issue · 7 comments

The dataframe returned by sgeop.simplify_network() has an index with a float dtype.

  • Does this matter?
  • Should it be integer?

No idea where it comes from. There are many calls to reset_index so it is a bit surprising.

But it doesn't hurt anything for now that the index is float, correct? Or do you think it would be worth the time for me to find out exactly where this is happening and fix it?

It should not matter.

Do we want a final cleanup step in simplify_network() to ensure an integer index, or do we just ignore it? If we ignore it, let's close out this ticket.
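If we go with a final cleanup, it could look roughly like the sketch below. The helper name `ensure_integer_index` is hypothetical and not part of sgeop; it only relies on `pandas.api.types.is_integer_dtype` and `DataFrame.reset_index`, and it leaves frames that already have an integer index untouched.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def ensure_integer_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not sgeop API): return df with an integer index."""
    # Nothing to do if the index is already integer-typed.
    if is_integer_dtype(df.index.dtype):
        return df
    # Otherwise discard the existing labels and renumber from 0.
    return df.reset_index(drop=True)


# Toy frame with a float index, mimicking the labels seen in this issue.
floaty = pd.DataFrame({"_status": ["changed", "new"]}, index=[755.0, 756.0])
cleaned = ensure_integer_index(floaty)
print(cleaned.index.tolist())  # [0, 1]
```

Since this works on any DataFrame, it should carry over to a GeoDataFrame as well, but that is an assumption I have not checked against the actual simplify_network() output.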

simplify_network ends with induce_nodes, which ends with split, which ends with pd.concat(..., ignore_index=True), so it is beyond my understanding why it is a float.

split ends with pd.concat(..., ignore_index=True) only if either of two conditions (see footnotes 1 and 2) is met for the last element of split_points.drop_duplicates(). That happened in all our FUA test cases, but it is not the case in our small Apalachicola, FL dataset:

In [1]: import sgeop, geopandas

In [2]: original = geopandas.read_parquet("sgeop/tests/data/apalachicola_original.parquet")

In [3]: simplified = sgeop.simplify_network(original)

In [4]: simplified
Out[4]: 
                                                 geometry  _status
755.0   LINESTRING (5936300.114 -1526202.535, 5936277....  changed
756.0   LINESTRING (5936359.276 -1525963.175, 5936356....  changed
757.0   LINESTRING (5936383.249 -1526203.76, 5936384.5...  changed
758.0   LINESTRING (5936359.276 -1525963.175, 5936443....  changed
759.0   LINESTRING (5936359.276 -1525963.175, 5936353....  changed
...                                                   ...      ...
1505.0  LINESTRING (5938269.111 -1526620.16, 5938306.4...      new
1506.0  LINESTRING (5938306.442 -1526579.932, 5938311....      new
1507.0  LINESTRING (5938174.872 -1526720.434, 5938182....      new
1508.0  LINESTRING (5938182.986 -1526719.871, 5938243....      new
1509.0  LINESTRING (5938269.111 -1526620.16, 5938243.2...      new

[755 rows x 2 columns]

So we probably want one final .reset_index(drop=True) after the for-loop in nodes.split(), which results in:

In [1]: import sgeop, geopandas

In [2]: original = geopandas.read_parquet("sgeop/tests/data/apalachicola_original.parquet")

In [3]: simplified = sgeop.simplify_network(original)

In [4]: simplified
Out[4]: 
                                              geometry  _status
0    LINESTRING (5936300.114 -1526202.535, 5936277....  changed
1    LINESTRING (5936359.276 -1525963.175, 5936356....  changed
2    LINESTRING (5936383.249 -1526203.76, 5936384.5...  changed
3    LINESTRING (5936359.276 -1525963.175, 5936443....  changed
4    LINESTRING (5936359.276 -1525963.175, 5936353....  changed
..                                                 ...      ...
750  LINESTRING (5938269.111 -1526620.16, 5938306.4...      new
751  LINESTRING (5938306.442 -1526579.932, 5938311....      new
752  LINESTRING (5938174.872 -1526720.434, 5938182....      new
753  LINESTRING (5938182.986 -1526719.871, 5938243....      new
754  LINESTRING (5938269.111 -1526620.16, 5938243.2...      new

[755 rows x 2 columns]

@martinfleis Do you concur?


Footnotes

  1. https://github.com/uscuni/sgeop/blob/abbd153d810f4d729a6c81548cbcd921d67dae3d/sgeop/nodes.py#L20

  2. https://github.com/uscuni/sgeop/blob/abbd153d810f4d729a6c81548cbcd921d67dae3d/sgeop/nodes.py#L41

Ah, I did not consider it an option that it does not go through either of those. Yes, reset_index then.
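For reference, the difference between the two paths can be reproduced with plain pandas. This is only a toy sketch with made-up frames, not the actual code in nodes.split(): it shows that pd.concat(..., ignore_index=True) renumbers the rows, that skipping it keeps whatever float labels were already there, and that a trailing reset_index(drop=True) gives the same result as the concat path.

```python
import pandas as pd

# Two toy pieces carrying float labels, standing in for intermediate results.
parts = [
    pd.DataFrame({"_status": ["changed", "changed"]}, index=[755.0, 756.0]),
    pd.DataFrame({"_status": ["new"]}, index=[1505.0]),
]

# Without ignore_index the float labels survive.
kept = pd.concat(parts)
print(kept.index.dtype)  # float64

# With ignore_index=True the rows are renumbered from 0.
renumbered = pd.concat(parts, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# A final reset_index(drop=True) covers the path where the concat is skipped.
fixed = kept.reset_index(drop=True)
print(fixed.index.tolist())  # [0, 1, 2]
```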