simplified index dtype as `float`?

Question

simplified index dtype as `float`?

Closed this issue a month ago · 7 comments

jGaboardi commented a month ago

The dataframe returned by sgeop.simplify_network() has an index of dtype float.

Does this matter?
Should it be integer?
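For reference, a minimal way to check the index dtype with pandas. The frame below is a small hypothetical stand-in for the simplify_network() output, not real data from sgeop:

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Hypothetical stand-in for the simplified frame: three rows, float labels.
simplified_like = pd.DataFrame(
    {"_status": ["changed", "changed", "new"]},
    index=[755.0, 756.0, 1509.0],
)

# Inspect the dtype of the index rather than of the columns.
index_is_float = is_float_dtype(simplified_like.index)
index_is_integer = is_integer_dtype(simplified_like.index)

print(simplified_like.index.dtype, index_is_float, index_is_integer)
```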

Answer 1 · 2024-10-14T08:59:27.000Z

No idea where it comes from. There are many calls to reset_index so it is a bit surprising.

Answer 2 · 2024-10-14T12:46:54.000Z

But it doesn't hurt anything for now that the index is float, correct? Or do you think it would be worth the time for me to find out exactly where this is happening and fix it?

Answer 3 · 2024-10-14T12:50:57.000Z

It should not matter.

Answer 4 · 2024-10-15T19:53:14.000Z

Do we want a final cleanup step in simplify_network() to ensure an integer index, or should we just ignore it? If we ignore it, let's close this ticket.

Answer 5 · 2024-10-15T20:04:25.000Z

simplify_network ends with induce_nodes, which ends with split, which ends with pd.concat(..., ignore_index=True), so it is beyond my understanding why it is a float.

Answer 6 · 2024-10-16T02:56:20.000Z

> simplify_network ends with induce_nodes, which ends with split, which ends with pd.concat(..., ignore_index=True), so it is beyond my understanding why it is a float.

split ends with pd.concat(..., ignore_index=True) only if either of two conditions is met¹ ² on the last element of split_points.drop_duplicates(). That was the case in all our FUA test cases, but not in our small Apalachicola, FL dataset:

In [1]: import sgeop, geopandas

In [2]: original = geopandas.read_parquet("sgeop/tests/data/apalachicola_original.parquet")

In [3]: simplified = sgeop.simplify_network(original)

In [4]: simplified
Out[4]: 
                                                 geometry  _status
755.0   LINESTRING (5936300.114 -1526202.535, 5936277....  changed
756.0   LINESTRING (5936359.276 -1525963.175, 5936356....  changed
757.0   LINESTRING (5936383.249 -1526203.76, 5936384.5...  changed
758.0   LINESTRING (5936359.276 -1525963.175, 5936443....  changed
759.0   LINESTRING (5936359.276 -1525963.175, 5936353....  changed
...                                                   ...      ...
1505.0  LINESTRING (5938269.111 -1526620.16, 5938306.4...      new
1506.0  LINESTRING (5938306.442 -1526579.932, 5938311....      new
1507.0  LINESTRING (5938174.872 -1526720.434, 5938182....      new
1508.0  LINESTRING (5938182.986 -1526719.871, 5938243....      new
1509.0  LINESTRING (5938269.111 -1526620.16, 5938243.2...      new

[755 rows x 2 columns]
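The mechanism can be illustrated with a minimal pandas sketch. Everything here is a hypothetical stand-in: split_sketch, the value column, and the membership test replace the real function and its two conditions, which are not reproduced. The point is only that an index is reset by pd.concat(..., ignore_index=True) when that branch runs, and is passed through unchanged when it does not:

```python
import pandas as pd


def split_sketch(frame, split_points):
    """Hypothetical stand-in for a split-like loop with a conditional concat."""
    for point in split_points:
        # Stand-in for the real conditions: only then is the frame rebuilt.
        if point in frame["value"].values:
            new_row = pd.DataFrame({"value": [point * 10]})
            frame = pd.concat([frame, new_row], ignore_index=True)
    # No reset here, so an untouched frame keeps whatever index it came with.
    return frame


# Assumed input: a frame that already carries a float index from upstream.
frame = pd.DataFrame({"value": [1, 2, 3]}, index=[755.0, 756.0, 757.0])

untouched = split_sketch(frame, [99])  # condition never met
rebuilt = split_sketch(frame, [2])     # condition met once

print(untouched.index.dtype)
print(rebuilt.index.dtype)
```

The sketch does not explain where the float index originates upstream; it only shows why the concat at the end of the loop cannot be relied on to clean it up.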

So we probably want one final .reset_index(drop=True) after the for-loop in nodes.split(), which results in:

In [1]: import sgeop, geopandas

In [2]: original = geopandas.read_parquet("sgeop/tests/data/apalachicola_original.parquet")

In [3]: simplified = sgeop.simplify_network(original)

In [4]: simplified
Out[4]: 
                                              geometry  _status
0    LINESTRING (5936300.114 -1526202.535, 5936277....  changed
1    LINESTRING (5936359.276 -1525963.175, 5936356....  changed
2    LINESTRING (5936383.249 -1526203.76, 5936384.5...  changed
3    LINESTRING (5936359.276 -1525963.175, 5936443....  changed
4    LINESTRING (5936359.276 -1525963.175, 5936353....  changed
..                                                 ...      ...
750  LINESTRING (5938269.111 -1526620.16, 5938306.4...      new
751  LINESTRING (5938306.442 -1526579.932, 5938311....      new
752  LINESTRING (5938174.872 -1526720.434, 5938182....      new
753  LINESTRING (5938182.986 -1526719.871, 5938243....      new
754  LINESTRING (5938269.111 -1526620.16, 5938243.2...      new

[755 rows x 2 columns]

@martinfleis Do you concur?

Answer 7 · 2024-10-16T05:34:45.000Z

Ah, I did not consider it an option that it does not go through either of those. Yes, reset_index then.
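A minimal sketch of the agreed approach, assuming a simplified loop in place of the real nodes.split(). The function name, the value column, and the membership test are hypothetical; only the trailing .reset_index(drop=True) reflects what the thread proposes:

```python
import pandas as pd


def split_with_final_reset(frame, split_points):
    """Hypothetical split-like loop followed by an unconditional index reset."""
    for point in split_points:
        # Stand-in for the real conditions under which the frame is rebuilt.
        if point in frame["value"].values:
            new_row = pd.DataFrame({"value": [point * 10]})
            frame = pd.concat([frame, new_row], ignore_index=True)
    # Final reset after the loop: runs whether or not any branch was taken.
    return frame.reset_index(drop=True)


# Assumed input: a frame carrying a float index, with no condition ever met.
frame = pd.DataFrame({"value": [1, 2, 3]}, index=[755.0, 756.0, 757.0])
result = split_with_final_reset(frame, [99])

print(result.index)
```

With drop=True the old float labels are discarded rather than being added as a column, so the row order and data are unchanged and only the labels become 0 to n-1.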

Footnotes