simplified index dtype as `float`?
Closed this issue · 7 comments
The index of the resultant dataframe from `sgeop.simplify_network()` has a `float` dtype.
- Does this matter?
- Should it be integer?
No idea where it comes from. There are many calls to `reset_index`, so it is a bit surprising.
But it doesn't hurt anything for now that the index is float, correct? Or do you think it would be worth the time for me to find out exactly where this is happening and fix it?
It should not matter.
Do we want a final cleanup step in `simplify_network()` to ensure an integer index, or should we just ignore it? If we ignore it, let's close out this ticket.
`simplify_network` ends with `induce_nodes`, which ends with `split`, which ends with `pd.concat(..., ignore_index=True)`, so it is beyond my understanding why it is a float.
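The expectation above rests on how `pd.concat` treats `ignore_index=True`: the input labels are discarded and a fresh `0..n-1` integer index is built, whatever dtype the inputs had. A minimal sketch with plain pandas frames (the float labels are made up to mimic the issue, not taken from sgeop):

```python
import pandas as pd

# Two frames with float labels, mimicking the unexpected index seen in the issue.
left = pd.DataFrame({"value": [1, 2]}, index=[755.0, 756.0])
right = pd.DataFrame({"value": [3, 4]}, index=[1508.0, 1509.0])

# ignore_index=True drops the input labels and builds a new 0..n-1 index.
combined = pd.concat([left, right], ignore_index=True)

print(combined.index.dtype)   # int64
print(list(combined.index))   # [0, 1, 2, 3]
```

So if the final step always went through this `concat`, the result could not carry a float index.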
`split` only ends with `pd.concat(..., ignore_index=True)` if either of two conditions is met on the last element of `split_points.drop_duplicates()`. That happened in all our FUA test cases, but it is not the case in our small Apalachicola, FL dataset:
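To illustrate why a conditional `concat` lets the float index slip through, here is a hypothetical stand-in for `nodes.split()`. The function name `split_sketch` and the single `needs_split` flag are assumptions that replace the two real conditions; only the control flow (index rebuilt in one branch, returned untouched otherwise) mirrors the description above.

```python
import pandas as pd


def split_sketch(edges: pd.DataFrame, needs_split: bool) -> pd.DataFrame:
    """Hypothetical stand-in for nodes.split(), for illustration only.

    `needs_split` replaces the two real conditions checked on the last element
    of split_points.drop_duplicates().
    """
    if needs_split:
        # Stand-in for newly created segments; only this branch rebuilds the index.
        new_pieces = edges.iloc[:1]
        return pd.concat([edges, new_pieces], ignore_index=True)
    # Otherwise the frame is returned with whatever index it already had.
    return edges


float_indexed = pd.DataFrame({"value": [1, 2, 3]}, index=[755.0, 756.0, 757.0])

print(split_sketch(float_indexed, needs_split=True).index.dtype)   # int64
print(split_sketch(float_indexed, needs_split=False).index.dtype)  # float64
```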
```
In [1]: import sgeop, geopandas
In [2]: original = geopandas.read_parquet("sgeop/tests/data/apalachicola_original.parquet")
In [3]: simplified = sgeop.simplify_network(original)
In [4]: simplified
Out[4]:
geometry _status
755.0 LINESTRING (5936300.114 -1526202.535, 5936277.... changed
756.0 LINESTRING (5936359.276 -1525963.175, 5936356.... changed
757.0 LINESTRING (5936383.249 -1526203.76, 5936384.5... changed
758.0 LINESTRING (5936359.276 -1525963.175, 5936443.... changed
759.0 LINESTRING (5936359.276 -1525963.175, 5936353.... changed
... ... ...
1505.0 LINESTRING (5938269.111 -1526620.16, 5938306.4... new
1506.0 LINESTRING (5938306.442 -1526579.932, 5938311.... new
1507.0 LINESTRING (5938174.872 -1526720.434, 5938182.... new
1508.0 LINESTRING (5938182.986 -1526719.871, 5938243.... new
1509.0 LINESTRING (5938269.111 -1526620.16, 5938243.2... new
[755 rows x 2 columns]
```
So we probably want one final `.reset_index(drop=True)` following the for-loop in `nodes.split()`, resulting in:
```
In [1]: import sgeop, geopandas
In [2]: original = geopandas.read_parquet("sgeop/tests/data/apalachicola_original.parquet")
In [3]: simplified = sgeop.simplify_network(original)
In [4]: simplified
Out[4]:
geometry _status
0 LINESTRING (5936300.114 -1526202.535, 5936277.... changed
1 LINESTRING (5936359.276 -1525963.175, 5936356.... changed
2 LINESTRING (5936383.249 -1526203.76, 5936384.5... changed
3 LINESTRING (5936359.276 -1525963.175, 5936443.... changed
4 LINESTRING (5936359.276 -1525963.175, 5936353.... changed
.. ... ...
750 LINESTRING (5938269.111 -1526620.16, 5938306.4... new
751 LINESTRING (5938306.442 -1526579.932, 5938311.... new
752 LINESTRING (5938174.872 -1526720.434, 5938182.... new
753 LINESTRING (5938182.986 -1526719.871, 5938243.... new
754 LINESTRING (5938269.111 -1526620.16, 5938243.2... new
[755 rows x 2 columns]
```
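A minimal sketch of what the unconditional `.reset_index(drop=True)` does to an index like the one in the Apalachicola output. It uses a plain pandas DataFrame with only a `_status` column instead of the real GeoDataFrame, so the data here is synthetic; only the 755 float labels `755.0..1509.0` follow the output shown earlier.

```python
import pandas as pd

# Synthetic frame with float labels 755.0..1509.0 (755 rows), like the output above.
simplified = pd.DataFrame(
    {"_status": ["changed"] * 755},
    index=[float(i) for i in range(755, 1510)],
)

# drop=True discards the old labels instead of inserting them as a new column.
simplified = simplified.reset_index(drop=True)

print(simplified.index.dtype)                     # int64
print(simplified.index[0], simplified.index[-1])  # 0 754
```

Because the reset sits after the loop rather than inside one branch, it applies whether or not the `concat` path was taken.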
@martinfleis Do you concur?
Ah, I had not considered that it might not go through either of those branches. Yes, `reset_index` then.