Hidden requirement for geopandas in apache-sedona[spark] 1.5.2
joonaspessi opened this issue · 4 comments
Expected behavior
When installing Sedona for PySpark with the package apache-sedona[spark], we expect all package dependencies to be installed correctly, and geopandas should not be needed when kepler or pydeck is not used.
Actual behavior
After installing apache-sedona[spark] and running from sedona.spark import * the import fails with ModuleNotFoundError: No module named 'geopandas'
$ pip install "apache-sedona[spark]"==1.5.2
$ python
Python 3.8.18 (default, Feb 13 2024, 15:47:05)
[Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sedona.spark import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/spark/__init__.py", line 44, in <module>
    from sedona.maps.SedonaKepler import SedonaKepler
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/maps/SedonaKepler.py", line 18, in <module>
    from sedona.maps.SedonaMapUtils import SedonaMapUtils
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/maps/SedonaMapUtils.py", line 19, in <module>
    import geopandas as gpd
ModuleNotFoundError: No module named 'geopandas'
Steps to reproduce the problem
Create a clean Python environment and run the following commands:
$ pip install "apache-sedona[spark]"==1.5.2
$ python
Python 3.8.18 (default, Feb 13 2024, 15:47:05)
[Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sedona.spark import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/spark/__init__.py", line 44, in <module>
    from sedona.maps.SedonaKepler import SedonaKepler
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/maps/SedonaKepler.py", line 18, in <module>
    from sedona.maps.SedonaMapUtils import SedonaMapUtils
  File "/python/versions/3.8.18/lib/python3.8/site-packages/sedona/maps/SedonaMapUtils.py", line 19, in <module>
    import geopandas as gpd
ModuleNotFoundError: No module named 'geopandas'
Settings
Sedona version = 1.5.2
Apache Spark version = 3.5.1
Apache Flink version = ?
API type = Python
Scala version = N/A
JRE version = 1.8
Python version = 3.8
Environment = Standalone
@joonaspessi Sorry, PR #1229 accidentally introduced this issue.
To work around the problem, instead of from sedona.spark import * please use from sedona.spark.SedonaContext import SedonaContext
Hello, thanks for the fast response!
I think Python will still load the __init__.py file of the sedona.spark package even when importing a submodule of sedona.spark, so the suggested import would fail in the same way.
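The behavior described above can be reproduced without Sedona. The following is a minimal sketch, not Sedona code: it builds a throwaway package whose __init__.py imports a module that does not exist, then imports a submodule of that package. The names demo_pkg, context.py and missing_dep_for_demo are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_submodule_of_broken_package():
    """Return the name of the module that could not be found, or None."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        # The package __init__.py depends on a module that is not installed.
        (pkg / "__init__.py").write_text("import missing_dep_for_demo\n")
        # The submodule itself has no dependencies at all.
        (pkg / "context.py").write_text("VALUE = 1\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Importing the submodule first runs demo_pkg/__init__.py.
            importlib.import_module("demo_pkg.context")
        except ModuleNotFoundError as exc:
            return exc.name
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)
        return None


print(import_submodule_of_broken_package())  # prints "missing_dep_for_demo"
```

Even though context.py imports nothing, the import fails because Python executes the parent package's __init__.py before loading any of its submodules.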
This is very sad. We will make a follow-up release to fix this bug.
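For readers hitting similar problems in their own packages, one common pattern is to import an optional dependency only when the feature that needs it is used. The sketch below is a generic illustration and is not necessarily how the Sedona fix was implemented; load_optional and make_map_frame are hypothetical names.

```python
import importlib


def load_optional(module_name, feature):
    """Import an optional dependency on first use, with a clear error if missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from exc


def make_map_frame(rows):
    # Hypothetical map helper: geopandas is imported here, not at package
    # import time, so users who never call this function do not need it.
    gpd = load_optional("geopandas", "map rendering")
    return gpd.GeoDataFrame(rows)
```

With this layout, importing the package succeeds without geopandas, and the error only appears (with an explicit message) when a map function is actually called.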
We have released 1.5.3 to fix this bug! @joonaspessi