aws/amazon-sagemaker-feedback

AWS Sagemaker Studio JupyterLab Space: Glue Pyspark and Ray Kernel Python and pip version mismatch


Product Version

  • Amazon SageMaker Studio Classic
  • Amazon SageMaker Studio
  • Issue is not related to SageMaker Studio

Issue Description

There appears to be a discrepancy between the Python version and the pip-installed packages in the Glue Pyspark and Ray kernel in my AWS SageMaker Studio JupyterLab Space. I first noticed the issue when import IPython raised a ModuleNotFoundError, even though !pip list | grep ipython shows ipython 8.20.0 and !ipython --version also reports 8.20.0.

Expected Behavior

I would expect all the packages listed by !pip list to be available for import, but they are not. Installing a package with !pip install and then importing it does not work either.

Observed Behavior

Upon further investigation, I did the following.

1. With the Glue Pyspark and Ray kernel

import sys
print(sys.path)

which gives

['/tmp', '/tmp/spark-7a070785-711d-4791-9ed3-631d12bc29a0/userFiles-298f6845-dc16-4a6c-95e2-0245bbc35529', '/opt/amazon/spark/python/lib/pyspark.zip', '/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip', '/opt/amazon/lib/python3.6/site-packages', '/usr/lib64/python37.zip', '/usr/lib64/python3.7', '/usr/lib64/python3.7/lib-dynload', '/home/spark/.local/lib/python3.7/site-packages', '/usr/lib64/python3.7/site-packages', '/usr/lib/python3.7/site-packages']

However, !python --version gives 3.10.13.

I also checked pandas.__version__, which gives 1.3.2, but !pip list | grep pandas gives

pandas 2.1.4 
pandas-stubs 2.1.4.231227
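
The check above can be made more explicit by asking the kernel itself which interpreter it is running and which file each module would be loaded from. This is only a minimal diagnostic sketch, not a fix: locate_module is a hypothetical helper name, and the code sticks to standard-library calls that also exist on Python 3.7, since that is the version the sys.path output points at.

```python
import importlib
import importlib.util
import sys


def locate_module(name):
    """Return (origin, version) for a top-level module, or None if it is not importable.

    Hypothetical helper for illustration; it only inspects the interpreter
    that executes this cell, not the one used by `!pip`.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    module = importlib.import_module(name)
    return spec.origin, getattr(module, "__version__", None)


# Which interpreter is actually executing the notebook cell?
print("interpreter:", sys.executable)
print("version:", ".".join(map(str, sys.version_info[:3])))

# Where would these packages be loaded from, and which version is it?
for name in ("pandas", "IPython"):
    print(name, "->", locate_module(name))
```

If the printed path for pandas sits under one of the python3.7 directories, that would confirm the kernel is reading a different site-packages directory than the one pip list reports on.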

2. With the standard Python 3 (ipykernel)

import sys
print(sys.path)

which gives

['/home/sagemaker-user', '/opt/conda/lib/python310.zip', '/opt/conda/lib/python3.10', '/opt/conda/lib/python3.10/lib-dynload', '', '/opt/conda/lib/python3.10/site-packages']

and !python --version gives 3.10.13, which is consistent with sys.path.

Here pandas.__version__ gives 2.1.4, and !pip list | grep pandas gives

pandas 2.1.4
pandas-stubs 2.1.4.231227

which is consistent.

3. Conclusion

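It therefore seems that in the Glue Pyspark and Ray kernel, the interpreter running my code is a different installation from the one that !python and !pip use. The kernel's sys.path lists Python 3.7 directories (plus one Python 3.6 site-packages directory), whereas !python reports 3.10.13 and !pip lists packages such as pandas 2.1.4 and ipython 8.20.0. As a result, many of the pip-installed packages cannot be imported from the kernel.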
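
The mismatch described in this conclusion can be checked in a single cell by comparing the interpreter that runs the cell with whatever python resolves to on PATH. The sketch below is only an illustration: shell_python_version is a hypothetical helper, and it assumes the cell's Python code and shell commands run on the same machine, which this report does not establish for the Glue kernel.

```python
import shutil
import subprocess
import sys


def shell_python_version(command="python"):
    """Return (path, version) for `command` as resolved on PATH, or (None, None).

    Hypothetical helper for illustration only.
    """
    path = shutil.which(command)
    if path is None:
        return None, None
    result = subprocess.run(
        [path, "-c", "import sys; print('.'.join(map(str, sys.version_info[:3])))"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # spelled this way so it also runs on Python 3.6/3.7
        check=True,
    )
    return path, result.stdout.strip()


kernel_version = ".".join(map(str, sys.version_info[:3]))
shell_path, shell_version = shell_python_version()

print("kernel interpreter:", sys.executable, kernel_version)
print("python on PATH:    ", shell_path, shell_version)
print("same version:", kernel_version == shell_version)
```

In the Python 3 (ipykernel) case I would expect both lines to report 3.10.13. A difference in the Glue kernel would point to the same split between the kernel interpreter and the pip environment.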

I did find a similar question, "Conflicting Python versions in SageMaker Studio notebook with Python 3.8 kernel", but the accepted answer does not help in my case.

Any assistance is greatly appreciated. Am I missing something simple here or has anyone else come across such an issue with the Glue Pyspark and Ray kernel?

Product Category

JupyterLab

Feedback Category

Configuration and Setup

Other Details

I originally posted this question on Stack Overflow:

https://stackoverflow.com/questions/78087104/aws-sagemaker-studio-jupyterlab-space-glue-pyspark-and-ray-kernel-python-and-pi

Hi @emile-rc, thanks for raising this issue. I've let the team know about the issue and will update the issue here when we have a fix.