jupyter-server/jupyter-scheduler

Unable to locate output files from a scheduled job run

AravindAmazon opened this issue · 5 comments

Hello team,

I have a scheduled job that automatically runs a basic Python script, which writes a CSV file as output. The job runs successfully. However, I am not able to locate the directory where the output file is stored.
The line used for writing is temp.to_csv('trial_output.csv'), where temp is the DataFrame variable.
When I use the same script in a regular Jupyter environment (outside JupyterLab), the CSV file is written successfully to the local Jupyter environment folder. The issue appears to happen only in the JupyterLab environment when using a scheduled job. I would appreciate it if someone could help (I use Jupyter notebooks via the AWS SageMaker interface).

Full script:
import pandas as pd
temp = pd.read_csv("s3:///")
temp.to_csv('trial_output.csv')

Overall purpose:
I need to auto-run case predictions on a daily basis (with a volume of at least 10,000 predictions per day) and share a daily CSV with business users, without any manual intervention.

Thanks,
Aravind

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

When I try to recreate this, the file gets saved to the root folder of JupyterLab, which maps to the location from which I ran the jupyter-lab command.
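
A relative path such as 'trial_output.csv' is resolved against the current working directory of the process running the script, so one quick way to see where the file lands is to print the resolved path from inside the job. This is a minimal sketch using only the standard library; the helper name resolve_output_path is made up for illustration, and the working directory may differ on SageMaker.

```python
from pathlib import Path


def resolve_output_path(filename: str) -> Path:
    """Return the absolute path a relative filename resolves to (hypothetical helper)."""
    # Relative paths are resolved against the current working directory.
    return (Path.cwd() / filename).resolve()


if __name__ == "__main__":
    print("Working directory:", Path.cwd())
    print("CSV would be written to:", resolve_output_path("trial_output.csv"))
```

Adding these two print calls to the scheduled script should show the directory in the job's output.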

ArchivingExecutionManager archives the output files to a .tar.gz file, but it doesn't include files created as a side effect of running the notebook, as described in this issue.

This issue might be fixed by either modifying ArchivingExecutionManager or creating an alternate execution manager that gathers all output formats and all supporting files in and under the working directory, and saves them into an archive of some kind (.zip or .tar.gz).
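
The "gather everything in and under the working directory" idea can be sketched with the standard tarfile module. This is only an illustration under assumptions: archive_directory is a hypothetical standalone function, not part of jupyter-scheduler, and it does not show how an execution manager would call it.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_directory(work_dir: str, archive_path: str) -> List[str]:
    """Pack every regular file under work_dir into a .tar.gz.

    Hypothetical helper; returns the relative names that were archived.
    """
    root = Path(work_dir).resolve()
    target = Path(archive_path).resolve()
    added = []
    with tarfile.open(target, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the archive itself if it lives under work_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            rel = path.relative_to(root)
            # Store paths relative to work_dir so the archive unpacks cleanly.
            tar.add(path, arcname=str(rel))
            added.append(rel.as_posix())
    return added
```

Called as archive_directory(".", "outputs.tar.gz") after the notebook finishes, this would pick up side-effect files such as trial_output.csv alongside the notebook outputs.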