Starting job with trigger causes error
tardif54 opened this issue · 5 comments
I created a job that uses two --extra-py-files entries; one of them is a library archived in a zip file, following the AWS guidelines.
When the job is started through the AWS Glue console, everything works fine. Whenever I use a trigger or the command line (start-job-run) to start the exact same job, I get the following error:
Resource Setup Error: Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR s3://bucket/path/to/my/zip/file.zip with URI s3. Please specify a class through --class.
I have tried using non-overridable parameters and specifying the extra-py-files in my command lines; nothing seems to work.
From the error it looks like your files are being supplied via --extra-jars; from the command line you would need to use --extra-py-files, like below.
aws glue start-job-run --job-name "mysql-rds-parallel-read" --arguments='--scriptLocation="s3://my_glue/libraries/test_lib.py",--extra-py-files="hive_metastore_migration.py"'
If submitted successfully, it will return a JobRunId, like below:
{
"JobRunId": "jr_8313019d2c9e3db824d4681d6f1e43a2e54ed35707fca5bf2e6c8c764719448b"
}
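The same run can be started from boto3 rather than the AWS CLI; a minimal sketch, reusing the job name and placeholder paths from the CLI example above (the actual API call is commented out since it needs live credentials):

import boto3

# Same argument keys the CLI example passes via --arguments.
arguments = {
    '--scriptLocation': 's3://my_glue/libraries/test_lib.py',
    '--extra-py-files': 'hive_metastore_migration.py',
}

# client = boto3.client('glue')
# response = client.start_job_run(
#     JobName='mysql-rds-parallel-read',
#     Arguments=arguments,
# )
# print(response['JobRunId'])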
I have tried explicitly adding --extra-py-files to my AWS CLI command. Here's part of the log for a failed job run; you can see that both files are there in --extra-py-files. I don't understand what the difference is between starting from the console and starting with a trigger or the command line:
--extra-py-files s3://bucket/path/to/connection.py, s3://bucket/path/to/optical_services_results/osr_transformations.zip --JOB_ID j_12cf0bc0f9428c8c6a83ed8830575cdf3dc47da498a22a35a3ba1b822b27ff6d --JOB_RUN_ID jr_6ab99c591ef78af230da38aea7f2274805fcba58ce0b3f9cd2d5f25535aa3786 --enable-glue-datacatalog --job-bookmark-option job-bookmark-disable --scriptLocation s3://bucket/path/to/optical_services_results/optical_services_results.py --job-language python --TempDir s3://bucket/path/to/temp/ --JOB_NAME optical-services-results-job
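One detail worth checking (an observation, not a confirmed cause): in the log above the two S3 paths in --extra-py-files are separated by a comma followed by a space, while Glue expects a single comma-separated string with no whitespace. A small sketch of a helper (the function name and paths are placeholders) that joins the paths safely:

def build_extra_py_files(paths):
    """Join S3 paths into the single comma-separated string Glue expects,
    stripping any stray whitespace around each path."""
    return ','.join(p.strip() for p in paths)

files = [
    's3://bucket/path/to/connection.py',
    ' s3://bucket/path/to/optical_services_results/osr_transformations.zip',
]
print(build_extra_py_files(files))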
While researching, I found a discussion on the AWS developer forums:
https://forums.aws.amazon.com/thread.jspa?threadID=308042
I have tried explicitly supplying the job arguments... that doesn't work either.
Here's the solution to my problem; I used a Boto3 script:
import boto3

client = boto3.client('glue')

def add_trigger():
    client.create_trigger(
        Name='test2schedule',
        Type='SCHEDULED',
        Schedule='cron(07 19 * * ? *)',
        Actions=[
            {
                'JobName': 'optical-services-results-job',
                'Arguments': {
                    '--scriptLocation': 's3://my/bucket/script.py',
                    '--extra-py-files': 's3://my/bucket/connection.py,s3://my/bucket/pythonlibrary.zip'
                },
            },
        ],
        StartOnCreation=True
    )

def main():
    add_trigger()

if __name__ == "__main__":
    main()