sdv-dev/SDGym

Add run_on_ec2 flag to benchmark_single_table

amontanez24 opened this issue · 0 comments

Problem Description

As a user, I find that the benchmark function sometimes takes a long time and uses a lot of compute resources, so it would be nice to be able to run it on a separate instance.

Expected behavior

  • Add a run_on_ec2 boolean parameter to benchmark_single_table
  • If it is True:
    1. Launch an EC2 instance
    2. Install sdgym on that instance
    3. Run the job on that instance with the rest of the parameters
    4. Store the output in an S3 folder based on the value of output_filepath
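
Steps 2 and 3 could be expressed as a startup script that the instance runs on boot. The following is only a sketch: build_user_data_script is a hypothetical helper, the CLI arguments in the example are placeholders rather than real sdgym options, and it assumes the chosen machine image already ships with Python and pip.

```python
import shlex


def build_user_data_script(cli_args):
    """Return a bash script that installs sdgym and then runs its CLI.

    `cli_args` is the list of arguments to pass to the `sdgym` command.
    The caller is responsible for translating the benchmark parameters
    into valid CLI arguments; nothing here checks them.
    """
    # shlex.join quotes each argument so spaces and shell characters are safe.
    command = shlex.join(['sdgym', *cli_args])
    lines = [
        '#!/bin/bash',
        'pip install sdgym',  # assumes pip is available on the image
        command,
    ]
    return '\n'.join(lines) + '\n'


# Placeholder arguments for illustration only, not real sdgym options.
script = build_user_data_script(['--example-option', 'some value'])
print(script)
```

Building the command with shlex.join keeps argument values such as file paths from being split or interpreted by the shell.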

Technical Details

  • output_filepath must be an S3 path (bucket plus optional key prefix) when the flag is enabled. We should add a check for that
  • We should do this using boto3 directly. Its EC2 client has a function called run_instances that accepts a startup script through the UserData parameter. In our case, the script only needs to pip install sdgym and then run the CLI with the arguments passed to benchmark_single_table
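
A minimal sketch of the two points above, under stated assumptions: validate_s3_output_filepath and launch_benchmark_instance are hypothetical names, the s3:// URI form is assumed for output_filepath, and the image ID and instance type are left to the caller. The only boto3 call used is the EC2 client's run_instances with its ImageId, InstanceType, MinCount, MaxCount, and UserData parameters.

```python
from urllib.parse import urlparse


def validate_s3_output_filepath(output_filepath):
    """Check that the path looks like s3://<bucket>/<key> and split it."""
    if not isinstance(output_filepath, str):
        raise ValueError('output_filepath must be an S3 path when run_on_ec2 is True.')

    parsed = urlparse(output_filepath)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError(
            f"output_filepath must look like 's3://<bucket>/<key>', got {output_filepath!r}."
        )

    # Return (bucket, key) so the caller can reuse them.
    return parsed.netloc, parsed.path.lstrip('/')


def launch_benchmark_instance(ec2_client, user_data, image_id, instance_type):
    """Start one EC2 instance that runs `user_data` on first boot.

    `ec2_client` is expected to be a boto3 EC2 client; it is passed in so
    this function does not create AWS resources implicitly.
    """
    response = ec2_client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=user_data,  # boto3 base64-encodes this string for us
    )
    return response['Instances'][0]['InstanceId']


bucket, key = validate_s3_output_filepath('s3://my-bucket/results/output.csv')
print(bucket, key)
```

In real use the client would come from boto3.client('ec2'), and the instance would presumably also need permission to write to the target bucket (for example through an instance profile), which this sketch does not set up.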