[GENERAL SUPPORT]: Error in Running Early Stopping Program from Ax Documentation
vinaysaini94 opened this issue · 2 comments
vinaysaini94 commented
Question
Hi Community,
I am having trouble running the early stopping tutorial at https://ax.dev/tutorials/early_stopping/early_stopping.html. Specifically, I encounter an error when executing the following cell:
"%%time
scheduler.run_all_trials()"
The error message I get is:
"{
"name": "OSError",
"message": "[WinError 87] The parameter is incorrect",
"stack": "---------------------------------------------------------------------------
OSError Traceback (most recent call last)
File <timed eval>:1
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\service\\scheduler.py:1124, in Scheduler.run_all_trials(self, timeout_hours, idle_callback)
1117 if self.options.total_trials is None:
1118 # NOTE: Capping on number of trials will likely be needed as fallback
1119 # for most stopping criteria, so we ensure `num_trials` is specified.
1120 raise ValueError(
1121 \"Please either specify `num_trials` in `SchedulerOptions` input \"
1122 \"to the `Scheduler` or use `run_n_trials` instead of `run_all_trials`.\"
1123 )
-> 1124 return self.run_n_trials(
1125 max_trials=not_none(self.options.total_trials),
1126 timeout_hours=timeout_hours,
1127 idle_callback=idle_callback,
1128 )
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\service\\scheduler.py:1071, in Scheduler.run_n_trials(self, max_trials, ignore_global_stopping_strategy, timeout_hours, idle_callback)
1036 \"\"\"Run up to ``max_trials`` trials; will run all ``max_trials`` unless
1037 completion criterion is reached. For base ``Scheduler``, completion criterion
1038 is reaching total number of trials set in ``SchedulerOptions``, so if that
(...)
1068 3
1069 \"\"\"
1070 self.poll_and_process_results()
-> 1071 for _ in self.run_trials_and_yield_results(
1072 max_trials=max_trials,
1073 ignore_global_stopping_strategy=ignore_global_stopping_strategy,
1074 timeout_hours=timeout_hours,
1075 idle_callback=idle_callback,
1076 ):
1077 pass
1078 return self.summarize_final_result()
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\service\\scheduler.py:964, in Scheduler.run_trials_and_yield_results(self, max_trials, ignore_global_stopping_strategy, timeout_hours, idle_callback)
958 # Run new trial evaluations until `run` returns `False`, which
959 # means that there was a reason not to run more evaluations yet.
960 # Also check that `max_trials` is not reached to not exceed it.
961 n_remaining_to_generate = self._num_remaining_requested_trials - len(
962 self.candidate_trials
963 )
--> 964 while self._num_remaining_requested_trials > 0 and self.run(
965 max_new_trials=n_remaining_to_generate
966 ):
967 # Not checking `should_abort_optimization` on every trial for perf.
968 # reasons.
969 n_already_run_by_scheduler = (
970 len(self.experiment.trials)
971 - n_existing
972 - len(self.candidate_trials)
973 )
974 self._num_remaining_requested_trials = (
975 max_trials - n_already_run_by_scheduler
976 )
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\service\\scheduler.py:1192, in Scheduler.run(self, max_new_trials)
1190 self.logger.info(f\"Running trials {idcs_str}...\")
1191 # TODO: Add optional timeout between retries of `run_trial(s)`.
-> 1192 metadata = self.run_trials(trials=all_trials)
1193 self.logger.debug(f\"Ran trials {idcs_str}.\")
1194 if self.options.debug_log_run_metadata:
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\utils\\common\\executils.py:163, in retry_on_exception.<locals>.func_wrapper.<locals>.actual_wrapper(*args, **kwargs)
159 wait_interval = min(
160 MAX_WAIT_SECONDS, initial_wait_seconds * 2 ** (i - 1)
161 )
162 time.sleep(wait_interval)
--> 163 return func(*args, **kwargs)
165 # If we are here, it means the retries were finished but
166 # The error was suppressed. Hence return the default value provided.
167 return default_return_on_suppression
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\service\\scheduler.py:639, in Scheduler.run_trials(self, trials)
617 @retry_on_exception(retries=3, no_retry_on_exception_types=NO_RETRY_EXCEPTIONS)
618 def run_trials(self, trials: Iterable[BaseTrial]) -> Dict[int, Dict[str, Any]]:
619 \"\"\"Deployment function, runs a single evaluation for each of the
620 given trials.
621
(...)
637 process.
638 \"\"\"
--> 639 return self.runner.run_multiple(trials=trials)
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\core\\runner.py:70, in Runner.run_multiple(self, trials)
50 def run_multiple(
51 self, trials: Iterable[core.base_trial.BaseTrial]
52 ) -> Dict[int, Dict[str, Any]]:
53 \"\"\"Runs a single evaluation for each of the given trials. Useful when deploying
54 multiple trials at once is more efficient than deploying them one-by-one.
55 Used in Ax ``Scheduler``.
(...)
68 process.
69 \"\"\"
---> 70 return {trial.index: self.run(trial=trial) for trial in trials}
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\ax\\runners\\torchx.py:159, in TorchXRunner.run(self, trial)
156 parameters[\"tracker_base\"] = self._tracker_base
158 appdef = self._component(**parameters)
--> 159 app_handle = self._torchx_runner.run(appdef, self._scheduler, self._cfg)
160 return {
161 TORCHX_APP_HANDLE: app_handle,
162 TORCHX_RUNNER: self._torchx_runner,
163 TORCHX_TRACKER_BASE: self._tracker_base,
164 }
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\torchx\\runner\\api.py:262, in Runner.run(self, app, scheduler, cfg, workspace, parent_run_id)
252 with log_event(
253 api=\"run\", runcfg=json.dumps(cfg) if cfg else None, workspace=workspace
254 ) as ctx:
255 dryrun_info = self.dryrun(
256 app,
257 scheduler,
(...)
260 parent_run_id=parent_run_id,
261 )
--> 262 handle = self.schedule(dryrun_info)
263 ctx._torchx_event.scheduler = none_throws(dryrun_info._scheduler)
264 ctx._torchx_event.app_image = none_throws(dryrun_info._app).roles[0].image
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\torchx\\runner\\api.py:308, in Runner.schedule(self, dryrun_info)
301 with log_event(
302 \"schedule\",
303 scheduler,
304 app_image=app_image,
305 runcfg=json.dumps(cfg) if cfg else None,
306 ) as ctx:
307 sched = self._scheduler(scheduler)
--> 308 app_id = sched.schedule(dryrun_info)
309 app_handle = make_app_handle(scheduler, self._name, app_id)
310 app = none_throws(dryrun_info._app)
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\torchx\\schedulers\\local_scheduler.py:805, in LocalScheduler.schedule(self, dryrun_info)
802 replica_log_dir = role_log_dirs[replica_id]
804 os.makedirs(replica_log_dir)
--> 805 replica = self._popen(
806 role_name,
807 replica_id,
808 replica_params,
809 )
810 local_app.add_replica(role_name, replica)
811 self._apps[app_id] = local_app
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\torchx\\schedulers\\local_scheduler.py:693, in LocalScheduler._popen(self, role_name, replica_id, replica_params)
682 def _popen(
683 self,
684 role_name: RoleName,
685 replica_id: int,
686 replica_params: ReplicaParam,
687 ) -> _LocalReplica:
688 \"\"\"
689 Same as ``subprocess.Popen(**popen_kwargs)`` but is able to take ``stdout`` and ``stderr``
690 as file name ``str`` rather than a file-like obj.
691 \"\"\"
--> 693 stdout_, stderr_, combined_ = self._get_replica_output_handles(replica_params)
695 args_pfmt = pprint.pformat(asdict(replica_params), indent=2, width=80)
696 log.debug(f\"Running {role_name} (replica {replica_id}):\
{args_pfmt}\")
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\torchx\\schedulers\\local_scheduler.py:731, in LocalScheduler._get_replica_output_handles(self, replica_params)
729 combined_file = self._get_file_io(replica_params.combined)
730 if combined_file:
--> 731 combined_ = Tee(
732 combined_file,
733 none_throws(replica_params.stdout),
734 none_throws(replica_params.stderr),
735 )
736 return stdout_, stderr_, combined_
File c:\\Users\\Vinay Saini\\anaconda3\\Lib\\site-packages\\torchx\\schedulers\\streams.py:35, in Tee.__init__(self, out, *sources)
33 for source in sources:
34 r = io.open(source, \"rb\", buffering=0)
---> 35 os.set_blocking(r.fileno(), False)
36 self.streams.append(r)
38 self._closed = False
OSError: [WinError 87] The parameter is incorrect"
}"
I am running this code on a Windows 11 system using Jupyter Notebook.
Any advice on how to resolve this issue or modify the code to work on my setup would be greatly appreciated!
Thanks in advance for your help!
Code of Conduct
- I agree to follow Ax's Code of Conduct
Balandat commented
Hmm, interesting. This failure is very deep in the torchx stack. The fact that it's an OSError makes me think this is not really an Ax issue, but rather an issue with torchx on Windows. There are a few similar issues out there (a minimal repro sketch follows the links):
- https://discuss.pytorch.org/t/oserror-when-importing-torch-in-python-script/203520 (recommends downgrading Python)
- coherent-oss/coherent.deps#1
- https://www.reddit.com/r/learnprogramming/comments/zvdvl1/new_to_using_kivy_keep_getting_error/ (suggests it might be permissions-related).
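To make the root cause concrete, here is a minimal sketch of the call that fails at the bottom of the traceback. This is my own construction, not code from Ax or the tutorial. torchx's Tee opens each replica log file unbuffered and switches its descriptor to non-blocking mode; as far as I can tell, os.set_blocking() only exists on Windows since Python 3.12 and is limited to pipes there, so calling it on a regular file raises [WinError 87], while the same call succeeds on Linux:

import io
import os
import tempfile

# Stand-in for one of torchx's replica log files.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"dummy log line\n")
    log_path = f.name

# Same pattern as Tee.__init__ in torchx/schedulers/streams.py:
# open the file unbuffered, then put its descriptor in non-blocking mode.
r = io.open(log_path, "rb", buffering=0)
os.set_blocking(r.fileno(), False)  # Linux: succeeds; Windows: OSError [WinError 87]
r.close()

If that's what's going on, the tutorial should run as-is in a POSIX environment, e.g. WSL2, a Linux machine, or Colab, until torchx's LocalScheduler supports combined log streaming on Windows.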
bernardbeckerman commented
Closing this out since it's been a while, but please feel free to reopen if you still need assistance, @vinaysaini94!