[Researcher] Developing a new AutoML system around GAMA as outer wrapper?
simonprovost opened this issue · 6 comments
Dear Authors,
Excellent initiative, GAMA! My question is this: would GAMA be sufficient as the basis for designing a new AutoML tool (as a researcher in academia), with its own customised search space, its own customised search engine, and rules for how the search space is modified in response to actions the search optimiser takes during the process? In other words, how much freedom does GAMA give us to create a new AutoML system?
This would mean using GAMA as a wrapper for everything required to build an AutoML system; is that correct? From a skim of the documentation and releases, what already exists looks amazing and we would re-use it, but would creating a new AutoML task, search space, search engine, etc. also be possible?
Assuming this is the right path, where should we begin?
Best wishes,
Thank you :) I'll try to address the questions in order.
Customizing the search space is relatively easy, one example of this can be found in the clustering branch. It mostly comprises creating a new search space file. Depending on the application or algorithms in the search space, you might also need to add some metric support or changes to the evaluation.
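For illustration, a search space file of this kind might look roughly like the sketch below. It assumes a dictionary layout in which each estimator maps to its hyperparameters and their candidate values. The estimator names, the values, and the `sample_configuration` helper are hypothetical, and strings stand in for the estimator classes a real file would import.

```python
import random

# Hypothetical search space: each key names an estimator, each value maps
# hyperparameter names to the candidate values the search may choose from.
# A real search space file would use imported estimator classes as keys.
custom_search_space = {
    "KMeans": {
        "n_clusters": range(2, 11),
        "init": ["k-means++", "random"],
    },
    "DBSCAN": {
        "eps": [0.1, 0.25, 0.5, 1.0],
        "min_samples": range(2, 11),
    },
    "AgglomerativeClustering": {
        "n_clusters": range(2, 11),
        "linkage": ["ward", "complete", "average"],
    },
}


def sample_configuration(search_space, rng):
    """Pick one estimator and one value per hyperparameter, uniformly at random."""
    estimator = rng.choice(sorted(search_space))
    hyperparameters = {
        name: rng.choice(list(values))
        for name, values in sorted(search_space[estimator].items())
    }
    return estimator, hyperparameters


rng = random.Random(0)
estimator, hyperparameters = sample_configuration(custom_search_space, rng)
print(estimator, hyperparameters)
```

Keeping the space as plain data means a search procedure can inspect or copy it without knowing anything about the estimators themselves.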
Assuming that with search engine you mean the algorithm selection and hyperparameter optimisation procedure (e.g., evolutionary computation, successive halving), this is supported through subclassing BaseSearch. An example implementation for random search can be found here.
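The subclassing pattern can be sketched without GAMA installed. Everything below is a stand-in: `SearchSketch` is a hypothetical abstract base written for this example, not GAMA's `BaseSearch`, whose real method signatures should be taken from the GAMA source and documentation. The point is only the shape: a base class fixes the interface, and each search strategy overrides one method.

```python
import random
from abc import ABC, abstractmethod


class SearchSketch(ABC):
    """Hypothetical stand-in for a search base class (not GAMA's BaseSearch)."""

    def __init__(self):
        self.output = []  # (score, candidate) pairs evaluated so far

    @abstractmethod
    def search(self, candidates, evaluate, budget):
        """Evaluate up to `budget` candidates and record them in self.output."""


class RandomSearchSketch(SearchSketch):
    """Random search: draw candidates uniformly and keep every result."""

    def __init__(self, seed=0):
        super().__init__()
        self._rng = random.Random(seed)

    def search(self, candidates, evaluate, budget):
        for _ in range(budget):
            candidate = self._rng.choice(candidates)
            self.output.append((evaluate(candidate), candidate))
        # Return the best (score, candidate) pair seen so far.
        return max(self.output, key=lambda pair: pair[0])


# Toy problem: candidates are integers and the score peaks at 7.
best_score, best_candidate = RandomSearchSketch(seed=42).search(
    candidates=list(range(20)),
    evaluate=lambda c: -abs(c - 7),
    budget=50,
)
print(best_score, best_candidate)
```

A different strategy, such as a Bayesian optimiser, would be another subclass that overrides `search` while leaving the calling code unchanged.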
Adjusting the search space during optimization is not used in any current implementation, but I imagine you could just have your custom search algorithm take the search space directly as input on construction similar to how ASHA gets some additional hyperparameters, and then change the search space during search.
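One way to picture this suggestion: the search object receives the search space in `__init__` and applies its own rule to shrink it while it runs. The sketch below is standalone and hypothetical throughout. `AdaptiveSearchSketch`, the `prune_after` parameter, and the pruning rule are invented for illustration and are not part of GAMA.

```python
import copy
import random


class AdaptiveSearchSketch:
    """Hypothetical search that owns, and may rewrite, its search space."""

    def __init__(self, search_space, prune_after=10, seed=0):
        # Copy so the caller's dictionary is not mutated by the search.
        self.search_space = copy.deepcopy(search_space)
        self.prune_after = prune_after  # evaluations between pruning steps
        self._rng = random.Random(seed)
        self.history = []  # (score, estimator, hyperparameters)

    def _sample(self):
        estimator = self._rng.choice(sorted(self.search_space))
        space = self.search_space[estimator]
        params = {k: self._rng.choice(list(v)) for k, v in sorted(space.items())}
        return estimator, params

    def _prune(self):
        """Illustrative rule: drop the estimator with the worst mean score."""
        if len(self.search_space) <= 1:
            return
        scores = {}
        for score, estimator, _ in self.history:
            if estimator in self.search_space:
                scores.setdefault(estimator, []).append(score)
        if scores:
            worst = min(scores, key=lambda e: sum(scores[e]) / len(scores[e]))
            del self.search_space[worst]

    def search(self, evaluate, budget):
        for step in range(1, budget + 1):
            estimator, params = self._sample()
            self.history.append((evaluate(estimator, params), estimator, params))
            if step % self.prune_after == 0:
                self._prune()
        # Return the best (score, estimator, hyperparameters) entry.
        return max(self.history, key=lambda entry: entry[0])


# Toy usage: three made-up estimators with a fixed score offset each.
space = {name: {"x": [1, 2, 3]} for name in ("A", "B", "C")}
offset = {"A": 0.0, "B": 5.0, "C": 10.0}
searcher = AdaptiveSearchSketch(space, prune_after=10, seed=0)
best = searcher.search(lambda est, p: offset[est] + p["x"], budget=30)
print(best, sorted(searcher.search_space))
```

Because the space is supplied at construction, the `search` call itself takes no extra search-space argument in this sketch.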
@prabhant from your experience integrating different libraries with GAMA, is there anything missing?
Hello @PGijsbers,
It is a delight to meet you :) I had the opportunity to read a piece of your dissertation, and it was fantastic!!
First, I would like to thank you for your prompt response; it is much appreciated! I also have a few more questions. I am sorry to take up your time with questions that are not really issues, but I feel we are on the right route to resolve the "issue" in the next few responses.
Customizing the search space is relatively easy, one example of this can be found in the clustering branch. It mostly comprises creating a new search space file. Depending on the application or algorithms in the search space, you might also need to add some metric support or changes to the evaluation.
This is better than I expected; it will fit perfectly with our aims.
Assuming that with search engine you mean the algorithm selection and hyperparameter optimisation procedure (e.g., evolutionary computation, successive halving), this is supported through subclassing BaseSearch. An example implementation for random search can be found here.
Apologies for not being specific earlier, but the algorithm selection and hyperparameter optimisation technique is precisely what I meant. It is great that we can use the existing options, which we intend to do, but we would also like to incorporate, for example, a Bayesian optimisation approach, so that we explore various search engines thoroughly for our task and avoid being unfairly rejected in a conference review round for not exploring this sufficiently. Thus, can you ensure that this section of the documentation is a suitable starting point for introducing a new search process in addition to the existing ones?
Adjusting the search space during optimization is not used in any current implementation, but I imagine you could just have your custom search algorithm take the search space directly as input on construction similar to how ASHA gets some additional hyperparameters, and then change the search space during search.
Regarding altering the search space during the search procedure: as you mentioned, no present implementation uses this strategy, which makes sense. Still, I feel it could be done straightforwardly if the search procedure accepted custom search space files as input and used them according to the rules specified in that custom search procedure. This would require creating a number of search space files, but I do not believe that would be a significant burden in my use case as long as it works. However, as this would most probably not improve the base GAMA implementation, I reckon it is better to keep it in a fork rather than propose it for the standard GAMA implementation. Do you broadly concur with these thoughts?
In the meantime, if I decide to use GAMA for my potential new AutoML variant, I already have numerous enhancements in mind to benefit the community you have built around GAMA and future newcomers. To provide a bit more context: as part of my Ph.D. journey, which has just started, one of my goals is to contribute to the community's development and establish connections for future postdoctoral or research positions. Consequently, using GAMA would benefit both my potential Ph.D. subject and GAMA's open-source community, which I would very much appreciate.
Have a great end of the day and thanks for your time :)
Cheers,
Happy to hear from your colleague @prabhant whenever he has spare time to provide additional comments, if necessary.
Pieter covered most of what is required. It depends a lot on whether the library or group of estimators you are automating is sklearn compatible. If the library is not scikit-learn friendly, take a look at the implementation in the OAML branch; it is a little more work in that scenario.
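As a rough guide to what "sklearn compatible" means here: scikit-learn estimators follow a small convention of `fit`, `predict`, `get_params`, and `set_params`, with hyperparameters stored unchanged in `__init__`. The class below is a hypothetical, dependency-free illustration of that convention. In practice one would usually inherit from `sklearn.base.BaseEstimator`, which supplies `get_params` and `set_params`. Whether GAMA needs anything beyond this convention is not stated in the thread.

```python
from collections import Counter


class MajorityClassifierSketch:
    """Hypothetical estimator following the scikit-learn method convention."""

    def __init__(self, fallback_label=None):
        # scikit-learn convention: __init__ only stores hyperparameters.
        self.fallback_label = fallback_label

    def get_params(self, deep=True):
        return {"fallback_label": self.fallback_label}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Learned attributes end with an underscore by convention.
        labels = list(y)
        if labels:
            self.majority_ = Counter(labels).most_common(1)[0][0]
        else:
            self.majority_ = self.fallback_label
        return self

    def predict(self, X):
        # Predict the most frequent training label for every row.
        return [self.majority_ for _ in X]


model = MajorityClassifierSketch().set_params(fallback_label="unknown")
model.fit([[0], [1], [2]], ["a", "b", "b"])
print(model.predict([[5], [6]]))  # → ['b', 'b']
```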
Let me know if you have any other questions.
Hi @prabhant
I am extremely grateful for your help too. I will investigate the OAML branch further later on, but this information is already quite helpful. For now I may focus on sklearn estimators, or construct custom sklearn estimators, which I believe GAMA would treat the same way. If I am unable to proceed that way, I will look to OAML's implementation for inspiration.
Sure, will do. Hopefully I will see you around in a pull request for GAMA :)
Cheers,
Thus, can you ensure that this section of the documentation is a suitable starting point for introducing a new search process in addition to the existing ones?
As far as I am aware, that section is still up-to-date. If it is not, just post a comment / open an issue and we'll sort that out :)
Do you broadly concur with these thoughts?
I think if you go the way I specified, i.e., pass the files/dictionaries at initialization (__init__) instead of when calling search, then things should remain compatible with GAMA. If you do require further changes to method signatures, then I agree that at this point I cannot promise it could become standard implementation (but I would certainly have a look when you're all done).
Contributions to GAMA (and its documentation) would be greatly appreciated. If these extend beyond simply improving the documentation or fixing bugs, it's best to first open an issue (or comment on an existing one) to discuss. That increases the chance we can merge the contributions into GAMA :)
Absolutely crystal clear! In the coming weeks I will begin looking into all of this, and I will get back to you if necessary, but for now you have been 100% helpful! Thank you so much. I am thrilled that GAMA has not only a community but also very helpful authors, which makes me even happier to use it :)
Cheers, and have a lovely weekend, both of you!