Working with non-hardcoded data
flexthink opened this issue · 4 comments
So far, I haven't found a way to pass parameters to the data() function, and it appears to ignore everything in the code except imports. This is because Hyperas creates a new Python file out of the data, the model and everything else before attempting to train. That works well if you're training on MNIST or some other dataset that ships with the framework, or on random data. But what if the dataset is selected from a drop-down or retrieved from a URL? What if you want to run it from a script whose config file specifies the path to the data? What if it needs to read from a database? Is there a way to do this with the way Hyperas is currently set up? If not, is there anything on the roadmap?
As you pointed out, hyperas is currently a simple wrapper that uses data() and model() as templates from which it formats code that it then executes. This means that everything data() needs has to be defined inside data() itself, just as in a standalone script.
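To see why values from the calling script never reach data(), here is a minimal sketch of template-style execution. It is a simplified illustration and not hyperas's actual implementation; `run_in_fresh_namespace` and `DATA_PATH` are hypothetical names made up for this example.

```python
# Source text of a data() function, as a code generator would see it.
data_source = (
    "def data():\n"
    "    return DATA_PATH\n"
)

# This variable lives only in the calling script (hypothetical value).
DATA_PATH = "/tmp/train.csv"


def run_in_fresh_namespace(source):
    """Execute source text in an empty namespace and return its data()."""
    namespace = {}
    exec(source, namespace)
    return namespace["data"]


isolated_data = run_in_fresh_namespace(data_source)

try:
    isolated_data()
    lookup_failed = False
except NameError:
    # DATA_PATH was never copied into the fresh namespace.
    lookup_failed = True
```

The function only sees what is inside its own source text, which is why anything it needs has to be written into data() or loaded from a fixed location.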
In all of your examples, you basically want to be able to generate new templates that hyperas can call.
E.g. let's say you have an application that uses hyperas based on an input dataset:
from hyperas import optim

# The template and pipeline bodies carry their own indentation so that the
# formatted string is a valid function definition.
data_template = "def data():\n{pipeline}\n    return x_train, y_train, x_test, y_test"
pipelines = {'mnist': '    import something\n    # some reshape\n    # some scaling', ...}

def get_data_func(dset):
    pipeline = pipelines[dset]
    return data_template.format(pipeline=pipeline)

def model(x_train, y_train, x_test, y_test):
    # define model
    return {'loss': -acc, ....}

def do_optimize(input_dset):
    data_func_string = get_data_func(input_dset)
    best_run, best_model = optim.minimize(model=model,
                                          data=data_func_string,
                                          ...)
    return best_run, best_model

if __name__ == '__main__':
    input_dset = input('What dataset do you want to optimize a model for?')
    best_run, best_model = do_optimize(input_dset)
get_data_func('mnist') would return a string like:
'''def data():
    import something
    # some reshape
    # some scaling
    return x_train, y_train, x_test, y_test'''
This is currently not allowed, but it shouldn't take too long to hack in. It is basically a matter of making sure the formatting is consistent with the internals of hyperas. The source that you'd want to touch is here, around line 194 or so.
Something like:
if not isinstance(data, str):
    # line 194
else:
    data_string = data
The example above is also not how you should template strings in this situation. I recommend something like Jinja if you are really going to go down that path and need flexibility. It may be better to just go with regular hyperopt in this situation.
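If pulling in Jinja feels heavy, the indentation problem can also be handled with the standard library. The following is a minimal sketch that uses `textwrap` rather than Jinja; the `PIPELINES` contents, the `"toy"` dataset and `build_data_source` are hypothetical names, and the result is only a string, since passing a string to hyperas is still not supported without the change discussed above.

```python
import textwrap

# Hypothetical pipeline bodies, written without any leading indentation.
PIPELINES = {
    "toy": """
x_train, y_train = [[0.0], [1.0]], [0, 1]
x_test, y_test = [[0.5]], [1]
""",
}


def build_data_source(dataset_name):
    """Return the source text of a data() function for the given dataset."""
    body = textwrap.dedent(PIPELINES[dataset_name]).strip()
    return (
        "def data():\n"
        + textwrap.indent(body, "    ")
        + "\n    return x_train, y_train, x_test, y_test\n"
    )


source = build_data_source("toy")

# compile() raises SyntaxError early if the generated text is malformed.
compile(source, "<data>", "exec")

# Execute the generated source and call the resulting function.
namespace = {}
exec(source, namespace)
x_train, y_train, x_test, y_test = namespace["data"]()
```

Letting `textwrap.indent` add the indentation keeps the pipeline snippets readable and avoids hand-counting spaces inside string literals.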
Does this help?
Another way is to pickle the arguments for data() to a file, then unpickle them inside data(). The file path would need to be hardcoded. You can also do this for the model() function. Instead of pickling the data, you can also save the info as a plaintext file.
A simple example:
def data():
    import argparse
    import pickle
    # The path is hardcoded because data() cannot take arguments.
    args_file = 'data_args.pkl'
    with open(args_file, 'rb') as f:
        args = pickle.load(f)
    (X_train, y_train) = some_file_loader(args.train)
    (X_valid, y_valid) = some_file_loader(args.valid)
    return X_train, y_train, X_valid, y_valid

import argparse
import pickle
parser = argparse.ArgumentParser()
parser.add_argument('--train', help='Training data file', type=str, required=True)
parser.add_argument('--valid', help='Validation data file', type=str, required=True)
args = parser.parse_args()
# Save the parsed arguments where data() expects to find them.
args_file = 'data_args.pkl'
with open(args_file, 'wb') as f:
    pickle.dump(args, f)
X_train, y_train, X_valid, y_valid = data()
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      ...)
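The plaintext variant mentioned above could look roughly like this. It is a sketch under assumptions: the file name `data_args.json`, its location in the system temp directory, and the helper `save_data_args` are all made up for illustration, and data() here just returns the two paths instead of loading anything.

```python
import json
import os
import tempfile


def save_data_args(train_path, valid_path):
    """Write the data arguments to a fixed, agreed-upon JSON file."""
    args_file = os.path.join(tempfile.gettempdir(), "data_args.json")
    with open(args_file, "w", encoding="utf-8") as handle:
        json.dump({"train": train_path, "valid": valid_path}, handle)


def data():
    # The location is recomputed here rather than passed in, because
    # data() is re-executed from its source and cannot take arguments.
    args_file = os.path.join(tempfile.gettempdir(), "data_args.json")
    with open(args_file, encoding="utf-8") as handle:
        args = json.load(handle)
    # A real data() would load and return the arrays; this sketch only
    # returns the configured paths.
    return args["train"], args["valid"]


save_data_args("train.csv", "valid.csv")
train_path, valid_path = data()
```

Compared with pickle, a JSON file can be inspected and edited by hand, and reading it does not execute arbitrary code.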
For future reference, in case someone else has this issue, there is a simple way to do it.
We just have to write a function that returns the args:
import argparse

def my_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='Training data file', type=str, required=True)
    parser.add_argument('--valid', help='Validation data file', type=str, required=True)
    args = parser.parse_args()
    return args
Then we can pass it to minimize as follows:
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      functions=[my_args],
                                      ...)
Then call it in model:
def model(x_train, y_train, x_test, y_test):
    args = my_args()
    train_file = args.train
    valid_file = args.valid
    # define model
    return {'loss': -acc, ....}
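A small variation on the helper, in case it also needs to be called outside the command line (for example from a test). This is a sketch only; the optional `argv` parameter is an addition that is not in the snippet above. When `argv` is left as `None`, argparse falls back to the process's real command line, so the behaviour inside model() stays the same.

```python
import argparse


def my_args(argv=None):
    """Parse the data file arguments.

    argv is an optional list of strings; None means "use sys.argv[1:]",
    which is argparse's default behaviour.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='Training data file', type=str, required=True)
    parser.add_argument('--valid', help='Validation data file', type=str, required=True)
    return parser.parse_args(argv)


# Explicit argument list, so the example does not depend on how it is run.
args = my_args(['--train', 'train.csv', '--valid', 'valid.csv'])
train_file = args.train
valid_file = args.valid
```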