markovmodel/adaptivemd

PyEmma Analysis

thempel opened this issue · 8 comments

For actual adaptive simulations, a flexible way to analyze the data is very important. Apart from choosing input parameters such as the lag time and the number of MSM states, it will be necessary to modify e.g. the input features. Modifying msmanlyze.py, which is effectively part of the adaptivemd source code, is not very convenient. It might also be a bit risky to give users the standard alanine dipeptide analysis by default, because they must opt out to avoid results that run without errors but are meaningless. For more complex types of trajectories, even more options need to be considered. I would suggest either

  • a script-based solution that lets the user write custom scripts, maybe by adding the script's path to a modeller object. This would also make it possible to keep track of which modeller was used to generate which trajectories.

or

  • a function-based solution: I don't know if this even works, but it might be even better to define custom analysis functions that must take a given set of input parameters and produce output in a given shape. I am thinking of something similar to PyEmma's featurizer.add_custom_function(). This would make it possible to see directly which keyword arguments can be chosen. Further, the function could be stored in the database, I suppose, making it easy to keep track of the strategy used. It might also be easier to add this to the "brain"...
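
A rough sketch of what the function-based variant could look like, using only the standard library. Nothing here is adaptivemd or PyEmma API: `describe_options`, `run_analysis`, `my_analysis` and the required output keys are names made up for illustration. The idea is that the keyword arguments are read from the function signature, and the result is checked against an agreed output shape.

```python
import inspect

# Hypothetical output contract: keys every analysis result must provide.
REQUIRED_OUTPUT_KEYS = {"n_states", "lagtime"}


def describe_options(func):
    """Return the keyword arguments (with defaults) that func accepts."""
    params = inspect.signature(func).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty
    }


def run_analysis(func, trajectories, **options):
    """Call a user-supplied analysis function and check its result."""
    unknown = set(options) - set(describe_options(func))
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    result = func(trajectories, **options)
    missing = REQUIRED_OUTPUT_KEYS - set(result)
    if missing:
        raise ValueError(f"result is missing keys: {sorted(missing)}")
    return result


def my_analysis(trajectories, lagtime=10, n_states=50):
    """Placeholder user function; a real one would build the model here."""
    return {
        "n_states": n_states,
        "lagtime": lagtime,
        "n_trajectories": len(trajectories),
    }
```

With this, `describe_options(my_analysis)` lists the options a user may set, and the chosen options can be recorded next to the resulting model.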

For basic model building functionality required for adaptive sampling it would be sufficient to slightly extend the options of remote_analysis(). The minimal functionality includes the following options:

  • featurizer selection (e.g. 'add_all', 'add_backbone_torsion')
  • transformation (e.g. None or TICA)
  • TICA options (lag, kinetic variance or number of dimensions)
  • clustering method (k-means or regspace + metric, cutoff or number of clusters)
  • MSM lagtime

If these options can be passed, most cases will be covered. For anything more complicated, a custom function or an additional script could be used.
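
To make the list above concrete, the options could be collected in one small container that is validated before anything is submitted. This is only a sketch: `AnalysisOptions`, its field names and its defaults are assumptions for illustration, not the current signature of remote_analysis().

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalysisOptions:
    """Hypothetical bundle of the minimal model-building options."""
    featurizer: str = "add_backbone_torsions"  # name of a featurizer method
    transformation: Optional[str] = "tica"     # None or "tica"
    tica_lag: int = 2
    tica_dim: Optional[int] = 2                # number of dimensions ...
    tica_var_cutoff: Optional[float] = None    # ... or kinetic variance
    clustering: str = "kmeans"                 # "kmeans" or "regspace"
    metric: str = "euclidean"
    n_clusters: Optional[int] = 100            # used by k-means
    regspace_dmin: Optional[float] = None      # cutoff used by regspace
    msm_lag: int = 2

    def __post_init__(self):
        if self.transformation not in (None, "tica"):
            raise ValueError("transformation must be None or 'tica'")
        if self.transformation == "tica":
            if self.tica_lag < 1:
                raise ValueError("tica_lag must be >= 1")
            if (self.tica_dim is None) == (self.tica_var_cutoff is None):
                raise ValueError("set exactly one of tica_dim, tica_var_cutoff")
        if self.clustering == "kmeans":
            if self.n_clusters is None or self.n_clusters < 1:
                raise ValueError("kmeans needs n_clusters >= 1")
        elif self.clustering == "regspace":
            if self.regspace_dmin is None or self.regspace_dmin <= 0:
                raise ValueError("regspace needs regspace_dmin > 0")
        else:
            raise ValueError("clustering must be 'kmeans' or 'regspace'")
        if self.msm_lag < 1:
            raise ValueError("msm_lag must be >= 1")
```

`asdict(AnalysisOptions(msm_lag=5))` gives a plain dictionary that could be stored with the model. How it would be handed to remote_analysis() is left open here.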

featurizer selection (e.g. 'add_all', 'add_backbone_torsion')

This is in there now.

I think we should cover the usual suspects. If you want something really fancy, you can always write your own analysis code. No problem. But most people will want to use PyEmma in the standard ways we teach in the courses (as @nsplattner listed). Features are in there now. TICA is always on but has some options. Clustering should be selectable, but so far we only have n_states. MSM lagtime is in there.

I'm not sure how the function remote_analysis() is supposed to work. The choice of features seems to be hardcoded (line 44, feat.add_backbone_torsions()). Is this supposed to be an example, or is it customizable? How can arguments be passed to the featurizer? If it's an example, it should not be in the main code but rather in the tutorial directory.

This was an example where I hardcoded it. It should be obvious what to change. Unfortunately, PyEMMA does not provide a way to store a feature description, but the upcoming PR #28 will change that.

O.k., thanks for the details!
It is obvious what to change. The problem is that a) it's not clear that this is an example, since it's placed in the package, and b) it's not convenient to have a custom function placed in the package, since it's lost when the code is updated.
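
One way to keep custom analysis code out of the package would be to load it from a user-supplied script path, roughly as below. This is a sketch with the standard library only. `load_analysis` and the convention that the script defines a function called `analyze` are assumptions for illustration, not existing adaptivemd behaviour.

```python
import importlib.util
from pathlib import Path


def load_analysis(script_path, func_name="analyze"):
    """Import a user script by path and return (function, resolved path)."""
    path = Path(script_path).expanduser().resolve()
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load analysis script: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the user's script
    func = getattr(module, func_name, None)
    if not callable(func):
        raise AttributeError(f"{path} does not define a callable {func_name}()")
    return func, str(path)
```

The returned path could be stored with the modeller, so that it stays clear which script produced which model, and the script survives package updates because it lives outside the package.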

Sorry for the confusion. It was not originally planned to turn this into a package. I did that to make it easier for you guys. All the additional work, including cleanups and documentation, is kind of hard to do in two weeks' time.

PR #28 and #35 will solve that problem and allow much more customization, though.