cmelab/msibi

Calculate the atomistic RDFs when passing in a trajectory file to State

chrisjonesBSU opened this issue · 1 comments

It would be nice if calculating the Target RDF and saving the outputs could be handled when creating a State object by passing in a trajectory file. Also, this can help ensure that the rdf cutoff specified in MSIBI is the same one used for calculating the target RDFs.

Maybe this can be done by calling a function, or by setting up a parameter in the init function?

State.generate_target_rdf()

or generate_target_rdfs=True in the init?

Following the example in the LJ tutorial:

with gsd.hoomd.open(self.traj_file) as t:
    rdf = freud.density.RDF(bins=101, r_max=5)
    for snap in t[-50:]:
        rdf.compute(system=snap, reset=False)
    data = np.stack((rdf.bin_centers, rdf.rdf)).T
    self.target_rdf = data
    # or save 
    np.savetxt(self.state_dir, data)
)

So I'm imagining 2 possible scenarios when creating a State object from a .gsd file.

1. The state RDF still needs to be calculated.

In this case, a calc_target_rdf function can be called. The RDF is calculated and saved to a .txt file in the State's directory.

2. The user already has the RDF data available. Maybe it was already calculated and saved as a .txt file that can be loaded, or maybe it has been calculated and is stored in an array.

In this case, the .txt file is copied to the State's directory or the data in the numpy array is saved to a .txt file in the State's directory.

When creating a State instance, one of the parameters can be target_rdf that is either None, path_to_file.txt or array_like.

If None then a calculate_target_rdf function can be called automatically, or the user will have to call that function in order to generate the rdf data.

state1 = State(name="A", kT=1.0, traj="path.gsd", target_rdf = None)
state1.calculate_target_rdf(kwargs)

I think there are a couple issues with option 2. In this case, there isn't anything that ensures the target RDFs from different states were calculated using consistent parameters such as dr, nbins, rmax, etc. So, each of the 3 state RDFs could be inconsistent. Also they may be inconsistent with the rdf parameter values passed to MSIBI. If the calcualte_target_rdf function is being called, then it can get the relevant rdf parameters from the MSIBI class, ensuring everything is consistent.

Do we just leave it to the responsibility of the user to ensure that these things are being handled consistently if they want to use already existing state rdf data?