markovmodel/PyEMMA

Missing dimensions when performing get_output() method

meh47336 opened this issue · 1 comments

I'm using a Jupyter notebook. The system is CentOS Linux 7 and the PyEMMA version is 2.5.12 (conda list is attached).

I'm following the tutorial listed here (http://www.emma-project.org/latest/tutorials/notebooks/00-pentapeptide-showcase.html) using my own trajectory (50,000 frames).

I'm performing different featurizations to compare results, so I'll just mention one: contact features giving 666 dimensions. And a quick note: the following problem occurs whether I source() or load() the data.

```python
contact_data = coor.load(xtc_file, contact_feat)
contact_tica = coor.tica(contact_data, lag=1)

print(len(contact_data))
print(len(contact_data[0]))

print(contact_tica.describe())
```
Gives the output:

contact_data length = 50,000
contact_data[0] length = 666

contact_tica: (TICA, lag = 1, max. output. dim = 12)

I'm unsure why the dimensions were reduced from 666 to 12 here. However, I can still work with 12.

Next, I want to use the TICA output for the VAMP-2 scoring in the tutorial. So I use the .get_output() method, which should return all dimensions by default (though I've also tried changing the slice).

```python
contact_out = contact_tica.get_output()

print(len(contact_out))
print(len(contact_out[0]))
```

This output gives:
contact_out length = 1
contact_out[0] length = 50,000.

Why am I only getting 1 dimension here? This is causing problems with the VAMP-2 scoring in the tutorial.

Thank you so much!
condalist.txt

TICA reduces dimensions according to a variance cutoff, which defaults to 95%. That means 95% of the kinetic variance is kept in the transformed data, which should explain the dimension reduction from 666 to 12 dimensions. Compare this page.
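
As a rough illustration of how such a cutoff picks the number of dimensions, the sketch below keeps the smallest number of leading components whose cumulative share of the variance reaches the cutoff. It assumes the variance carried by each component is its squared eigenvalue; the function name `dims_for_cutoff` and the eigenvalues are made up for illustration, and this is not PyEMMA's actual code.

```python
from itertools import accumulate


def dims_for_cutoff(eigenvalues, var_cutoff=0.95):
    """Smallest number of leading components whose cumulative
    share of the total variance reaches var_cutoff."""
    # Assumption: each component's variance is its squared eigenvalue.
    variances = [ev ** 2 for ev in sorted(eigenvalues, key=abs, reverse=True)]
    total = sum(variances)
    for n, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= var_cutoff:
            return n
    return len(variances)


# Hypothetical eigenvalues, not taken from the trajectory in this issue.
example_eigenvalues = [0.9, 0.8, 0.5, 0.2, 0.1, 0.05]
n_dims = dims_for_cutoff(example_eigenvalues)
print(n_dims)
```

If you need a different number of dimensions, `coor.tica` should, as far as I recall for PyEMMA 2.5, accept `var_cutoff` and `dim` arguments; please check the API documentation for your version before relying on that.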

To your second question: Could it be that tica.get_output() returns a list of arrays? Since you only have one trajectory, you'd need to take the zero-th element of that list, like contact_out = contact_tica.get_output()[0].
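
To make the list-of-arrays layout concrete, here is a minimal sketch that does not need PyEMMA. It uses nested lists as a stand-in for the returned arrays; `output_shapes` and `fake_output` are hypothetical names, and the 5 x 3 size is arbitrary.

```python
def output_shapes(output):
    """Return (n_frames, n_dims) for each trajectory in a list of 2D arrays."""
    return [(len(traj), len(traj[0])) for traj in output]


# Stand-in for get_output() on one trajectory: a list holding a single
# 2D "array" of 5 frames x 3 dimensions (nested lists, not NumPy arrays).
fake_output = [[[0.0, 0.0, 0.0] for _ in range(5)]]

print(len(fake_output))            # number of trajectories, not dimensions
print(output_shapes(fake_output))  # frames and dimensions per trajectory

single = fake_output[0]            # data for the only trajectory
print(len(single), len(single[0]))
```

In your case, `len(contact_out)` counts trajectories and `len(contact_out[0])` counts frames, so the 12 TICA dimensions should show up as `len(contact_out[0][0])`.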