Missing dimensions when performing get_output() method
meh47336 opened this issue · 1 comments
I'm using a Jupyter notebook. The system is CentOS Linux 7 and PyEMMA version 2.5.12 (conda list is attached).
I'm following the tutorial listed here (http://www.emma-project.org/latest/tutorials/notebooks/00-pentapeptide-showcase.html) using my own trajectory (50,000 frames).
I'm performing different featurizations to compare results, so I'll just mention one: contact features giving 666 dimensions. And a quick note: the following problem occurs whether I source() or load() the data.
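`
import pyemma.coordinates as coor

# xtc_file and contact_feat (the contact featurizer) are defined earlier in the notebook
contact_data = coor.load(xtc_file, contact_feat)
contact_tica = coor.tica(contact_data, lag=1)
print(len(contact_data))
print(len(contact_data[0]))
print(contact_tica.describe())
`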
Gives the output:
contact_data length = 50,000
contact_data[0] length = 666
contact_tica: (TICA, lag = 1, max. output. dim = 12)
I'm unsure why the dimensions were reduced from 666 to 12 here. However, I can still work with 12.
Next, I want to use the TICA output for the VAMP-2 scoring in the tutorial. So, I use the .get_output() method, which should extract all features by default (though I've also tried messing with the Slice).
`
contact_out = contact_tica.get_output()
print(len(contact_out))
print(len(contact_out[0]))
`
This output gives:
contact_out length = 1
contact_out[0] length = 50,000.
Why am I only getting 1 dimension here? This is causing problems with the VAMP-2 scoring in the tutorial.
Thank you so much!
condalist.txt
TICA reduces dimensions according to a variance cutoff, which defaults to 95%. That means 95% of the kinetic variance is kept in the transformed data, which should explain the reduction from 666 to 12 dimensions. Compare this page.
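To make the cutoff concrete, here is a minimal NumPy sketch of how a cumulative variance threshold picks the number of dimensions. The eigenvalues are made up for illustration, and treating the squared eigenvalue as the kinetic variance of a component is an assumption of this sketch, not something taken from your data or confirmed against the PyEMMA source.

```python
import numpy as np

# Hypothetical TICA eigenvalues in descending order (illustrative values only).
eigenvalues = np.array([0.9, 0.8, 0.5, 0.2, 0.1, 0.05])

# Assumption: the kinetic variance of each component is its squared eigenvalue.
kinetic_variance = eigenvalues ** 2
cumulative = np.cumsum(kinetic_variance) / kinetic_variance.sum()

var_cutoff = 0.95  # the default cutoff mentioned above

# Smallest number of leading components whose cumulative share reaches the cutoff.
n_dims = int(np.searchsorted(cumulative, var_cutoff) + 1)
print(n_dims)
```

If you would rather fix the number of dimensions yourself, coor.tica also takes a dim argument alongside var_cutoff.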
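To your second question: could it be that tica.get_output() returns a list of arrays? It returns one array per trajectory, each with shape (frames, dimensions), so len(contact_out[0]) counts frames rather than dimensions. Since you only have one trajectory, you'd need to take the zeroth element of that list, like contact_out = contact_tica.get_output()[0].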
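A small sketch of the shapes involved may help. It uses a plain NumPy array as a stand-in for the list that get_output() returns for a single trajectory, so no PyEMMA call is made; the sizes (50,000 frames, 12 dimensions) are taken from the numbers in your post.

```python
import numpy as np

# Stand-in for the return value of get_output() with one trajectory:
# a list holding a single array of shape (n_frames, n_dims).
n_frames, n_dims = 50_000, 12
contact_out = [np.zeros((n_frames, n_dims))]

print(len(contact_out))       # number of trajectories in the list
print(len(contact_out[0]))    # number of frames in the first trajectory
print(contact_out[0].shape)   # (frames, dimensions)

# Taking element zero gives the per-trajectory array.
tica_array = contact_out[0]
print(tica_array.shape[1])    # number of TICA dimensions
```

Checking .shape instead of len() shows both the frame count and the dimension count at once.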