Failed to read the HDF5 file
CHENGHUAN555 opened this issue · 4 comments
Is there an existing issue for this?
- I have searched the existing issues
Bug description
When running the following code, an error is raised at line 30 of test.py:
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
The error occurs when cebra.load_data() is used to read the .h5 file. I do not know how to solve it and would appreciate the author's help.
---------------------------------------------------------------------------------------------------------------------------------------------
test.py
---------------------------------------------------------------------------------------------------------------------------------------------
# Create a .h5 file, containing a pd.DataFrame
import pandas as pd
import numpy as np
X_continuous = np.random.normal(0,1,(100,3))
X_discrete = np.random.randint(0,10,(100, ))
df = pd.DataFrame(np.array(X_continuous), columns=["continuous1", "continuous2", "continuous3"])
df["discrete"] = X_discrete
df.to_hdf("auxiliary_behavior_data.h5", key="auxiliary_variables")
import cebra
from numpy.random import uniform, randint
from sklearn.model_selection import train_test_split
# 1. Define a CEBRA model
cebra_model = cebra.CEBRA(
    model_architecture = "offset10-model",
    batch_size = 512,
    learning_rate = 1e-4,
    max_iterations = 10, # TODO(user): to change to at least 10'000
    max_adapt_iterations = 10, # TODO(user): to change to ~100-500
    time_offsets = 10,
    output_dimension = 8,
    verbose = False
)
# 2. Load example data
neural_data = cebra.load_data(file="neural_data.npz", key="neural")
new_neural_data = cebra.load_data(file="neural_data.npz", key="new_neural")
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
discrete_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["discrete"]).flatten()
assert neural_data.shape == (100, 3)
assert new_neural_data.shape == (100, 4)
assert discrete_label.shape == (100, )
assert continuous_label.shape == (100, 3)
# 3. Split data and labels
(
    train_data,
    valid_data,
    train_discrete_label,
    valid_discrete_label,
    train_continuous_label,
    valid_continuous_label,
) = train_test_split(neural_data,
                     discrete_label,
                     continuous_label,
                     test_size=0.3)
# 4. Fit the model
# time contrastive learning
cebra_model.fit(train_data)
# discrete behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label,)
# continuous behavior contrastive learning
cebra_model.fit(train_data, train_continuous_label)
# mixed behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label, train_continuous_label)
# 5. Save the model
cebra_model.save('/tmp/foo.pt')
# 6. Load the model and compute an embedding
cebra_model = cebra.CEBRA.load('/tmp/foo.pt')
train_embedding = cebra_model.transform(train_data)
valid_embedding = cebra_model.transform(valid_data)
assert train_embedding.shape == (70, 8)
assert valid_embedding.shape == (30, 8)
# 7. Evaluate the model performances
goodness_of_fit = cebra.sklearn.metrics.infonce_loss(cebra_model,
                                                     valid_data,
                                                     valid_discrete_label,
                                                     valid_continuous_label,
                                                     num_batches=5)
# 8. Adapt the model to a new session
cebra_model.fit(new_neural_data, adapt = True)
# 9. Decode discrete labels behavior from the embedding
decoder = cebra.KNNDecoder()
decoder.fit(train_embedding, train_discrete_label)
prediction = decoder.predict(valid_embedding)
assert prediction.shape == (30,)
Operating System
windows 10
CEBRA version
cebra version 0.2.0
Device type
gpu
Steps To Reproduce
No response
Relevant log output
Traceback (most recent call last):
  File "E:\crop\injuryrun4\test.py", line 30, in <module>
    continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
  File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 661, in load
    data = loader.load(file, key=key, columns=columns)
  File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 211, in load
    raise ModuleNotFoundError()
ModuleNotFoundError
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct
@gonlairo can you take a look?
@CHENGHUAN555 did you install the datasets extra with pip install cebra[datasets]? Otherwise the required module is indeed not available. I suggest checking out the demos here: https://cebra.ai/docs/demos.html. They use a particular data loader, but you will get the idea. See the installation instructions here: https://cebra.ai/docs/installation.html#id1
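A quick way to check for missing optional dependencies before calling cebra.load_data is sketched below. The module names in CANDIDATES are an assumption (HDF5 files are commonly read through h5py, and pandas' to_hdf relies on PyTables, imported as tables); the thread does not say which module was actually missing. missing_modules is a hypothetical helper, not part of CEBRA.

```python
import importlib.util

# Assumed module names; the thread does not state which one was missing.
CANDIDATES = ["h5py", "tables", "pandas"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(CANDIDATES)
    if missing:
        print("Not installed:", ", ".join(missing))
        print("Try: pip install 'cebra[datasets]'")
    else:
        print("All candidate modules are importable.")
```

If the script reports a missing module, installing the datasets extra as suggested above may resolve the error; the quotes around cebra[datasets] keep some shells from interpreting the square brackets.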
This solved it for me when I hit this error while working through the code on the Usage page.
A couple of notes (feel free to ignore 😄): I found the ModuleNotFoundError message a bit hard to interpret, as it didn't say which module was missing, so I wasn't sure how to proceed. Also, the installation page says that the datasets optional dependency is for working with the datasets at Figshare. Hence, when I got the error on the Usage page while working with synthetic data, I didn't consider the correct solution.
Anyway, minor wrinkles -- congrats on the cool package, I'm having fun with it so far!
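To illustrate the first note, here is a minimal sketch of an import guard whose error names the missing module and suggests an install command. require_module and its extra parameter are hypothetical names chosen for this example; this is not how CEBRA implements its loaders.

```python
import importlib


def require_module(name, extra):
    """Import `name`, or raise an error naming it and suggesting an install."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        # Only rewrite the error if `name` itself is missing, not one of
        # the modules it imports internally.
        if exc.name != name:
            raise
        raise ModuleNotFoundError(
            f"Could not import '{name}', which is needed to load this file. "
            f"Try: pip install 'cebra[{extra}]'",
            name=name,
        ) from exc
```

With a guard like this, a call such as require_module("h5py", "datasets") would either return the module or fail with a message that says what to install, rather than a bare ModuleNotFoundError.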
@EricThomson, thanks for flagging. I created a new issue to track these potential improvements: #77