HakuSean/DA-Net

Can you give a short brief of how to run this code?

Opened this issue · 14 comments

Thank you so much for sharing this implementation!
Could you give us a few words on how to run this code?

Happy to assist your research, and sorry for not maintaining the project well.
You may need to first use eccv_baseline_tsn to generate initialized weights for every branch, and then use the trained model to run latent. For both folders, you can simply run scripts/train.sh for training and indicate the dataset (ntu or nucla), the modality (rgb or flow), the option (the folder in which to save checkpoints), and the split (based on the splits from the definition of the dataset).
Of course, you need to install Caffe before using the model. You can refer to Temporal Segment Networks for some basic settings.
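For example, a training run could look like the following (the argument order here is only an assumption based on the description above; check scripts/train.sh in each folder for the actual interface):

bash scripts/train.sh ntu rgb checkpoints_ntu_rgb 1

Here ntu is the dataset, rgb the modality, checkpoints_ntu_rgb a hypothetical checkpoint folder, and 1 the split index.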

Thank you so much for replying ^_^ Did you get results like the ones reported in the paper? The code is quite complicated for me, so I need more time to understand it and to get my first run going. I might ask some questions then; would you be willing to help?

@HakuSean I got this error when running your project; can you suggest a way to fix it?
Check failed: ExactNumTopBlobs() == top.size() (2 vs. 3) VideoData Layer produces 2 top blob(s) as output.
Thanks bro!

@buianhvu Sorry for my delay. I do have the trained models, but they seem to be mixed up with other results. I will try to sort them out and upload them to Google Drive soon.

@cong235 Hi, I think the problem is that you are running my prototxt files within your own Caffe. I have made some modifications to the Caffe layers; see https://github.com/HakuSean/DA-Net/tree/master/latent/lib/caffe-action for more.

Dear HakuSean, is your caffe-action similar to this one: https://github.com/yjxiong/caffe/tree/9831a2b4d67e3f99418b6f2f99b6dde716853672 ? I am trying to build it but I am facing some problems.

I was running with your caffe-action but still got the error:
I0911 22:30:50.670941 6214 layer_factory.hpp:74] Creating layer data
I0911 22:30:50.670964 6214 net.cpp:99] Creating Layer data
I0911 22:30:50.670971 6214 net.cpp:435] data -> data
I0911 22:30:50.671000 6214 net.cpp:435] data -> label
I0911 22:30:50.671010 6214 net.cpp:435] data -> view
I0911 22:30:50.671016 6214 net.cpp:163] Setting up data
F0911 22:30:50.671027 6214 layer.hpp:394] Check failed: ExactNumTopBlobs() == top.size() (2 vs. 3) VideoData Layer produces 2 top blob(s) as output.

And I haven't set up the data yet. Is that the reason for the error?

@buianhvu Yes, that's the one. What are the problems?

@cong235 It seems to be the problem of your input list. The input list should contain data, frame_num, action_label and view_label. You can refer to any lists in https://github.com/HakuSean/DA-Net/tree/master/latent/data/.
Besides, the code that reads the data is shown below (from https://github.com/HakuSean/DA-Net/blob/master/latent/lib/caffe-action/src/caffe/layers/video_data_layer.cpp):

while (infile >> filename >> length >> label >> view) {
  lines_.push_back(std::make_pair(filename, label)); // frame folder and corresponding action label
  lines_view_.push_back(view);                       // corresponding camera, used to choose the branch
  lines_duration_.push_back(length);                 // number of frames in the video
}
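So each line of the list holds four whitespace-separated fields: the frame folder, the frame count, the action label, and the view (camera) label. A hypothetical list line, with made-up path and numbers purely for illustration, would look like:

/path/to/ntu/frames/S001C001P001R001A001 103 0 0

The actual lists under latent/data/ are the authoritative reference for the exact format.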

Dear HakuSean, do I need to extract frames from the NTU dataset? When I extract the raw files from the dataset, all I get is .avi files, but it seems that your reader needs .jpg files. Am I correct?

Yes, you're correct. You can find how to extract them here.
Simply run the following:
bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER
where SRC_FOLDER is the folder containing the .avi files, OUT_FOLDER is where you want the extracted frames to go, and NUM_WORKER depends on how many GPUs you have (I used 1).
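For example, with hypothetical paths:

bash scripts/extract_optical_flow.sh ~/ntu/videos ~/ntu/frames 1

This should create one subfolder per video under ~/ntu/frames containing the extracted images.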

It will take a while to run through all of it, so I recommend trying with a small batch of videos first. But to do so you also have to edit latent/data/ntu/ntu_rgb_val_split_1, latent/data/ntu/ntu_flow_val_split_1, latent/data/ntu/ntu_rgb_train_split_1 and latent/data/ntu/ntu_flow_train_split_1.
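For instance, one hypothetical way to do this (not from the repo) is to build trimmed copies keeping only the first 100 entries, and then point the data layers' source paths at them (or simply trim the original files in place):

head -n 100 latent/data/ntu/ntu_rgb_train_split_1 > latent/data/ntu/ntu_rgb_train_small
head -n 100 latent/data/ntu/ntu_rgb_val_split_1 > latent/data/ntu/ntu_rgb_val_small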

You may encounter errors due to the Caffe version; just post them here. After a while, I overcame all of them.

Thank you for your help. I have just run the file you mentioned earlier, after a bit of configuration, and I got a folder containing subfolders of flow-x and flow-y images for each video. That seems to be correct? Sorry for the late reply; I have been busy with work. Have you finished running the whole thing yet, AntonioMarsella?

Hello everyone, I am now training eccv_baseline_tsn on the NTU dataset. After the training finishes, I will get the model files to feed into latent, right? But the NTU dataset is so large that even one epoch takes more than my hardware can afford. Could anyone send me the pretrained models for eccv_baseline_tsn and latent? Another question: what happens if I only train the model on a much smaller number of action classes (~1000 or 2000 videos only)? Thank you for reading!

@HakuSean I have switched to the NUCLA dataset because of its smaller size. If possible, could you please share the pretrained models for eccv_baseline_tsn and for latent with me?
Actually, I have modified the train_val prototxt a bit to make it fit NUCLA. I cannot load the initialized weights from get_init_models.sh as a pretrained model because the link seems to be dead, so I have to train everything from scratch, and my hardware is not strong enough to speed things up.
Thank you so much.