arxyzan/data2vec-pytorch

Question on personalized data modality

Issue closed · 1 comment

Hi,

Thanks for the repo. It is really helpful! I want to get some additional suggestions from you!

I am trying to apply a model with a similar architecture and training paradigm to time-series data other than audio or text. Would you suggest starting from your code, or would it be better to use fairseq? The fairseq repo is fairly complicated and may be hard to adapt to my application, so I am a little hesitant. I saw you mentioned that you didn't implement some of the tricks used in fairseq. Do you think these tricks are important? How might they affect pretraining?

Thanks in advance for sharing your knowledge!

@christincha Hi. I'm so sorry for the late response! (It's pretty easy to miss GH issues when not pinged :) )
I generally don't think that this method is useful for time-series data. More importantly, this repo only implements the original V1 of the data2vec model, but AFAIK a version 2 has also been released.
Regarding the tricks, this repo only implements the core parts of the paper and cannot actually be trained on really large-scale data or in distributed setups, etc. I strongly recommend you look into the fairseq code and try to add anything important to this repo if you actually want to use it.
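
For context on what "core parts" versus "tricks" can mean here, below is a minimal sketch of one detail the data2vec paper describes: the teacher network is an exponential moving average (EMA) of the student, and the EMA decay is increased linearly over the first updates before being held constant. This is only an illustration in plain Python. The function names and default values are hypothetical placeholders, not taken from this repo or from fairseq, and the thread does not say whether this particular detail is among the omitted tricks.

```python
def ema_decay(step, tau_start=0.999, tau_end=0.9999, anneal_steps=1000):
    """Linearly anneal the EMA decay from tau_start to tau_end, then hold it.

    The default values are illustrative placeholders, not tuned settings.
    """
    if step >= anneal_steps:
        return tau_end
    return tau_start + (tau_end - tau_start) * step / anneal_steps


def ema_update(teacher, student, tau):
    """Return new teacher weights: tau * teacher + (1 - tau) * student.

    `teacher` and `student` are flat lists of floats standing in for
    model parameters (a real model would use tensors).
    """
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


# Toy usage: the teacher slowly tracks a student whose weights stay fixed.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for step in range(3):
    teacher = ema_update(teacher, student, ema_decay(step))
```

With a decay this close to 1 the teacher moves very slowly, which is why the schedule matters at scale; for a small time-series experiment, a constant decay may be an acceptable starting point, though that is an assumption to validate.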