Function for batch analysis of new files
drguggiana opened this issue · 5 comments
Hihi!
First of all, thanks a ton for the work you have put into this, so far I'm super happy with VAME. Second, below I outline a sorta request/idea:
- I already have a model trained on a very large portion of my data (~500k data points) that I'm very happy with
- I'd like to incorporate new data (being produced daily) into the analysis, namely using the already trained encoder to obtain the latent space representation of the data, and also their motif structure (i.e. clustering)
- This would ideally be a single function, so that I can incorporate it into my pipeline, and where I can supply data independently from the VAME project folder, as I have my own data structure for other analyses that have nothing to do with VAME
- I looked at pose_segmentation.py. If I get this right, it seems I would need to modify embedd_latent_vectors (as the data path is hardcoded from the VAME project folder structure) to get the latent representation, and also use parts of same_parameterization to get the motifs.
- This also would mean I'll have to save the kmeans object created from the trained model the first time I run vame.pose_segmentation() (i.e. will have to expose it in the code).
The question is: is this something you are working on/have something lying around/something already there that I missed, or should I just go ahead and try to code it myself (+ send a pull request if this is of interest)? Hope this makes sense and I'm not completely off the mark. And thanks in advance for the help!
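For readers following along, the workflow requested above could be sketched roughly as below. This is only an illustration under stated assumptions, not VAME code: `segment_new_recording`, `encode_window` and `predict_motifs` are hypothetical names, the encoder is a stand-in callable (in VAME it would wrap the trained model), and the sliding-window handling may differ from what embedd_latent_vectors does internally.

```python
import numpy as np


def segment_new_recording(series, encode_window, predict_motifs, window):
    """Encode a recording window by window and assign a motif to each window.

    series:         array of shape (n_frames, n_features), supplied directly
                    by the caller instead of being read from a project folder.
    encode_window:  hypothetical callable mapping one (window, n_features)
                    slice to a 1-D latent vector (stand-in for the encoder).
    predict_motifs: hypothetical callable mapping (n_windows, latent_dim)
                    latents to integer motif labels (e.g. a clustering model).
    """
    series = np.asarray(series, dtype=float)
    n_windows = series.shape[0] - window + 1
    if n_windows < 1:
        raise ValueError("recording is shorter than the encoding window")

    # One latent vector per sliding window of `window` consecutive frames.
    latents = np.stack(
        [np.asarray(encode_window(series[i:i + window]), dtype=float)
         for i in range(n_windows)]
    )
    motifs = np.asarray(predict_motifs(latents))
    return latents, motifs


# Toy stand-ins so the sketch runs end to end (not a real encoder/clusterer).
def toy_encoder(window_data):
    return window_data.mean(axis=0)


def toy_predictor(latents):
    return (latents[:, 0] > 2.5).astype(int)


series = np.vstack([np.zeros((20, 2)), np.full((20, 2), 5.0)])
latents, motifs = segment_new_recording(series, toy_encoder, toy_predictor, window=10)
```

Passing the encoder and the clustering step in as callables keeps the function independent of any project folder layout, which is the point of the request.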
Hi!
Thank you for sharing your thoughts. You make some good points, which would make VAME even more flexible.
We were already discussing this idea but have not started to work on this. The trained model, however, is saved and you can always encode new data with your trained model. The tricky part is the kmeans assignment and I am not sure at the moment if there is a way to save the sklearn kmeans object. But if this is possible, you would be able to assign the same motif numbers to new data.
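On saving the kmeans object: fitted scikit-learn estimators can generally be serialized with joblib (or pickle), so something along these lines should work. This is a minimal sketch, not part of VAME; the random arrays stand in for latent vectors, and the file name, latent dimension and number of clusters are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for the latent vectors of the original training data.
latents = rng.normal(size=(500, 30))

# Fit once on the original data.
kmeans = KMeans(n_clusters=15, n_init=10, random_state=42).fit(latents)

# Persist the fitted object (hypothetical file name, temporary directory).
path = os.path.join(tempfile.mkdtemp(), "kmeans_motifs.joblib")
joblib.dump(kmeans, path)

# Later, e.g. in a daily pipeline: reload and label newly encoded data.
restored = joblib.load(path)
new_latents = rng.normal(size=(100, 30))
new_labels = restored.predict(new_latents)

# The restored object assigns the same motif numbers as the original one.
same = np.array_equal(new_labels, kmeans.predict(new_latents))
```

One caveat: a pickled estimator is best loaded with the same scikit-learn version that saved it.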
We might look into that as well, but if you need something like this soon, you can give it a shot, and if it works we are happy to include it in the code!
Cheers,
Kevin
Hi Kevin,
Thanks for the reply. Indeed I kinda wanna use it soon, so I'll get to it and keep y'all posted :) (since the kmeans cluster centers are saved, I might be able to reconstruct the kmeans object with those, but let's see)
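A minimal sketch of that idea, assuming the saved cluster centers are available as a plain array: a fitted k-means model labels a point by its nearest center, so the centers alone are enough to assign motifs without the original object. `assign_motifs` is a hypothetical helper, and the toy centers and points are made up for illustration.

```python
import numpy as np


def assign_motifs(latents, cluster_centers):
    """Label each latent vector with the index of its nearest cluster center."""
    latents = np.asarray(latents, dtype=float)
    centers = np.asarray(cluster_centers, dtype=float)

    # Squared Euclidean distances, shape (n_samples, n_clusters).
    diff = latents[:, None, :] - centers[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    return np.argmin(sq_dist, axis=1)


# Toy data: three centers in 2-D and four points to label.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 5.0]])
points = np.array([[0.5, -0.2], [9.0, 11.0], [-8.0, 4.0], [4.0, 4.0]])
labels = assign_motifs(points, centers)
```

The broadcasted difference array has shape (n_samples, n_clusters, latent_dim), so for very large recordings it may be worth processing the latents in chunks.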
Best,
Drago
Hi @drguggiana,
I'm also looking into having the same functionality - did you succeed in implementing it?
Hi @chesnov,
I did. Issue is I have a slightly nightmarish setup, so some parts of my solution are very "me" specific, and hence I didn't go for a pull request (sadly I don't have the time to write a full general solution at the moment).
That said, if you look in my branched VAME repo, the relevant changes are in pose_segmentation.py, where I wrote a function to do the batching (plus changes in the init files to expose it). The actual implementation is in my prey_capture repo (prey_capture/snakemake_scripts/run_latents.py), lines 65 to 92, where I grab trajectories, align them egocentrically, extract the latents with the aforementioned function and finally reconstruct the kmeans object to get the motifs. Heads up: the new version of VAME does clustering differently and I haven't updated mine yet, so this will probably need more changes directly on VAME.
Not sure how useful this is for you, but feel free to hit me up if you have any questions.
Thank you again for your comments @drguggiana! I will close this issue for now, as we are preparing an update of VAME, hopefully within the next few months, and I will keep your ideas in mind for it.
Cheers,
Kevin