Custom embeddings
rhysnewell opened this issue · 0 comments
Hi Devs,
Thanks for this package, really cool work and it seems very well put together. I just had a question regarding the creation of custom embedding sets. In this example (https://github.com/sacdallago/bio_embeddings/blob/develop/notebooks/goPredSim.ipynb) you use the ProtBertBFDEmbedder to generate an embedding for a novel peptide and compare it against a set of reference embeddings (https://github.com/sacdallago/bio_embeddings/blob/develop/notebooks/goPredSim.ipynb). You then use k-nn to determine which UniProt entry best matches the novel peptide and return its accession.
I was wondering, is it possible to create a completely custom reference embedding h5 file from a database other than UniProt (like a virulence factor database) and then compare novel peptide embeddings to that reference embedding set? Or is that outside the scope of these models?
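To make the idea concrete, here is a rough sketch of what I have in mind. It is not taken from the notebook: it uses plain h5py and NumPy, random vectors stand in for the per-protein embeddings that the embedder would produce, and the IDs (`VF0001` etc.), the 1024-dimensional size, the one-dataset-per-ID layout, and the helper names are all my own assumptions rather than anything bio_embeddings prescribes.

```python
import os
import tempfile

import h5py
import numpy as np


def write_reference_h5(path, embeddings):
    """Store one fixed-size vector per reference ID as its own dataset (assumed layout)."""
    with h5py.File(path, "w") as f:
        for identifier, vector in embeddings.items():
            f.create_dataset(identifier, data=np.asarray(vector, dtype=np.float32))


def read_reference_h5(path):
    """Load all reference vectors into an (n_refs, dim) matrix plus the matching IDs."""
    with h5py.File(path, "r") as f:
        ids = list(f.keys())
        matrix = np.stack([f[i][:] for i in ids])
    return ids, matrix


def nearest_references(query, ids, matrix, k=1):
    """Return the k reference IDs closest to the query by Euclidean distance."""
    distances = np.linalg.norm(matrix - query, axis=1)
    order = np.argsort(distances)[:k]
    return [(ids[i], float(distances[i])) for i in order]


# Placeholder data: random vectors instead of real embeddings (assumption).
rng = np.random.default_rng(0)
dim = 1024  # assumed embedding size
reference = {f"VF{n:04d}": rng.normal(size=dim) for n in range(1, 6)}

# A "novel peptide" embedding, simulated as a slightly perturbed copy of VF0002.
query = reference["VF0002"] + 0.01 * rng.normal(size=dim)

with tempfile.TemporaryDirectory() as tmp:
    h5_path = os.path.join(tmp, "custom_reference.h5")
    write_reference_h5(h5_path, reference)
    ids, matrix = read_reference_h5(h5_path)

hits = nearest_references(query.astype(np.float32), ids, matrix, k=2)
print(hits)
```

In the real case the dictionary values would be the embeddings of the virulence factor sequences, and the query would be the embedding of the novel peptide, both produced by the same embedder so that they live in the same space.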
Just want to make sure that that use case is valid before I pursue this.
Cheers,
Rhys