yourh/DeepGraphGO

Generating protein features with InterProScan

Closed this issue · 3 comments

Hello, I have a question about generating protein features with InterProScan:
Which applications did you use when running InterProScan?
Available analyses:
TIGRFAM (XX.X) : TIGRFAMs are protein families based on Hidden Markov Models or HMMs;
SFLD (X.X) : SFLDs are protein families based on Hidden Markov Models or HMMs;
Hamap (XXXXXX.XX) : High-quality Automated and Manual Annotation of Microbial Proteomes;
SMART (X.X) : SMART allows the identification and analysis of domain architectures based on Hidden Markov Models or HMMs;
CDD (X.XX) : Prediction of CDD domains in Proteins;
ProSiteProfiles (XX.XXX) : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them;
ProSitePatterns (XX.XXX) : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them;
SUPERFAMILY (X.XX) : SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes;
PRINTS (XX.X) : A fingerprint is a group of conserved motifs used to characterise a protein family;
PANTHER (X.X) : The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence;
Gene3D (X.X.X) : Structural assignment for whole genes and genomes using the CATH domain structure database;
PIRSF (X.XX) : The PIRSF concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships;
Pfam (XX.X) : A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs);
Coils (X.X) : Prediction of Coiled Coil Regions in Proteins;
MobiDBLite (X.X) : Prediction of disordered domains Regions in Proteins.

yourh commented

We used all of them except PANTHER.
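
For readers who want to reproduce this selection, here is a minimal sketch of how such a run could be assembled. It is not the authors' actual command, which the thread does not give. The analysis names are taken from the listing in the question (they vary between InterProScan releases), `-i`, `-f`, `-appl` and `-o` are InterProScan's standard command-line options, and the file names `proteins.fasta` and `proteins.tsv` are hypothetical.

```python
import shlex

# Analysis names as printed in the "Available analyses" listing above.
# This list is release-specific; check the output of your own installation.
ALL_ANALYSES = [
    "TIGRFAM", "SFLD", "Hamap", "SMART", "CDD",
    "ProSiteProfiles", "ProSitePatterns", "SUPERFAMILY", "PRINTS",
    "PANTHER", "Gene3D", "PIRSF", "Pfam", "Coils", "MobiDBLite",
]
EXCLUDED = {"PANTHER"}  # per the reply: everything except PANTHER


def build_command(fasta_path, out_path, analyses=ALL_ANALYSES, excluded=EXCLUDED):
    """Build an interproscan.sh argument list that requests TSV output."""
    selected = [name for name in analyses if name not in excluded]
    return [
        "interproscan.sh",
        "-i", fasta_path,            # input FASTA (hypothetical path)
        "-f", "tsv",                 # tab-separated output
        "-appl", ",".join(selected),  # comma-separated analyses to run
        "-o", out_path,              # output file (hypothetical path)
    ]


command = build_command("proteins.fasta", "proteins.tsv")
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` on a machine where InterProScan is installed.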

I have one more question: how did you convert the InterProScan output into binary features? Could you share the relevant code? Thanks!

yourh commented

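The InterProScan output is simply the domains/families/motifs associated with each protein, so you just convert those into one-hot features. That part of the code was too trivial to tidy up, which is why we didn't release it.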
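
Since that code was not released, the following is only a sketch of what such a conversion might look like, not the authors' implementation. It assumes InterProScan's TSV output, where column 1 is the protein ID, column 5 the member-database signature accession and column 12 the InterPro accession. Which of the two columns the authors used as features is not stated in this thread; the sketch defaults to the InterPro accession and makes the column configurable. The function names and the toy rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Assumed 0-based positions in InterProScan TSV output.
PROTEIN_COL = 0     # protein ID
SIGNATURE_COL = 4   # member-database signature accession (e.g. a Pfam ID)
INTERPRO_COL = 11   # InterPro accession, "-" or absent when unintegrated


def read_annotations(handle, feature_col=INTERPRO_COL):
    """Map each protein ID to the set of feature accessions reported for it."""
    annotations = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= feature_col:
            continue  # this row has no value in the chosen column
        feature = row[feature_col].strip()
        if feature and feature != "-":  # "-" marks a missing value
            annotations[row[PROTEIN_COL]].add(feature)
    return annotations


def to_binary_matrix(annotations, protein_ids):
    """Return (vocabulary, matrix); matrix[i][j] is 1 iff protein i has feature j."""
    vocabulary = sorted({f for feats in annotations.values() for f in feats})
    index = {feature: j for j, feature in enumerate(vocabulary)}
    matrix = []
    for pid in protein_ids:
        row = [0] * len(vocabulary)
        for feature in annotations.get(pid, ()):
            row[index[feature]] = 1
        matrix.append(row)
    return vocabulary, matrix


# Toy, made-up rows (not real InterProScan output), only to exercise the code.
toy_tsv = (
    "P1\tmd5a\t100\tPfam\tPF00001\tdesc\t1\t50\t1e-5\tT\t01-01-2020\tIPR000001\tdesc\n"
    "P1\tmd5a\t100\tSMART\tSM00002\tdesc\t5\t60\t1e-3\tT\t01-01-2020\tIPR000002\tdesc\n"
    "P2\tmd5b\t80\tPfam\tPF00001\tdesc\t2\t40\t1e-4\tT\t01-01-2020\tIPR000001\tdesc\n"
    "P2\tmd5b\t80\tCoils\tCoil\tdesc\t10\t30\t-\tT\t01-01-2020\n"
)

annotations = read_annotations(io.StringIO(toy_tsv))
vocabulary, matrix = to_binary_matrix(annotations, ["P1", "P2", "P3"])
print(vocabulary)  # ['IPR000001', 'IPR000002']
print(matrix)      # [[1, 1], [1, 0], [0, 0]]
```

Because a protein can carry several domains at once, each row is really a multi-hot indicator vector. For a real proteome the vocabulary runs to tens of thousands of entries, so a sparse matrix would be a more practical container than nested lists. The vocabulary should be built once on the training set and reused for test proteins so that the columns line up.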