amsyuan's Stars
lucidrains/progen
Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax
katholt/srst2
Short Read Sequence Typing for Bacterial Pathogens
lc222/char-cnn-text-classification-tensorflow
Implementation reproducing the paper "Character-level Convolutional Networks for Text Classification"
ShanlinKe/COVID-19
Dissecting the Role of the Human Microbiome in COVID-19 via Metagenome-assembled Genomes
RasmussenLab/phamb
Downstream processing of VAMB binning for Viral Elucidation
RasmussenLab/vamb
Variational autoencoder for metagenomic binning
dominicp6/ImmuneConstrainedVAE
Dozens of vaccines protecting against SARS-CoV-2 have now been approved for public use, yet there remains a high risk that the virus evolves to escape vaccine protection. This motivates the need for a new generation of vaccines that can protect against a wider gamut of a virus’s evolutionarily accessible states, not just the currently circulating strains. Computational methods such as sequence generative models can play a critical role in mapping out this state space. In particular, they can be used to screen thousands of examples of viral proteins that might pose a high risk of vaccine escape. In this work, we take steps towards such a computational method by designing and evaluating a conditional Variational Autoencoder (VAE) capable of selectively generating SARS-CoV-2 spike proteins with low immune visibility. The model is trained on 65,000 of the most common wild-type SARS-CoV-2 sequences and uses NetMHCpan to estimate levels of exposure to human T cell immunity. The model's generated sequences are compared with those derived from two simpler generative models: a random-mutator and an 11-gram language model. We discover that although all three models are able to generate stable, structurally valid sequences, only the VAE model can generate low immunogenicity sequences sampled from a distribution that interpolates smoothly along the principal variance directions of natural sequences.
salesforce/progen
Official release of the ProGen models
debbiemarkslab/EVcouplings
Evolutionary couplings from protein and RNA sequence alignments
lambdal/deeplearning-benchmark
Benchmark Suite for Deep Learning
tseemann/snippy
Rapid haploid variant calling and core genome alignment
biobakery/biobakery
bioBakery tools for meta'omic profiling
asadprodhan/GPU-accelerated-guppy-basecalling
GPU-accelerated guppy basecalling and demultiplexing on Linux
gencorefacility/covid19
Variant Analysis Pipeline for COVID19
appliedmicrobiologyresearch/covgap
Genome mapping, consensus generation, variant calling and annotation tool for SARS-CoV-2
nextstrain/nextclade_data
Datasets for https://github.com/nextstrain/nextclade
hsnguyen/assembly
Streaming assembly for MinION data
mdcao/npScarf
neherlab/treetime
Maximum likelihood inference of time stamped phylogenies and ancestral reconstruction
theosanderson/chronumental
Estimating time trees from very large phylogenies
nextstrain/ncov
Nextstrain build for novel coronavirus SARS-CoV-2
PoonLab/covizu
Rapid analysis and visualization of coronavirus genome variation
maximilianh/multiSub
Prepares a SARS-CoV-2 submission for GISAID, NCBI or ENA. Can read GISAID or NCBI files, or plain fasta+tsv/csv/xls. Finds files in input directory and merges everything into a single output directory. Auto-detects input file formats. Can submit the results to multiple repositories from the command line.
cov-lineages/pangolin
Software package for assigning SARS-CoV-2 genome sequences to global lineages.
epi2me-labs/wf-artic
ARTIC SARS-CoV-2 workflow and reporting
LPDI-EPFL/trivalent_cocktail
JianiC/rsv
Nextstrain build for Human Respiratory Syncytial Virus
salvatoreloguercio/cov2vec
cov2vec is a systematic effort to obtain SARS-CoV-2 genome embeddings by encoding viral genomes with protein language models.
brianhie/viral-mutation
Language modeling of viral evolution
happywlu/CroTrait
A portable tool for in silico species identification, serotyping and multilocus sequence typing of the genus Cronobacter