/psss-team2

Primary language: Python. License: MIT.

Identifying contig similarity across metagenomic datasets

Last modified on: September 30, 2021

This project is part of the Petabyte-Scale Sequence Search: Metagenomics Benchmarking Codeathon, hosted virtually from Monday, September 27, 2021 to Friday, October 1, 2021.

Problem Statement

Given a query contig, can we find contigs in other samples that are completely contained within it, or that completely contain it?

This is biologically relevant when you have interesting metagenome contigs and want to determine whether the same contigs have been observed in any other metagenome datasets.
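The containment question can be illustrated with a minimal sketch. It assumes exact string matching on either strand, which is a simplification: real contigs differ by sequencing errors and variants, so practical tools rely on alignment or k-mer based estimates instead. The function names `reverse_complement`, `contains`, and `classify` are hypothetical and are not taken from this repository.

```python
# Illustrative only: exact-match containment between two contigs.
# All names here are hypothetical, not part of this project's code.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains(outer: str, inner: str) -> bool:
    """True if `inner` occurs exactly within `outer` on either strand."""
    outer, inner = outer.upper(), inner.upper()
    if not inner or len(inner) > len(outer):
        return False
    return inner in outer or reverse_complement(inner) in outer


def classify(query: str, target: str) -> str:
    """Describe the containment relationship between a query and a target."""
    query_has_target = contains(query, target)
    target_has_query = contains(target, query)
    if query_has_target and target_has_query:
        return "identical"
    if query_has_target:
        return "query_contains_target"
    if target_has_query:
        return "target_contains_query"
    return "none"


if __name__ == "__main__":
    print(classify("ACGTTGCAAC", "GTTGC"))
    print(classify("GTTGC", "ACGTTGCAAC"))
```

Checking both strands matters because a contig assembled from another sample may represent the opposite strand of the same region.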

During the codeathon, we identified a way to benchmark contig containments (i.e., if you use different tools to identify containments, how close to the "truth" are the results?). The detailed codeathon project organization can be found in the wiki.
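One way to quantify "how close to the truth" a tool's results are is to compare its reported containment pairs against a trusted set of pairs. The sketch below uses precision, recall, and F1 for that comparison. These metrics are an assumption made for illustration; the benchmark actually used by the team is described in the wiki. The function name `score_containments` and the contig identifiers are hypothetical.

```python
# Illustrative only: score predicted containment pairs against a truth set.
# Each pair is (query_contig_id, contained_contig_id); the IDs are made up.

from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def score_containments(predicted: Set[Pair], truth: Set[Pair]) -> Dict[str, float]:
    """Return precision, recall, and F1 for a set of predicted pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = {("q1", "c1"), ("q1", "c2"), ("q2", "c3")}
    predicted = {("q1", "c1"), ("q2", "c3"), ("q2", "c9")}
    print(score_containments(predicted, truth))
```

Treating results as sets of pairs keeps the comparison independent of how each tool orders or formats its output.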

Team

Mihai Pop, PhD, University of Maryland, College Park

Rob Patro, PhD, University of Maryland, College Park

Jackie Michaelis, PhD, University of Maryland, College Park