czbiohub-sf/cerebra

JoSS Review: Clarify README.md


With regard to the ongoing review of your JoSS submission, I would like to suggest stating some use cases and/or warnings in your README.md. In particular, one of the largest drawbacks is performance on large VCF files. You even state as much in your code:

def dataframe(filename, large=True):
    """ [...]
    Note: Using large=False with large VCF files. It will be painfully slow.
    [...]
    """

The same warning does not appear in your instructions/quickstart.

VCFs can sometimes be excruciatingly large (looking at you, 1KG/ENCODE), and researchers often want or need to summarize them starting from exactly that point. Suggesting something like filtering the VCF by region/contig before running cerebra would therefore be a thoughtful hint to gung-ho users and would help them avoid potentially long-running jobs.
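To make the suggestion concrete, here is a minimal Python sketch of pre-filtering a VCF down to one contig and an optional position window before handing it to cerebra. The functions `filter_vcf_lines` and `filter_vcf_by_region` are hypothetical illustrations, not part of cerebra, and the sketch assumes a standard tab-delimited VCF (optionally gzip/bgzip-compressed). In practice, established tools such as bcftools or tabix do this much faster on indexed files.

```python
import gzip
from typing import Iterable, Iterator, Optional, TextIO


def _open_vcf(path: str) -> TextIO:
    # Handle plain-text and gzip/bgzip-compressed VCFs; bgzip output is
    # multi-member gzip, which the standard gzip module reads transparently.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def filter_vcf_lines(
    lines: Iterable[str],
    contig: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> Iterator[str]:
    """Yield all header lines plus records on `contig` with start <= POS <= end."""
    for line in lines:
        if line.startswith("#"):
            # Keep meta-information and the #CHROM header so the output is a valid VCF.
            yield line
            continue
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if fields[0] != contig:
            continue
        pos = int(fields[1])
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        yield line


def filter_vcf_by_region(
    in_path: str,
    out_path: str,
    contig: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> None:
    # Stream line by line so the full VCF is never held in memory;
    # the output is written as uncompressed text.
    with _open_vcf(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_vcf_lines(src, contig, start, end))
```

For example, `filter_vcf_by_region("calls.vcf.gz", "chr1_subset.vcf", "chr1", 1_000_000, 2_000_000)` (file names hypothetical) would write a much smaller VCF for a first cerebra run. Note that the sketch compares only POS, so a long deletion starting just before `start` that overlaps the window is dropped.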

That's a great idea. Yeah, we've noticed that (A) VCFs vary greatly in size depending on the experimental source and (B), to state the obvious, larger VCFs run much, much slower.

This should be addressed with 6b92c81.

This satisfies my concern. Thank you!