raphael-group/wext

Better expose test statistic to users

Closed · 1 comment

We should better expose the test statistic to users, which we have discussed periodically. We can do so in various ways, including the following:

  1. Add command-line options for common test statistics, like mutual exclusivity and co-occurrence.
  2. Put the test statistic function in a separate, easy-to-edit script.
  3. Allow users to supply "mutation patterns," such as 001 010 100, in a command-line argument or file.

The best approach is probably a combination, e.g., simple command-line options for the common mutual exclusivity and co-occurrence test statistics, plus one of the more flexible mechanisms above for other test statistics.
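
To make option 3 concrete, here is a minimal sketch of how "mutation patterns" could define a test statistic: count the samples whose mutation pattern across the gene set matches one of the supplied patterns. The function names (`parse_patterns`, `exclusivity_patterns`, `cooccurrence_patterns`, `pattern_statistic`) and the samples-by-genes matrix layout are assumptions for illustration, not the existing wext code.

```python
from itertools import product


def parse_patterns(text):
    """Parse a whitespace-separated string such as '001 010 100' into a set of 0/1 tuples."""
    patterns = set()
    for token in text.split():
        if any(c not in "01" for c in token):
            raise ValueError("Pattern %r must contain only 0s and 1s" % token)
        patterns.add(tuple(int(c) for c in token))
    if len({len(p) for p in patterns}) > 1:
        raise ValueError("All patterns must have the same length")
    return patterns


def exclusivity_patterns(k):
    """Patterns with exactly one mutated gene, e.g., 001 010 100 for k = 3."""
    return {p for p in product((0, 1), repeat=k) if sum(p) == 1}


def cooccurrence_patterns(k):
    """The single pattern in which all k genes are mutated."""
    return {(1,) * k}


def pattern_statistic(mutation_matrix, patterns):
    """Count samples (rows) whose 0/1 pattern across the genes (columns) is in `patterns`."""
    return sum(1 for row in mutation_matrix if tuple(row) in patterns)


# Hypothetical data: five samples, three genes.
matrix = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
exclusive_count = pattern_statistic(matrix, parse_patterns("001 010 100"))
cooccurring_count = pattern_statistic(matrix, cooccurrence_patterns(3))
```

With this layout, the easy command-line options for mutual exclusivity and co-occurrence could simply be shortcuts for the corresponding pattern sets, while a file or argument of patterns covers everything else.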

I expect a few issues, including the following, but they should be minor:

  1. This change is easy to make for the saddlepoint approximation, but we also need to either make changes for the "exact" formulation or limit the test statistics that the exact formulation can consider.
  2. Currently, we restrict our search space by (1) only considering genes with at least f mutations and (2) only considering gene sets of size k that (3) have more mutually exclusive than co-occurring mutations. We need to remove this last restriction.
  3. On a related point, it would help to add a command-line option for limiting the number of gene sets to enumerate. It is too easy to accidentally enumerate a billion gene sets instead of a million gene sets unless you carefully consider the gene mutation frequency distribution.
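
For the last point, one possible approach is to report the number of candidate gene sets before enumerating and to cap the enumeration. The sketch below assumes a dictionary of per-gene mutation counts and uses hypothetical names (`count_gene_sets`, `enumerate_gene_sets`, `max_gene_sets`); it applies only the frequency and set-size restrictions described above and is not the existing wext code.

```python
from itertools import combinations, islice
from math import comb  # Python 3.8+


def count_gene_sets(mutation_counts, min_frequency, k):
    """Number of size-k gene sets among genes with at least min_frequency mutations."""
    eligible = [g for g, c in mutation_counts.items() if c >= min_frequency]
    return comb(len(eligible), k)


def enumerate_gene_sets(mutation_counts, min_frequency, k, max_gene_sets=None):
    """Lazily yield size-k gene sets, stopping after max_gene_sets (None means no cap)."""
    eligible = sorted(g for g, c in mutation_counts.items() if c >= min_frequency)
    return islice(combinations(eligible, k), max_gene_sets)


# Hypothetical mutation counts for four genes.
counts = {"A": 10, "B": 5, "C": 1, "D": 7}
total = count_gene_sets(counts, min_frequency=5, k=2)
capped = list(enumerate_gene_sets(counts, min_frequency=5, k=2, max_gene_sets=2))
```

Counting first is cheap because it needs only the number of eligible genes, so a warning could be printed whenever the total exceeds the cap.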

Resolved with 703a9e5.