Takes input protein files from data/
folder, and generate rdata
files for each protein.
- Input:
data/[protein]/[protein]_ref_sequences_filtered.txt.gz
,data/[protein]/[protein]_sel_sequences_filtered.txt.gz
- Output:
[protein].rda
-
R scripts:
vfits.R
,vfits_i_1.R
,vfits_i_2.R
vfits.R
contains a script fitting PU models over 20 py1 values, where at each py1 value 10-fold CV are performed.vfits_i_1.R
is a parallel version of vfits.R. It contains a script fitting PU models for one CV fold.vifts_i_2.R
is a script which aggregates 10 fitting results fromvfits_i_1.R
-
Input:
[protein].rda
files in `data-r/`` -
Output:
Rdata/vfit_[protein].Rdata
-
Scripts are run in high performance computing server environment where
protein.name
andnCores
are provided as input arguments. (This script can be also run in a local computer as well.) -
The folder
code/vfits/Rdata/
is not uploaded in GitHub due to the file sizes. (Files can be recreated by runningvfits.R
script.)
-
R scripts:
fits.R
-
fits.R contains a script fitting a single PU model at a specified py1 value.
-
R script:
enr_test_fit.R
-
enr_test_fit.R
contains a script fitting 10 CV PU and enrichment score models -
Input:
[protein].rda
files in `data-r/`` -
Output:
Rdata/enr_r_[protein].Rdata
-
Scripts are run in high performance computing server environment where
protein.name
,r
, andnCores
are provided as input arguments. -
The folder
code/enrich_comparison/Rdata/
is not uploaded in GitHub due to the file sizes
-
R script:
rocs.R
-
Input: vfits (code/vfits/Rdata)
-
Output:
[protein].eps
(for 10 proteins) (Figure 3 (a)) -
R script:
aucs.R
-
Input: vfits (code/vfits/Rdata)
-
Output:
aucs.csv
(Figure 3 (b) for PU model)
- R script:
enr_test_summarize.R (code/enrich_comparison/Rdata)
- Input:
enr_[r]_[protein].Rdata
(r=1,...,10) - Output:
PU_enr_pvalue_diff.csv
(Figure 3 (c)),comp_enr_diff.eps
(Supplementary Figure 3 (b)),
- R script:
cv_fold.R
- Input: vfits (code/vfits/Rdata)
- Output:
cv_density.png
(Supplementary Figure 3 (c))
- R script:
stability.R
- Input: vfits (code/vfits/Rdata)
- Output:
avg_featsel_stability.eps
(Supplementary Figure 3 (d))
- R script:
py1_stability.R
- Input: vfits (code/vfits/Rdata)
- Output:
py1_stability.eps
(Supplementary Figure 3 (e))
- R script:
pvalues_Bgl3_HT.R
- Input:
fit_Bgl3_HT_[r].Rdata (code/fits)
- Output:
top_10_muts.csv
(Supplementary Figure 5 (b))