Low CPU %?
biona001 opened this issue · 3 comments
I am running `big_spLinReg` on 350k samples and ~600k SNPs. The job has been running for more than a day. I requested 24 cores with 120G of memory. Here is a screenshot of the `top` command:
The command I called is:

```r
library(bigsnpr)  # attaches bigstatsr, which provides big_spLinReg()

# import y and Z (additional covariates)
# ...
G <- snp_attach(rdsfile)
lasso.fit <- big_spLinReg(G$genotypes, y.train = y[train_idx],
                          covar.train = Z[train_idx, ], ind.train = train_idx,
                          pf.covar = rep(0, ncol(Z)),
                          n.abort = 2, nlam.min = 30,
                          dfmax = 100000, ncores = 24, K = 10)
```
My questions are:
- How do I use all 24 cores?
- Why is the CPU percentage not at 100%? The cores are almost always around 20-50%.
- #88 seems to suggest that I need to generate a new FBM matrix every time. Is that still recommended?
- If you run 10 models (`K = 10`), it will use 10 cores max (in the second step).
- Yes, it should be 100% when the first step is finished (which seems to be the case here, since only 10 cores are used).
- I guess the full data is ~354 GB on disk. If the phenotype is very polygenic (e.g. height), it might end up going to the full `dfmax`, and then you might need a bit more memory (try with 200GB if possible), or use `dfmax = 50e3`.
- No, generating a new FBM will not help you.
- You should define `G <- snp_attach(rdsfile)$genotypes` (see the sketch after this list).
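
Putting those points together, here is a minimal sketch of how I would revise your call. It reuses `rdsfile`, `y`, `Z`, and `train_idx` from your snippet; `ncores = 10` and `dfmax = 50e3` are suggestions following the points above, not requirements:

```r
library(bigsnpr)  # attaches bigstatsr, which provides big_spLinReg()

# Attach only the genotype FBM, not the whole bigSNP object
G <- snp_attach(rdsfile)$genotypes

# With K = 10 cross-validation folds, at most 10 models run in parallel
# in the second step, so requesting more than 10 cores leaves cores idle
lasso.fit <- big_spLinReg(G, y.train = y[train_idx],
                          ind.train = train_idx,
                          covar.train = Z[train_idx, ],
                          pf.covar = rep(0, ncol(Z)),
                          n.abort = 2, nlam.min = 30,
                          dfmax = 50e3,  # cap non-zero variables to limit memory
                          K = 10, ncores = 10)
```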
Thank you for the quick response.
For your second point, the CPU usage is only around 50%, so why do you say it should be at 100%?
I said it should. Maybe you're just swapping a bit, and I guess if you increase memory to 200 GB, this should be okay.
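
If you want to check the swapping hypothesis, one option (assuming a Linux machine; this is just a diagnostic sketch, unrelated to bigstatsr itself) is to inspect the memory fields of the R process in `/proc`:

```r
# Linux-only diagnostic: print resident (VmRSS) and swapped (VmSwap)
# memory for the current R process by parsing /proc/self/status
status <- readLines("/proc/self/status")
grep("^Vm(RSS|Swap)", status, value = TRUE)
```

If `VmSwap` stays well above zero while the job runs, increasing the memory request (or lowering `dfmax`) should help.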