privefl/bigstatsr

Low CPU %?

biona001 opened this issue · 3 comments

I am running big_spLinReg on 350k samples and ~600k SNPs. The job has been running for more than a day. I requested 24 cores and 120 GB of memory. Here is a screenshot of the output of top:

[Screenshot of top output, taken 2022-03-28 at 11:35:08 AM]

The command I ran is:

    # import y and Z (additional covariates)
    # ...
    G <- snp_attach(rdsfile)
    lasso.fit <- big_spLinReg(G$genotypes, y.train = y[train_idx],
        covar.train = Z[train_idx, ], ind.train = train_idx,
        pf.covar = rep(0, ncol(Z)),
        n.abort = 2, nlam.min = 30,
        dfmax = 100000, ncores = 24, K = 10)

My questions are:

  • How do I use all 24 cores?
  • Why are the processes not at 100% CPU? They are almost always around 20-50%.
  • #88 seems to suggest that I need to generate a new FBM every time. Is that still recommended?

Reply:

  • You are fitting 10 models (K = 10), so at most 10 cores are used (in the second step).

  • It should be at 100% once the first step is finished (which seems to be the case here, since only 10 cores are in use).

  • I guess the full data is ~354 GB on disk. If the phenotype is very polygenic (e.g. height), the model might end up reaching the full dfmax; you might then need a bit more memory (try 200 GB if possible) or use dfmax = 50e3.

  • No, generating a new FBM will not help you.

  • You should define G <- snp_attach(rdsfile)$genotypes.
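A minimal sketch of the call with the suggestions above applied (G defined directly as the FBM, dfmax = 50e3). So that it runs anywhere, it uses small simulated data: the sizes N and M, the simulated y and Z, and the 400-sample training split are hypothetical stand-ins, not the original data. With the real data you would use G <- snp_attach(rdsfile)$genotypes instead of the toy FBM.code256.

```r
library(bigstatsr)
set.seed(1)

# Hypothetical toy data standing in for the real genotypes, y and Z
N <- 500; M <- 1000
G <- FBM.code256(N, M, code = c(0, 1, 2, rep(NA_real_, 253)),
                 init = as.raw(sample(0:2, N * M, replace = TRUE)))
# With the real data, pass the FBM itself:
# G <- bigsnpr::snp_attach(rdsfile)$genotypes
Z <- matrix(rnorm(N * 2), N, 2)                     # two extra covariates
y <- drop(G[, 1:10] %*% rnorm(10) + Z %*% c(1, -1)) + rnorm(N)
train_idx <- sort(sample(N, 400))

lasso.fit <- big_spLinReg(G, y.train = y[train_idx], ind.train = train_idx,
                          covar.train = Z[train_idx, ],
                          pf.covar = rep(0, ncol(Z)), # covariates unpenalized
                          n.abort = 2, nlam.min = 30,
                          dfmax = 50e3, # cap suggested for very polygenic traits
                          ncores = 1,   # 24 on the cluster; K = 10 models in step 2
                          K = 10)
summary(lasso.fit)
```

On the cluster, keeping ncores = 24 can still help the first step, but with K = 10 only 10 cores stay busy while the models are fitted.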

Thank you for the quick response.

Regarding your second point: the CPU usage is only around 50%. Why do you say it is at 100%?

I said it should be. Maybe you're just swapping a bit; if you increase the memory to 200 GB, it should be okay.
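To compare the backing file with the memory requested (120 GB here), you can check its size directly. This is a minimal sketch with a hypothetical helper, bk_size_gb(), demonstrated on a small toy FBM.code256 so it runs anywhere. It assumes that FBM.code256 stores one byte per genotype, so the .bk file should hold N * M bytes.

```r
library(bigstatsr)

# Hypothetical helper: size of an FBM's backing (.bk) file, in GB (1e9 bytes)
bk_size_gb <- function(X) file.size(X$backingfile) / 1e9

# Toy FBM.code256: one byte per entry, so the file should hold N * M bytes
N <- 2000; M <- 5000
X <- FBM.code256(N, M, code = c(0, 1, 2, rep(NA_real_, 253)))
bk_size_gb(X)

# With the real data (rdsfile as in the original post):
# bk_size_gb(bigsnpr::snp_attach(rdsfile)$genotypes)
```

File-backed matrices are memory-mapped. When the file is much larger than the available RAM, parts of it may have to be read back from disk repeatedly, which could leave processes waiting on I/O instead of running at 100%.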