Wedge-lab/dpclust

.Rdata input file

Opened this issue · 2 comments

RunDP currently looks for a file named dataset.RData in the outdir and reads its data before running DirichletProcessClustering. This causes problems when DPClust is run with an output directory that has been used before: the input data is replaced by the data saved in the previous run. A possible fix is to add the samplename / seed / date to the .RData filename.

I created a branch named "Adjust_Rdata_in_DirichletProcessClustering" to address the issue. Instead of using "dataset.RData" as the file name, I introduced a variable called rdata_file_name.
The rdata_file_name variable is constructed as follows:
rdata_file_name = paste(paste0("Seed-", seed), paste0("Date-", format(Sys.time(), "%Y-%m-%d_%H-%M-%S")), "dataset.RData", sep = "_")
This adds the seed and the system time to the file name, in a form that contains no spaces or colons.
The rdata_file_name variable is then used as the file name in all instances, replacing the hardcoded "dataset.RData" string.

P.S. If including the exact time in the file name is too specific, we can use the following command to include only the date:
rdata_file_name = paste(paste0("Seed-", seed), paste0("Date-", Sys.Date()), "dataset.RData", sep = "_")
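
For illustration, here is a minimal sketch of how the two naming variants above could be wrapped in one helper and used to save and reload the data. The helper name make_rdata_file_name, its date_only and now arguments, and the placeholder dataset object are assumptions made for this example; they are not taken from the DPClust code or from the branch.

```r
# Hypothetical helper (not part of DPClust): build a per-run .RData file name
# from the seed and either a full timestamp or the date only.
make_rdata_file_name <- function(seed, date_only = FALSE, now = Sys.time()) {
  # Use only digits, hyphens and underscores so the name is valid on all platforms.
  stamp <- if (date_only) {
    format(now, "%Y-%m-%d")
  } else {
    format(now, "%Y-%m-%d_%H-%M-%S")
  }
  paste(paste0("Seed-", seed), paste0("Date-", stamp), "dataset.RData", sep = "_")
}

# Example usage with a temporary directory standing in for outdir.
outdir <- tempdir()
seed <- 123
rdata_file_name <- make_rdata_file_name(seed)
rdata_path <- file.path(outdir, rdata_file_name)

# Placeholder object; the real run would save its own input data here.
dataset <- list(samplename = "example_sample", seed = seed)
save(dataset, file = rdata_path)

# Reload into a separate environment to check that this run's file is read back.
loaded <- new.env()
load(rdata_path, envir = loaded)
stopifnot(identical(loaded$dataset, dataset))
```

With the date-only variant, two runs with the same seed on the same day still map to the same file, so the full timestamp is the safer choice when an output directory is reused several times a day.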

@MiaoGaoUK, could you please review the changes in the "Adjust_Rdata_in_DirichletProcessClustering" branch and merge them into the main "DPClust" branch?