Some problem around colony input files
jblamyatifremer opened this issue · 7 comments
Dear Thierry,
I am using your package to input a large amount of data to Colony. Our package is the unique way to pass data from stack to colony.
1- First, i use a awk script to filtred out locus without polymorphisms
awk '$2 > 0 {print}' "$src_root"/14_reassignation/input/batch_2.haplotypes.tsv > "$src_root"/14_reassignation/input/TRIM.haplotypes.tsv
2- I use your R package to feed the haplo2colony fonction
res <- haplo2colony("/media/XXX/TRIM.haplotypes.tsv"
, blacklist.id = NULL, whitelist.loci = NULL,
sample.markers = 5, 1, 2, pop.select = "all",
allele.freq = FALSE, inbreeding = 0, mating.sys.males = 0,
mating.sys.females = 0, clone = 0, run.length = 2, analysis = 1,
allelic.dropout = 0, error.rate = 0.02, print.all.colony.opt = FALSE,
imputations = FALSE, imputations.group = "populations", num.tree = 100,
iteration.rf = 10, split.number = 100, verbose = TRUE,
parallel.core = 2, filename = "/home/XXX/colony/colony2_v1.dat")
3- I rename the colony2_v1.dat to colony2.dat into the colony directory
4- I got an error when using colony2s.ifort.out with the
jean-baptiste@ordi[colony] mv ./colony2_v1.dat ./colony2.dat [ 6:43]
jean-baptiste@ordi[colony] ./colony2s.ifort.out [ 6:43]
COLONY, Version 2.0.6.2, Build 20160825, Expire Date 20180825
Copyright (C) by Jinliang Wang, Institute of Zoology, Zoological Society of London
Email: jinliang.wang@ioz.ac.uk
Opening & reading data input file: colony2.dat
Marker 2 has the same ID, 169, as marker 1
Errors in DATA. Insufficient data or incorrect format.
Please check DATA and format and then re-run the program
Program stopped in subroutine StopOnDataError
5- After looking into the colony manual user, in the attached file (i modified the extension)
colony2.txt
line 23, the loci name (header) is duplicated... After deleting all duplicates by hand I got a new (and more severe error). :
jean-baptiste@ordi[jean-baptiste] cd ~/colony [ 6:36]
jean-baptiste@ordi[colony] ./colony2s.ifort.out [ 6:36]
COLONY, Version 2.0.6.2, Build 20160825, Expire Date 20180825
Copyright (C) by Jinliang Wang, Institute of Zoology, Zoological Society of London
Email: jinliang.wang@ioz.ac.uk
Opening & reading data input file: colony2.dat
Reading offspring genotype data...
forrtl: Is a directory
forrtl: severe (30): open failure, unit 10, file /home/jean-baptiste/colony/
Image PC Routine Line Source
colony2s.ifort.ou 0000000000633E04 Unknown Unknown Unknown
colony2s.ifort.ou 00000000006493AB Unknown Unknown Unknown
colony2s.ifort.ou 000000000042AE18 Unknown Unknown Unknown
colony2s.ifort.ou 0000000000423E26 Unknown Unknown Unknown
colony2s.ifort.ou 0000000000401EF6 Unknown Unknown Unknown
colony2s.ifort.ou 0000000000401E7E Unknown Unknown Unknown
colony2s.ifort.ou 00000000006E47A4 Unknown Unknown Unknown
Since colony2 inputs are quite plainfull to build-up, i will be very happy to have any inputs.
JB
Bonjour JB, je regarde le problème en fin de journée, heure du Québec.
merci!
Thierry
stackr v.0.4.6:
haplo2colony
is now deprecated in favour of write_colony
that uses more sophisticated codes that enables more input file formats. See the new version commit, or the function description for details.
try something simple first, with your batch_2.haplotypes.tsv
directly (the function takes care of monomorphic markers. e.g.:
setwd("/media/XXX/")
test <- stackr::write_colony(data = "TRIM.haplotypes.tsv", strata = "you need a strata file here")
The strata file is described in the argument of the function write_colony
, it's basically a STACKS population.map
with headers.
Look at the new colony file in your working directory.
Try that one with COLONY
if it works, then you can try out different argument parameters and imputations.
Cheers
Thierry
Dear Thierry,
I have tried your new stackR version, thanks for the quick response.
I got an error using your newest function write_colony (i provide my vcf and tsv file). I quickly look into your R code to find where the problem could come from... I did not find yet. I should spend more time on this (after 6 of december). Maybe you will be quicker.
I tried both type of input (.vcf and .tsv) with population map following the stack format. Below the outputs :
strataa <- read.table("/media/jean-baptiste/Passport0_5/002_PROJETS_CODES/RADSeq_HYSEA/14_reassignation/input/population_strata.csv", sep=",",stringsAsFactors = FALSE,header=TRUE)
write_colony("/media/jean-baptiste/Passport0_5/002_PROJETS_CODES/RADSeq_HYSEA/07_mendel_error_exploration/output/batch_2_3.vcf"
-
, strata = strataa, pop.levels = NULL, pop.labels = NULL,
-
blacklist.id = NULL, blacklist.genotype = NULL,
-
whitelist.markers = NULL, monomorphic.out = TRUE, snp.ld = NULL,
-
common.markers = TRUE, maf.thresholds = NULL, maf.pop.num.threshold = 1,
-
maf.approach = "SNP", maf.operator = "OR", max.marker = NULL,
-
sample.markers = NULL, pop.select = "all", allele.freq = "overall",
-
inbreeding = 0, mating.sys.males = 0, mating.sys.females = 0,
-
clone = 0, run.length = 1, analysis = 1, allelic.dropout = 0,
-
error.rate = 0.02, print.all.colony.opt = FALSE,
-
imputation.method = NULL, impute = "genotype",
-
imputations.group = "populations", num.tree = 100, iteration.rf = 10,
-
split.number = 100, verbose = TRUE,
-
parallel.core = parallel::detectCores() - 1, filename = "/home/jean-baptiste/colony/colony2_v2.dat")
#######################################################################
######################## stackr::write_colony ########################
#######################################################################
File type: vcf.file
Importing data...
Error in if (biallelic > 4) { : missing value where TRUE/FALSE needed
write_colony("/media/jean-baptiste/Passport0_5/002_PROJETS_CODES/RADSeq_HYSEA/14_reassignation/input/TRIM.haplotypes.tsv"
-
, strata = strataa, pop.levels = NULL, pop.labels = NULL,
-
blacklist.id = NULL, blacklist.genotype = NULL,
-
whitelist.markers = NULL, monomorphic.out = TRUE, snp.ld = NULL,
-
common.markers = TRUE, maf.thresholds = NULL, maf.pop.num.threshold = 1,
-
maf.approach = "SNP", maf.operator = "OR", max.marker = NULL,
-
sample.markers = NULL, pop.select = "all", allele.freq = "overall",
-
inbreeding = 0, mating.sys.males = 0, mating.sys.females = 0,
-
clone = 0, run.length = 1, analysis = 1, allelic.dropout = 0,
-
error.rate = 0.02, print.all.colony.opt = FALSE,
-
imputation.method = NULL, impute = "genotype",
-
imputations.group = "populations", num.tree = 100, iteration.rf = 10,
-
split.number = 100, verbose = TRUE,
-
parallel.core = parallel::detectCores() - 1, filename = "/home/jean-baptiste/colony/colony2_v2.dat")
#######################################################################
######################## stackr::write_colony ########################
#######################################################################
File type: haplo.file
Importing data...
Error in enc2utf8(col_names(col_labels, sep = sep)) :
argumemt is not a character vector
I juste change the extension from .tsv to .txt (the vcf file is to big) (Github only accepts conventional extensions
for attached files).
Cheers,
JB
PS : N'hesite pas à m'envoyer un MP si tu veux des precisions. Je peux t'envoyer le vcf sur filesender.
TRIM.haplotypes.txt
Ok send me the strata file please
I'll test your haplotypes file
Hi JB,
Try:
Test1: as mentioned above
test <- stackr::write_colony(data = "TRIM.haplotypes.tsv", strata = "you need a strata file here")
This works or not ?
Test2 with strata file already in the global environment:
strata <- readr::read_tsv(file = "strata.test.colony.tsv")
stackr::write_colony(data = "TRIM.haplotypes.txt", strata = strata)
This should also work
Comments
- You don't have to specify the other arguments if your using the defaults.
- Make sure you read carefully the argument of
write_colony
. - The argument
pop.select
:- the default value is
pop.select = NULL
all
is not an option.- you don't have any grouping, so leave it to the default so that
write_colony
takes care of it - however, if you need to specify a population use the save value as in your
STRATA
column.
- the default value is
Cheers
Thierry
Dear Thierry
Test1: as mentioned above
test <- stackr::write_colony(data = "TRIM.haplotypes.tsv", strata = "you need a strata file here")
My previous command was messy with non-existent option as you highlighted.
I was able to get colony input from the command above but Colony throw me an error (I do not have my linux station with me to reproduce the error message). Roughly, colony was complained about the format and the amount of data.
But i re-tried the same command with a ".vcf" from the same dataset (with the same strata file), No more error and Colony is still working on it.
That is good...
Tomorow, i tried again to get a colony input with stackr from the ".tsv" to figure out what is going on.
I also see that you are involved in Mapcomp... I will use it very soon. bravo for your work it is very usefull !
Test2 with strata file already in the global environment:
strata <- readr::read_tsv(file = "strata.test.colony.tsv")
stackr::write_colony(data = "TRIM.haplotypes.txt", strata = strata)