prepare and format metadata for an EBI/ENA submission of the Neotropical metabarcoding data
- start the project,
- clone,
- create folders,
- add placeholders,
- start a Rmd file,
- check that I have metadata entries for all samples (454 and Illumina),
- fetch per-sample geographical coordinates,
- add a free-form column with a description for each sample,
- replicate the structure of the file neotrop_samples.tsv.csv, the second line of headers is lifted from a eukbank submission template downloaded today (2022-11-24),
- add columns one-by-one,
- reorder columns if need be, create new ones if need be,
- check primer sequences,
- set depth to zero,
- issues with samples T185-186 (need to rename fastq files), T199_T200, and L137_L13
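The check that every sample has a metadata entry could be sketched as follows. This is only an illustration under assumptions that are not in the notes: the function name `missing_metadata`, the input file of sample names (one per line), and the idea that the sample name sits in the first tab-separated column of the metadata table are all hypothetical. The number of header lines is a parameter because the table described above carries a second line of headers.

```shell
# Sketch: print samples that have no entry in the metadata table.
# Assumptions (hypothetical, not from the notes): sample names are in
# the first tab-separated column, after a known number of header lines.
missing_metadata() {
    samples="${1}"            # file with one sample name per line
    metadata="${2}"           # tab-separated metadata table
    header_lines="${3:-1}"    # number of header lines to skip

    # extract the names present in the table, then keep only the
    # sample names that match none of them exactly (-x, -F)
    tail -n "+$((header_lines + 1))" "${metadata}" | \
        cut -f 1 | \
        grep -vxF -f - "${samples}" | \
        sort -u
}
```

For a table with two header lines, `missing_metadata samples.txt metadata.tsv 2` would print one line per sample lacking an entry, and nothing if all samples are covered.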
# aragorn
cd "${HOME}/projects/neotropical-soils-submission/data/"

# list all read files (sff and fastq), one row per file, with the
# path components as tab-separated columns
(
    cd "${HOME}/projects/Ciliata_neotropical/data/"
    echo -e "run\tSubmitted_files"
    (
        find . -name "*.sff"
        # skip fastq files whose names match the excluded patterns
        find ./201*/ -name "*.fastq.*" | \
            grep -vE "_P[13]_|_NT[13]_|_K[14]_|_H?T[13]_"
    ) | \
        sort | \
        tr "/" "\t" | \
        cut -f 1 --complement | \
        sed 's/\.bz2/.gz/'
) > list_of_read_files.tsv
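To make the path-to-columns step easier to follow, here is the same `tr`, `cut`, `sed` chain isolated in a small function and applied to a single path. The function name `path_to_row` and the example path are hypothetical, chosen only for illustration. Note that `--complement` is a GNU `cut` option.

```shell
# Sketch: turn a path printed by find into a tab-separated row.
# (hypothetical helper, same three steps as in the command above)
path_to_row() {
    # 1. split path components into columns
    # 2. drop the first column (the leading ".")
    # 3. replace the first ".bz2" with ".gz" in the listed name
    tr "/" "\t" | \
        cut -f 1 --complement | \
        sed 's/\.bz2/.gz/'
}

# usage, with a made-up path:
printf './2013_run/sample.fastq.bz2\n' | path_to_row
# prints: 2013_run<TAB>sample.fastq.gz
```

Files found at different directory depths produce rows with different numbers of columns, so the two-column header only fits paths with exactly one directory level.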