HCGB-IGTP/XICRA

NO samples were retrieved

Opened this issue · 8 comments

(XICRA) [mbansal@login004 teton1]$ XICRA prep -i ./teton1/ -o ./XICRA_Prep/ --debug

######################################################################

XICRA pipeline

Jose F. Sanchez & Lauro Sumoy

Copyright (C) 2019-2021 Lauro Sumoy Lab, IGTP, Spain

######################################################################

|==================================================|
| Preparing samples |
|==================================================|

--------- Starting Process ---------
01/25/2023, 18:47:21

  • Create output folder(s):
  • Generate a directory containing information within the project folder provided

  • Getting files from input folder...

  • Mode: fastq.

  • Extension:
    [ fastq, fq, fastq.gz, fq.gz ]

  • Input folder exists
    ** DEBUG: sampleParser.get_files files
    {'./teton1/Teton_R2.fastq.gz', './teton1/Teton_R1.fastq.gz'}

**DEBUG: sampleParser.get_files files list to check **
DO NOT PRINT THIS LIST: It could be very large...
set()

** DEBUG: select_samples
non_duplicate_names:
[]
** DEBUG: select_samples
non_duplicate_names:
set()
samples_prefix
{'.*'}
non_duplicate_samples
[]
tmp dataframe
Empty DataFrame
Columns: [sample, file]
Index: []
Empty DataFrame
Columns: [sample, file]
Index: []
** DEBUG: select_samples
name_frame_samples:
Empty DataFrame
Columns: [sample, dirname, name, new_name, name_len, lane, read_pair, lane_file, ext, gz, tag, file]
Index: []
number_files:
0
total_samples:
set()

**ERROR: No samples were retrieved. Check the input provided

I have used the same extension for both paired-end files, but it keeps giving me the same error. Please help.
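A quick way to check the input independently of the tool is to list which files in the folder end in one of the extensions printed in the log above. This is only a minimal sketch: `find_fastq_files` is a hypothetical helper, not part of XICRA, and it looks only at the top level of the folder, which may differ from how XICRA itself scans the input.

```python
import os
import tempfile

# Extensions as printed in the XICRA log above.
EXTENSIONS = ("fastq", "fq", "fastq.gz", "fq.gz")


def find_fastq_files(folder, extensions=EXTENSIONS):
    """Return sorted paths of top-level files ending in an accepted extension.

    Hypothetical helper for illustration; not XICRA's own parser.
    """
    suffixes = tuple("." + ext for ext in extensions)
    found = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffixes):
                found.append(entry.path)
    return sorted(found)


# Demo on a throwaway folder that mimics the input in this report.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Teton_R1.fastq.gz", "Teton_R2.fastq.gz", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    for path in find_fastq_files(tmp):
        print(os.path.basename(path))
```

In this report the debug output already lists both files, so a check like this would pass; that suggests the samples are being dropped at a later step rather than by the extension filter.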

Hi there,
Thank you very much for pointing it out.

Have you tried the subset example that I provide, either from the git repo or downloaded using XICRA test?

Thanks in advance.

Yes, it gives an error.

I used the following command:
XICRA prep -i ./subset_PE/ -o XICRA_analysis_PE

and it gives the error below:
** ERROR **
[** System: ln -s /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/subset_PE/rep_2_R2.fq.gz /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_2/raw/rep_2_R2.fq.gz **]
ln: failed to create symbolic link ‘/data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_2/raw/rep_2_R2.fq.gz’: File exists
** ERROR **
b''
** ERROR **
[** System: ln -s /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/subset_PE/rep_3_R2.fq.gz /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_3/raw/rep_3_R2.fq.gz **]
ln: failed to create symbolic link ‘/data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_3/raw/rep_3_R2.fq.gz’: File exists
** ERROR **
b''
** ERROR **

Hi there,

This is printed as an error, but it is really just a warning: the symbolic link cannot be created because it already exists. Can you continue with the pipeline?

Type:

XICRA QC -i XICRA_analysis_PE
XICRA join -i XICRA_analysis_PE --noTrim 
XICRA miRNA -i XICRA_analysis_PE --software miraligner

Thanks in advance
Regards
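The "File exists" message is what you get whenever a symbolic link is created at a path that is already taken, for example when prep is run a second time into the same output folder. The sketch below reproduces that behaviour in Python and shows one way to skip links that are already in place. `link_once` is a hypothetical helper written for this illustration; it is not how XICRA creates its links.

```python
import os
import tempfile


def link_once(src, dst):
    """Create the symlink dst -> src unless something already exists at dst.

    Returns True if a link was created, False if dst was already present.
    Hypothetical helper for illustration; not XICRA code.
    """
    # lexists() is also True for a dangling symlink, unlike exists().
    if os.path.lexists(dst):
        return False
    os.symlink(src, dst)
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "rep_2_R2.fq.gz")
    dst = os.path.join(tmp, "linked_rep_2_R2.fq.gz")
    open(src, "w").close()

    print(link_once(src, dst))  # first run: link is created
    print(link_once(src, dst))  # second run: skipped, no error

    # Without the check, the second attempt fails just like `ln -s` does.
    try:
        os.symlink(src, dst)
    except FileExistsError as err:
        print("File exists:", os.path.basename(err.filename))
```

If the links from the first run point at the right files, the message can be ignored; removing the output folder before re-running prep avoids it altogether.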

I get the same error when running sh test_subset.sh.

Test data was retrieved via XICRA test.

There's a Traceback message thrown almost immediately. Then, a KeyError: 'sample_8' shortly thereafter.

Here's a truncated version of the output (but all modules fail with the "No samples were retrieved..." message):

$ sh test_subset.sh
Thu Feb  1 12:15:50 PST 2024 ... Starting ...

# ------------------------------ #
XICRA prep -i ./subset_SE/ -o XICRA_analysis --single_end
...

Traceback (most recent call last):
  File "/home/sam/programs/mambaforge/envs/XICRA/bin/XICRA", line 396, in <module>


######################################################################
#                           XICRA pipeline                           #
#                   Jose F. Sanchez & Lauro Sumoy                    #
#        Copyright (C) 2019-2022 Lauro Sumoy Lab, IGTP, Spain        #
######################################################################


|==================================================|
|                Preparing samples                 |
|==================================================|


--------- Starting Process ---------
	02/01/2024, 12:15:51


+ Create output folder(s):
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis 
+ Generate a directory containing information within the project folder provided
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/info 

--------------------------------------------------
+ Getting files from input folder... 
+ Mode: fastq.
+ Extension: 
[ fastq, fq, fastq.gz, fq.gz ]

+ Input folder exists
	10 files selected...
	10 samples selected...
	Single end mode selected...
-------------------------
(Time spent: 0 h 0 min 0 s)
-------------------------
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/data 
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/data/s 
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/data/s/raw 
+ Sample files will be linked...
    args.func(args)
  File "/home/sam/programs/mambaforge/envs/XICRA/lib/python3.7/site-packages/XICRA/modules/prep.py", line 225, in run_prep
    os.path.join(outdir_dict[row['new_name']], row['new_file']))
KeyError: 'sample_8'

# ------------------------------ #
XICRA QC -i XICRA_analysis --single_end --threads 4
...


######################################################################
#                           XICRA pipeline                           #
#                   Jose F. Sanchez & Lauro Sumoy                    #
#        Copyright (C) 2019-2022 Lauro Sumoy Lab, IGTP, Spain        #
######################################################################


|==================================================|
|                  Quality check                   |
|==================================================|


--------- Starting Process ---------
	02/01/2024, 12:15:52



|==================================================|
|         FASTQC Quality check for samples         |
|==================================================|


+ Getting files from input folder... 
+ Mode: fastq.
+ Extension: 
[ fastq, fq, fastq.gz, fq.gz ]

+ Input folder exists

**ERROR: No samples were retrieved. Check the input provided
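In the log above, prep stops with a KeyError right after "Sample files will be linked...", so the later modules most likely report "No samples were retrieved" simply because nothing was linked into XICRA_analysis. When debugging, it can help to run the steps through a small wrapper that stops at the first failure instead of continuing. The sketch below is an assumption-laden illustration: `run_steps` is a hypothetical helper, and it relies only on the fact that an uncaught Python exception makes a process exit with a non-zero status. The demo uses harmless Python one-liners in place of the real XICRA commands.

```python
import subprocess
import sys


def run_steps(commands):
    """Run each command (a list of arguments) in order.

    Stops at the first command that exits with a non-zero status or cannot
    be started, and returns the list of commands that completed.
    Hypothetical helper for illustration; not part of XICRA.
    """
    completed = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            print("Command not found:", cmd[0], file=sys.stderr)
            break
        if result.returncode != 0:
            print("Step failed with exit code", result.returncode, file=sys.stderr)
            break
        completed.append(cmd)
    return completed


# Stand-ins for the pipeline steps. With XICRA installed, a step would look like
# ["XICRA", "prep", "-i", "./subset_SE/", "-o", "XICRA_analysis", "--single_end"].
demo = [
    [sys.executable, "-c", "print('step 1 ok')"],
    [sys.executable, "-c", "raise KeyError('sample_8')"],  # exits non-zero
    [sys.executable, "-c", "print('never reached')"],
]
finished = run_steps(demo)
print(len(finished), "of", len(demo), "steps completed")
```

Run this way, the first failing module is the only one that needs investigating; the downstream errors are a consequence of it.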

Hi there,
I think I was making changes to a supplementary package where many of the functions are stored, and some of the modifications may not have been thoroughly tested.

I will have a look, fix the bugs, publish an update, and let you know shortly.

Best regards

Thanks so much for the quick response and update. Much appreciated! Looking forward to giving this tool a try!

Hi there,
I have updated the code and I think it should be working now. Please give it a try and let me know if it works.

I encourage you to create a fresh environment and follow the steps at https://github.com/HCGB-IGTP/XICRA

## Install

# get environment yml configuration
wget https://raw.githubusercontent.com/HCGB-IGTP/XICRA/master/XICRA_pip/devel/conda/environment.yml

conda env create -f environment.yml

# activate
conda activate XICRA_env

# install latest python code
pip install XICRA

# install missing software
wget https://raw.githubusercontent.com/HCGB-IGTP/XICRA/master/XICRA_pip/XICRA/config/software/installer.sh
sh installer.sh

Get test datasets:

XICRA test

This command downloads three different datasets into your directory: single end, paired end, and a tRNA-enriched dataset. It also provides a script named test_subset.sh.

You can run the script with sh test_subset.sh, which performs the whole analysis for all datasets, or you can go step by step, either by checking the contents of the file or by following the XICRA workflow:

# Single end
XICRA prep -i ./subset_SE/ -o XICRA_analysis --single_end 
XICRA QC -i XICRA_analysis --single_end --threads 4 
XICRA trim -i XICRA_analysis --single_end --threads 4 --adapters_a TGGAATTCTCGGGTGCCAAGG
XICRA miRNA -i XICRA_analysis --single_end --threads 4 --software miraligner

# Paired end
XICRA prep -i ./subset_PE/ -o XICRA_analysis_PE
XICRA QC -i XICRA_analysis_PE --threads 4
XICRA join -i XICRA_analysis_PE --threads 4 --noTrim
XICRA miRNA -i XICRA_analysis_PE --threads 4 --software miraligner

# tRNA single end
XICRA prep -i ./subset_tRNA/ -o XICRA_analysis_tRNA --single_end
XICRA QC -i XICRA_analysis_tRNA --single_end --threads 4 
XICRA tRNA -i XICRA_analysis_tRNA --noTrim --single_end --threads 4 --software mintmap 

Unfortunately, due to recent Python updates and some limitations, optimir has been dropped from the miRNA software options, and MINTmap, used in the tRNA module, might need further debugging.

Thanks for the update. Seems like this specific error has been resolved!