stuart-lab/signac

Not all cells requested could be found in the fragment file

Malindrie opened this issue · 8 comments

I have downloaded the SHARE-seq ATAC-seq data and am trying to create a ChromatinAssay object:

atac <- Read10X("./GSM4156597_skin.late.anagen_atac/", gene.column = 1)
fragments <- "./GSM4156597_skin.late.anagen.atac.fragments.sorted.bed.gz"

share[['ATAC']] <- CreateChromatinAssay(
   counts = atac,
   sep = c(":", "-"),
   genome = "mm10",
   fragments = fragments
)

I get the following error:

Computing hash
Checking for 34774 cell barcodes
Error in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments,  : 
  Not all cells requested could be found in the fragment file.
)

However, when I read all the barcodes from the fragments file manually and compared them to the barcodes of the ATAC counts matrix, all barcodes from the counts matrix were in the fragments file, although the fragments file contains more barcodes.

I see that a similar issue was raised before, but even setting max.lines = NULL and tolerance = 0 to perform an exhaustive search still resulted in the same error as above.

I would greatly appreciate it if you could let me know what I might be doing wrong.

In the count matrix, cell barcodes look like R1.01.R2.01.R3.06.P1.55, whereas in the fragment file they look like R1.52,R2.48,R3.53,P1.05. So they do not match (comma versus period). The SHARE-seq fragment file on GEO is also not sorted and not bgzip-compressed. I'd suggest replacing all the commas in the fragment file with periods, then sorting, compressing with bgzip, and indexing with tabix. You should then be able to use the file with Signac.
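The steps suggested above could be sketched roughly as follows. This is only an illustration, not a tested recipe for this dataset: the function name fix_barcodes and the file names in the comments are made up for the example, and it assumes the barcode sits in the fourth tab-separated column.

```shell
# Hypothetical helper: replace commas with periods in the barcode column
# (assumed to be column 4 of a tab-separated fragment file read from stdin).
fix_barcodes() {
  awk 'BEGIN {FS = OFS = "\t"} {gsub(",", ".", $4); print}'
}

# Small demonstration on a single made-up fragment line.
printf 'chr1\t100\t200\tR1.52,R2.48,R3.53,P1.05\n' | fix_barcodes

# On a real file the remaining steps would look something like this
# (placeholder file names; bgzip and tabix come with htslib):
#   gzip -dc fragments.bed.gz | fix_barcodes | sort -k1,1 -k2,2n | bgzip > fragments.fixed.bed.gz
#   tabix -p bed fragments.fixed.bed.gz
```

Restricting the substitution to the barcode column leaves any commas elsewhere in the line untouched.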

Thank you for the reply.

I have already done everything suggested above (replacing all the commas in the fragment file with periods, then sorting, compressing with bgzip, and indexing with tabix), but I am still getting the same error. I could email a Dropbox link to the dataset (in the format required for input to Signac), if that helps.

Greatly appreciate any help.

Sure, you can email a link to tstuart@nygenome.org and I will take a look

In this case the error message was misleading: the real issue was that a column is missing from the fragment file. I have added a check for the correct number of columns in CreateFragmentObject, which will now throw a more informative error message.

If you add the 5th column to the fragment file (just add 1 for all rows), then it should work as expected, e.g.:

gzip -d GSM4156597_skin.late.anagen.atac.fragments.sorted.bed.gz
awk 'BEGIN {FS=OFS="\t"} {print $0, 1}' GSM4156597_skin.late.anagen.atac.fragments.sorted.bed > frags.bed
bgzip -@ 10 frags.bed
tabix -p bed frags.bed.gz
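
To see whether a fragment file is missing a column before loading it, a quick shell check along these lines may help. It is a sketch only and is not the check that was added to CreateFragmentObject: the function name column_counts is hypothetical, and it assumes a tab-separated file in which any header lines begin with #.

```shell
# Hypothetical helper: print each distinct number of tab-separated columns
# seen on stdin, ignoring header lines that start with '#'.
column_counts() {
  awk -F'\t' '!/^#/ {n[NF]++} END {for (k in n) print k}' | sort -n
}

# Demonstration on two made-up fragment lines that lack the fifth (count) column.
printf 'chr1\t10\t20\tBC1\nchr1\t30\t40\tBC2\n' | column_counts

# On a real file (placeholder name), checking only the first lines is usually enough:
#   gzip -dc fragments.bed.gz | head -n 1000 | column_counts
```

A file with the expected five columns (chromosome, start, end, barcode, count) should print a single 5. Any other value, or more than one value, points to a malformed file.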

I ran into the same problem. How did you solve it? Please help!

fragments <- 'atac_v1_pbmc_10k_fragments.tsv.gz'

chrom_assay <- CreateChromatinAssay(
  counts = counts,
  sep = c(":", "-"),
  genome = 'hg19',
  fragments = fragments,
  min.cells = 10,
  min.features = 200
)

Computing hash
can't open fileError in CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments, :
Not all cells requested could be found in the fragment file.

@XinshuXie please open a new issue including the full code and output of sessionInfo()

Sorry to reopen this, but I get the same error, "Not all cells requested could be found in the fragment file", when I try to create a chromatin assay from cellranger-arc output:

all_cells_seurat[["ATAC"]] <- CreateChromatinAssay(
  counts = counts$Peaks,
  sep = c(":", "-"),
  fragments = fragpath,
  annotation = annotation
)

I tried to solve it by removing barcodes that are not in the fragments file:

fileName <- "./atac_fragments.tsv.gz"
tokeep <- scan(fileName, quote = "", what = list(NULL, NULL, NULL, name = character(), NULL), skip = 18)
print(tokeep$name)
print(length(colnames(all_cells_seurat)))
all_cells_seurat <- all_cells_seurat[, colnames(all_cells_seurat) %in% tokeep$name]

But it doesn't work. I thought the error message indicated that there are more cells in the RNA assay than in the fragments file. So the above should work, shouldn't it?
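
One way to test that assumption outside R is to list the requested barcodes that never occur in the fragment file. The sketch below is illustrative only: the function name missing_barcodes and the file names are hypothetical, and it assumes a non-empty barcode list with one barcode per line, barcodes in the fourth tab-separated column of the fragment file, and header lines starting with #.

```shell
# Hypothetical helper: print barcodes from the list file ($1) that do not
# appear in column 4 of the fragment lines read from stdin.
missing_barcodes() {
  awk -F'\t' '
    NR == FNR {want[$1] = 1; next}   # first input: requested barcodes
    !/^#/ {delete want[$4]}          # second input (stdin): fragment lines
    END {for (b in want) print b}
  ' "$1" - | sort
}

# Demonstration with made-up barcodes: GGGT-1 is requested but has no fragments.
tmp=$(mktemp)
printf 'AAAC-1\nGGGT-1\n' > "$tmp"
printf '# header line\nchr1\t10\t20\tAAAC-1\t1\n' | missing_barcodes "$tmp"
rm -f "$tmp"

# On real data (placeholder names):
#   gzip -dc atac_fragments.tsv.gz | missing_barcodes barcodes.txt
```

If this prints nothing, every requested barcode has at least one fragment, and the cause of the error is more likely to lie elsewhere, for example in the file format or the index.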