immunomind/immunarch

Problem with importing columns that are mostly NAs

christianwoe opened this issue · 3 comments

Hi all,

๐Ÿ› Bug

I was trying to import some data from MiXCR 4.3.1 tsv files and noticed warning messages for some of the samples.
On closer inspection, it seems that in rare cases a column is assigned type logical even though some clones have character content in it. Those values are then replaced by 'NA', so the information is discarded.
It looks to me like the readr function inside repLoad is guessing the wrong column type, probably because it only checks a subset of rows.

It would be helpful to be able to modify the parameter provided to the readr function, either 'col_types' or 'guess_max'. Or is there already another solution?
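For anyone who wants to check whether type guessing is the cause, here is a minimal sketch that reads a file directly with readr, outside of repLoad. The toy file, its two columns, and the gene value are made up for illustration. Whether the default guess ends up as logical depends on which rows readr samples, so the sketch prints the guessed type rather than assuming it.

```r
library(readr)

# Hypothetical toy file: 'allDHitsWithScore' is empty for every clone
# except the last one. The gene value below is invented for illustration.
path <- tempfile(fileext = ".tsv")
n_empty <- 2000
writeLines(
  c(
    "cloneId\tallDHitsWithScore",
    paste0(seq_len(n_empty) - 1, "\t"),
    paste0(n_empty, "\tTRDD1*00(45)")
  ),
  path
)

# Default type guessing: the sparse column may be guessed as logical,
# in which case the single character value fails to parse and becomes NA.
guessed <- suppressWarnings(read_tsv(path, show_col_types = FALSE))
print(class(guessed$allDHitsWithScore))
print(problems(guessed))

# Explicit column type: the value is kept regardless of what would be guessed.
explicit <- read_tsv(
  path,
  col_types = cols(allDHitsWithScore = col_character())
)
print(tail(explicit$allDHitsWithScore, 1))
```

Raising `guess_max` in the `read_tsv()` call is the other readr-level option; setting `col_types` explicitly is more predictable when the sparse columns are known in advance.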

To Reproduce

Steps to reproduce the behavior:

  1. repLoad(pathname)

This is the warning message.

Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

Expected behavior

Columns with at least one non-NA value should not be assigned type logical.

Many thanks and kind regards,
Christian

Hi @christianwoe

Thank you for opening the issue. Could you share an example of such data please? What columns are usually the problematic ones?

I'm open to scheduling a short call to discuss this issue over Zoom if this accelerates things.

Hi,
here is an example based on test data where I think the 'allDHitsWithScore' column is causing a warning, because only one of the clonotypes has a value assigned there.

Best wishes,
Christian

Multi_TRA_FS115_2_S150.clones_TRAD.tsv.zip

Hey everyone, I'm facing a similar issue here. Was this fixed in the latest update? Cheers, Nicole