Erin-Rooney/XCT-freezethaw

Left join issue

Closed this issue · 12 comments

@kaizadp

I'm doing something weird. I think maybe I'm forgetting something? I'm just trying to combine the before and after data subsets so I can subtract the before frequencies from the after frequencies, but I can't get them to combine. I keep getting an error:

Error: vector size cannot be NA/NaN
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
In character("sample") : NAs introduced by coercion

diff = before %>%
left_join(dplyr::select(after, after_freq, sample, by = character("sample"), "breadth_um"))

Two main issues:

  1. the closing parenthesis of select() is in the wrong place, so the by = argument ends up inside select() instead of left_join()
  2. where does the "breadth_um" fit in this?

I suggest trying:
left_join(dplyr::select(after, after_freq, sample), by = "sample")

I find this form easier for keeping track of the parentheses:
left_join(after %>% dplyr::select(after_freq, sample), by = "sample")

  1. if you're trying to join by breadth_um, (a) it needs to go into the select() parentheses, and (b) by needs a c(), e.g. by = c("sample", "breadth_um")
  2. by = character("sample") is what triggers the error. character() creates an empty character vector of a given length, so "sample" gets coerced to NA (hence the warning). Use the plain string instead: by = "sample".

If this still doesn't work, I'll do a PR and run the code myself.
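
For reference, here is a minimal base R sketch of why character("sample") fails. It only illustrates how character() treats its argument. The variable names are made up for this example, and the exact error text may vary with R version and locale.

```r
# character(n) builds a vector of n empty strings; it does not quote a name.
empty_two <- character(2)   # c("", "")

# A non-numeric "length" cannot be coerced to a number, so the call errors.
# suppressWarnings() hides the accompanying coercion warning.
result <- suppressWarnings(
  tryCatch(character("sample"), error = function(e) conditionMessage(e))
)

# What by = expects is a plain string, or c() of strings for several columns.
by_one <- "sample"
by_two <- c("sample", "breadth_um")
```

Here result holds the error message rather than a usable vector, which is why passing the column name as a plain string is the safer form.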

That worked!

I'm trying to join by sample but I want it to line up by breadth_um as well. Will it do that automatically?

  • you'll need to include breadth_um in the select() call too
  • if you then set by = "sample" only, it will not join by breadth_um; you'll get two columns, breadth_um.x and breadth_um.y
  • to join on both, include breadth_um in the by = c() argument
  • or you could drop the by = piece entirely and let it join automatically. It will look for all columns that are common to both data frames and join on all of them, in this case sample and breadth_um.
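
The three options in this list can be compared side by side. This is only a sketch on made-up toy data (the real before and after tables are not shown in this thread), assuming dplyr is installed; after_sub, by_sample, by_both and natural are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data standing in for the real before/after tables.
before <- tibble(sample = c("A", "A", "B"),
                 breadth_um = c(10, 20, 10),
                 before_freq = c(0.2, 0.8, 1.0))
after <- tibble(sample = c("A", "A", "B"),
                breadth_um = c(10, 20, 10),
                after_freq = c(0.5, 0.5, 1.0))

after_sub <- after %>% dplyr::select(after_freq, sample, breadth_um)

# Join on sample only: breadth_um appears twice (.x/.y) and every row of a
# sample is paired with every row of the same sample in after_sub.
# Newer dplyr versions warn about this many-to-many match, hence suppressWarnings().
by_sample <- suppressWarnings(before %>% left_join(after_sub, by = "sample"))

# Join on both columns: one breadth_um column, rows matched one to one.
by_both <- before %>% left_join(after_sub, by = c("sample", "breadth_um"))

# No by =: dplyr joins on all common columns and prints which ones it used.
natural <- before %>% left_join(after_sub)
```

Comparing names(by_sample) with names(by_both) shows the extra breadth_um.x and breadth_um.y columns that appear only in the sample-only join.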

So when I got rid of the by = piece, it automatically selected sample in the output.

Ex:

+ left_join(after %>% dplyr::select(after_freq, sample)) %>%
+ dplyr::mutate(diff_freq = (after_freq - before_freq))
Joining, by = "sample"

So I tried the by = c() argument and I got the following error:

+ left_join(after %>% dplyr::select(after_freq, sample), by = c("sample", "breadth_um")) %>%
+ dplyr::mutate(diff_freq = (after_freq - before_freq))
Error: Join columns must be present in data.
x Problem with breadth_um.

Weirdly, if I left_join by "sample" then it does not give me the two breadth_um columns, even though I added breadth_um to the select command. So I'm not sure what's happening with that column...

diff = before %>%
left_join(after %>% dplyr::select(after_freq, sample, breadth_um)) %>%
dplyr::mutate(diff_freq = (after_freq - before_freq))

diff2 = before %>%
left_join(after %>% dplyr::select(after_freq, sample), by = c("sample", "breadth_um")) %>%
dplyr::mutate(diff_freq = (after_freq - before_freq))

Can you run head(before) and head(after) and attach screenshots of the output?

Hmm. Try running line 252 again. Your script has breadth_um in the select(), but the code echoed with the error message is missing breadth_um.
Is it possible you ran an old piece of code?
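
One way to catch this kind of mismatch before joining is to compare the intended join columns with the columns that survive select(). This is a sketch on made-up toy data, assuming dplyr is installed; join_cols, missing_cols and join_result are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data standing in for the real before/after tables.
before <- tibble(sample = c("A", "B"), breadth_um = c(10, 10),
                 before_freq = c(0.2, 1.0))
after <- tibble(sample = c("A", "B"), breadth_um = c(10, 10),
                after_freq = c(0.5, 1.0))

# breadth_um is dropped here, as in the call that produced the error.
after_sub <- after %>% dplyr::select(after_freq, sample)

# Which of the intended join columns are absent from the right-hand table?
join_cols <- c("sample", "breadth_um")
missing_cols <- setdiff(join_cols, names(after_sub))

# Joining on a column that is not in both tables raises an error.
join_result <- tryCatch(
  before %>% left_join(after_sub, by = join_cols),
  error = function(e) e
)
```

A non-empty missing_cols means the select() and the by = argument disagree, which is the situation the error message describes.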

Okay, weird. I just ran 252 again and this was the output:

+ left_join(after %>% dplyr::select(after_freq, sample, breadth_um)) %>%
+ dplyr::mutate(diff_freq = (after_freq - before_freq))
Joining, by = c("sample", "breadth_um")

So...fixed?

Looks like it ran fine.
If the console shows the blue > prompt with no error message above it, the code ran and R is ready for the next piece of code.
Also check the output file to see if they merged correctly.
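
A couple of quick checks on the merged table could look like the sketch below. It uses made-up toy data and assumes dplyr is installed and that each sample/breadth_um pair is unique in after; same_rows and unmatched are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data; the last row of before has no partner in after.
before <- tibble(sample = c("A", "A", "B", "B"),
                 breadth_um = c(10, 20, 10, 20),
                 before_freq = c(0.2, 0.8, 0.6, 0.4))
after <- tibble(sample = c("A", "A", "B"),
                breadth_um = c(10, 20, 10),
                after_freq = c(0.5, 0.5, 1.0))

diff <- before %>%
  left_join(after %>% dplyr::select(after_freq, sample, breadth_um),
            by = c("sample", "breadth_um")) %>%
  dplyr::mutate(diff_freq = after_freq - before_freq)

# With unique keys in after, a left join keeps one row per row of before.
same_rows <- nrow(diff) == nrow(before)

# Rows of before without a match get NA in after_freq (and in diff_freq).
unmatched <- diff %>% dplyr::filter(is.na(after_freq))
```

If same_rows is FALSE, rows were duplicated by the join, and any rows in unmatched point to size classes present before but absent after.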

The output file still doesn't have two breadth_um columns. Probably because it automatically joined by breadth_um as well as sample.

image

It gave me a much more normal graph that fits way better with my understanding of my data, though.

Before fixed:

image

After fixed:
image

Oh yeah, the second figure looks much better (and more correct).
And yes, you should have only the one "breadth_um" column, because you used that to join.