Left join issue
I'm doing something weird. I think maybe I'm forgetting something? I'm just trying to combine the before and after data subsets so I can subtract the before frequencies from the after frequencies but I'm having trouble getting them to combine. I keep getting an error:
Error: vector size cannot be NA/NaN
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In character("sample") : NAs introduced by coercion
XCT-freezethaw/code/ER-23-porethroatdist.R
Lines 252 to 253 in 76886cf
Two main issues:
- the select() does not have a closing parenthesis. Where does the "breadth_um" fit in this?
I suggest trying:
left_join(dplyr::select(after, after_freq, sample), by = character("sample"))
I have found this approach easier to manage the parentheses:
left_join(after %>% dplyr::select(after_freq, sample), by = character("sample"))
- if you're trying to join by breadth_um, (a) that needs to go into the select() parentheses, and (b) by needs a c()
- I have never tried by = character(sample) like you've done. Could that be causing problems? If the above edits still give you an error, try without the character().
If this still doesn't work, I'll do a PR and run the code myself.
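For anyone landing here with the same error, a minimal base-R sketch of what is most likely going on: character() is the constructor for an empty character vector of a given length, so character("sample") tries to use the string "sample" as a length. The variable names by_one and by_two are only for illustration.

```r
# character() takes a length, not a column name
character(2)   # a character vector holding two empty strings

# Passing a column name makes R coerce "sample" to a number (NA),
# which fails before left_join() ever gets to run
res <- tryCatch(
  suppressWarnings(character("sample")),
  error = function(e) e
)
inherits(res, "error")   # TRUE

# `by` only needs a plain character vector of column names
by_one <- "sample"
by_two <- c("sample", "breadth_um")
is.character(by_two)     # TRUE
```

So by = "sample" (or by = c("sample", "breadth_um") for two keys) is the form to use; no character() wrapper is needed.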
That worked!
I'm trying to join by sample but I want it to line up by breadth_um as well. Will it do that automatically?
- you'll need to include breadth_um in the select() argument too, then
- if you set by = "sample" only, it will not join by breadth_um. You'll get two columns, breadth_um.x and breadth_um.y. You'll need to include breadth_um in the by = c() argument
- or, you could drop the entire by = piece and force it to join automatically. It will look for all columns that are common to both data files and join using them all: in this case, sample and breadth_um.
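The three options above can be compared side by side. This is only a sketch: the column names come from this thread, but the values in before and after are made up, and after_sub is a hypothetical helper name.

```r
library(dplyr)

# Made-up stand-ins for the real before/after subsets
before <- tibble(
  sample      = c("A", "A", "B", "B"),
  breadth_um  = c(10, 20, 10, 20),
  before_freq = c(0.4, 0.6, 0.3, 0.7)
)
after <- tibble(
  sample     = c("A", "A", "B", "B"),
  breadth_um = c(10, 20, 10, 20),
  after_freq = c(0.5, 0.5, 0.2, 0.8)
)

after_sub <- after %>% dplyr::select(after_freq, sample, breadth_um)

# 1. by = "sample" only: breadth_um is not a key, so both copies are kept
#    (suffixed .x and .y) and every A row pairs with every A row.
#    Recent dplyr versions may warn about the many-to-many match.
by_sample <- suppressWarnings(left_join(before, after_sub, by = "sample"))
names(by_sample)

# 2. Both keys named explicitly: rows line up on sample AND breadth_um
by_both <- left_join(before, after_sub, by = c("sample", "breadth_um"))

# 3. No `by`: dplyr joins on every column name the two tables share
#    and prints a "Joining ..." message saying which ones it used
by_auto <- left_join(before, after_sub)

by_both %>% dplyr::mutate(diff_freq = after_freq - before_freq)
```

With these toy tables, options 2 and 3 give the same four-row result, while option 1 returns more rows than before has, which is usually the first sign that a key is missing.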
So when I got rid of the by = piece, it automatically selected sample in the output.
Ex:
+ left_join(after %>% dplyr::select(after_freq, sample)) %>%
+ dplyr::mutate(diff_freq = (after_freq - before_freq))
Joining, by = "sample"
So I tried the by = c() argument and I got the following error:
+ left_join(after %>% dplyr::select(after_freq, sample), by = c("sample", "breadth_um")) %>%
+ dplyr::mutate(diff_freq = (after_freq - before_freq))
Error: Join columns must be present in data.
x Problem with `breadth_um`.
Weirdly, if I left_join by "sample" then it does not give me the two breadth_um columns, even though I added breadth_um to the select command. So I'm not sure what's happening with that column...
XCT-freezethaw/code/ER-23-porethroatdist.R
Lines 252 to 254 in ae36699
XCT-freezethaw/code/ER-23-porethroatdist.R
Lines 256 to 258 in ae36699
Can you run head(before) and head(after) and attach screenshots of the output?
Hmm. Try running line 252 again. Your script has breadth_um in the select(), but if you check the error message, it’s missing breadth_um.
is it possible you ran an old piece of code?
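A small sketch of how that would produce both symptoms, assuming the select() that actually ran did not include breadth_um. The data values are made up and after_sub is a hypothetical name.

```r
library(dplyr)

# Made-up stand-ins for the real before/after subsets
before <- tibble(
  sample      = c("A", "A", "B", "B"),
  breadth_um  = c(10, 20, 10, 20),
  before_freq = c(0.4, 0.6, 0.3, 0.7)
)
after <- tibble(
  sample     = c("A", "A", "B", "B"),
  breadth_um = c(10, 20, 10, 20),
  after_freq = c(0.5, 0.5, 0.2, 0.8)
)

# breadth_um is dropped from the right-hand table here
after_sub <- after %>% dplyr::select(after_freq, sample)

# Asking to join on a column that one side no longer has is an error
res <- tryCatch(
  left_join(before, after_sub, by = c("sample", "breadth_um")),
  error = function(e) e
)
inherits(res, "error")   # TRUE

# Joining the same subset by "sample" alone: only `before` still carries
# breadth_um, so there is nothing to clash with and no .x/.y suffixes appear
joined <- suppressWarnings(left_join(before, after_sub, by = "sample"))
names(joined)
```

If names(joined) shows a single breadth_um after a join on "sample" only, that is a hint the column never made it through select() on the after side.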
Okay, weird. I just ran 252 again and this was the output:
+ left_join(after %>% dplyr::select(after_freq, sample, breadth_um)) %>%
+ dplyr::mutate(diff_freq = (after_freq - before_freq))
Joining, by = c("sample", "breadth_um")
So...fixed?
Looks like it ran fine.
If the console has the blue > symbol, it means the code worked and it’s ready for the next piece of code.
also check the output file to see if they merged correctly.
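A few quick checks that could be run on the merged data, sketched on made-up values. Here the toy after table is deliberately missing one sample/breadth_um combination so the checks have something to find; joined and unmatched are hypothetical names.

```r
library(dplyr)

# Made-up stand-ins; `after` has no row for sample B at breadth_um 20
before <- tibble(
  sample      = c("A", "A", "B", "B"),
  breadth_um  = c(10, 20, 10, 20),
  before_freq = c(0.4, 0.6, 0.3, 0.7)
)
after <- tibble(
  sample     = c("A", "A", "B"),
  breadth_um = c(10, 20, 10),
  after_freq = c(0.5, 0.5, 0.2)
)

joined <- before %>%
  left_join(after %>% dplyr::select(after_freq, sample, breadth_um),
            by = c("sample", "breadth_um")) %>%
  dplyr::mutate(diff_freq = after_freq - before_freq)

# A left join keeps every row of `before`; if the keys are unique in `after`,
# the row count should not change
nrow(joined) == nrow(before)

# Rows of `before` with no partner in `after` show up as NA in after_freq
sum(is.na(joined$after_freq))

# anti_join() lists those unmatched rows directly
unmatched <- anti_join(before, after, by = c("sample", "breadth_um"))
unmatched
```

A growing row count points to duplicated keys, and unexpected NA values point to keys that do not match exactly (for example, breadth_um values that differ by rounding).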
Oh yeah, the second figure looks much better (and more correct).
And yes, you should have only the one "breadth_um" column, because you used that to join.