How to join (merge) data frames (inner, outer, left, right)?
duttashi opened this issue · 1 comments
Given two data frames:
df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))
df1
# CustomerId Product
# 1 Toaster
# 2 Toaster
# 3 Toaster
# 4 Radio
# 5 Radio
# 6 Radio
df2
# CustomerId State
# 2 Alabama
# 4 Alabama
# 6 Ohio
How can I do database style, i.e., sql style, joins? That is, how do I get:
-
An
inner join
of df1 and df2:
Return only the rows in which the left table have matching keys in the right table. -
An
outer join
of df1 and df2:
Returns all rows from both tables, join records from the left which have matching keys in the right table. -
A
left outer join
(or simplyleft join
) of df1 and df2
Return all rows from the left table, and any rows with matching keys from the right table. -
A
right outer join
of df1 and df2
Return all rows from the right table, and any rows with matching keys from the left table.
Extra credit:
How can I do a SQL style select statement?
Reference
This question was originally asked on stack overflow. I'm reproducing it here for my future reference
Using the dplyr
package, this can be done as follows;
library(dplyr)
> inner_join(df1, df2)
Joining, by = "CustomerId"
CustomerId Product State
1 2 Toaster Alabama
2 4 Radio Alabama
3 6 Radio Ohio
> #left outer
> left_join(df1, df2)
Joining, by = "CustomerId"
CustomerId Product State
1 1 Toaster <NA>
2 2 Toaster Alabama
3 3 Toaster <NA>
4 4 Radio Alabama
5 5 Radio <NA>
6 6 Radio Ohio
> #right outer
> right_join(df1, df2)
Joining, by = "CustomerId"
CustomerId Product State
1 2 Toaster Alabama
2 4 Radio Alabama
3 6 Radio Ohio
> #alternate right outer
> left_join(df2, df1)
Joining, by = "CustomerId"
CustomerId State Product
1 2 Alabama Toaster
2 4 Alabama Radio
3 6 Ohio Radio
> #full join
> full_join(df1, df2)
Joining, by = "CustomerId"
CustomerId Product State
1 1 Toaster <NA>
2 2 Toaster Alabama
3 3 Toaster <NA>
4 4 Radio Alabama
5 5 Radio <NA>
6 6 Radio Ohio