Correctly converting a data frame into transactions for arules in R
jasperDD opened this issue · 2 comments
I need to mine association rules in R, and I found this example:
http://www.salemmarafi.com/code/market-basket-analysis-with-r/
The example works with data(Groceries), but it also provides the original dataset, Groceries.csv:
structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L,
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns",
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles",
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles",
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda",
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles",
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices",
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice",
"meat,citrus fruit,berries,root vegetables,whole milk,soda",
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages",
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum",
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins",
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles",
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA,
-19L))
I load this data with:
g=read.csv("g.csv",sep=";")
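Assuming the file separates items only with commas (as the sample above suggests), `sep = ";"` never matches. Each whole line is then read into a single column as one long string. The base R sketch below shows this on a few made-up lines in the same style (one basket per line, comma-separated items). It also passes `header = FALSE`, because with the `read.csv` default `header = TRUE` the first basket would be consumed as the column name.

```r
# Made-up sample lines in the Groceries.csv style:
# one transaction per line, items separated by commas.
basket_lines <- c("citrus fruit,semi-finished bread,margarine,ready soups",
                  "tropical fruit,yogurt,coffee",
                  "whole milk")

# sep = ";" never matches, so each line is read as ONE field.
# header = FALSE keeps the first basket from becoming a column name.
g <- read.csv(text = basket_lines, sep = ";", header = FALSE,
              stringsAsFactors = FALSE)

print(dim(g))    # 3 rows, 1 column
print(g$V1[1])   # the whole first basket as a single string
```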
Then I convert it to transactions, as arules requires:
library(arules)  # provides the transactions class and the coercion used below
trans = as(g, "transactions")
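Coercing a data frame with `as(g, "transactions")` creates one item per column/level pair, labeled `variable=level`. The `str(trans)` output in this issue shows exactly such labels (`tr=abrasive cleaner,napkins`), so every distinct basket string became a single item. Here is a rough base R imitation of that labeling (not arules' own code), using a hypothetical column `tr` and made-up rows:

```r
# Hypothetical single-column data frame, like the one read.csv produced;
# the column name "tr" matches the itemInfo output in the question.
g <- data.frame(tr = c("abrasive cleaner",
                       "abrasive cleaner,napkins",
                       "artif. sweetener,coffee",
                       "abrasive cleaner"),
                stringsAsFactors = TRUE)

# Imitate the "variable=level" labels: one item per factor level,
# i.e. one item per DISTINCT basket string.
item_labels <- paste0("tr=", levels(g$tr))
print(item_labels)
print(length(item_labels))
```

Note that `tr=abrasive cleaner,napkins` is an item of its own, unrelated to `tr=abrasive cleaner`, so rules can never link the individual products.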
Let's examine data(Groceries):
> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
>
and here is my data converted from the original CSV:
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 3 variables:
.. ..$ labels : chr [1:7011] "tr=abrasive cleaner" "tr=abrasive cleaner,napkins" "tr=artif. sweetener" "tr=artif. sweetener,coffee" ...
.. ..$ variables: Factor w/ 1 level "tr": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ levels : Factor w/ 7011 levels "abrasive cleaner",..: 1 2 3 4 5 6 7 8 9 10 ...
..@ itemsetInfo:'data.frame': 9835 obs. of 1 variable:
.. ..$ transactionID: chr [1:9835] "1" "2" "3" "4" ...
>
In data(Groceries), the transactions are stored in sparse format with
9835 transactions (rows) and
169 items (columns).
My trans data has
9835 transactions (rows) but
7011 items (columns).
That is, I get 7011 item columns from Groceries.csv, whereas the built-in dataset has only 169.
Why does this happen, and how do I convert this file correctly?
I need to understand this, because otherwise I can't work with my own file.
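The counts differ because the two objects use different notions of an item. Treating each whole line as an item counts distinct baskets, which grows with the data (7011 distinct lines out of 9835 here). Splitting each line on commas first counts distinct products, which is bounded by the product catalog (169 in Groceries). A minimal base R sketch on made-up baskets:

```r
# Made-up baskets; the last line holds the same products as the third,
# just in a different order.
basket_lines <- c("citrus fruit,yogurt",
                  "whole milk,yogurt",
                  "citrus fruit,whole milk",
                  "yogurt",
                  "whole milk,citrus fruit")

# What read.csv + as(g, "transactions") effectively counted:
# every distinct line is its own item.
n_whole_line_items <- length(unique(basket_lines))

# What a basket reader should count: split each line on "," first,
# then count distinct products.
products <- unique(unlist(strsplit(basket_lines, ",", fixed = TRUE)))
n_products <- length(products)

cat("items if whole lines are items:", n_whole_line_items, "\n")
cat("distinct products after splitting:", n_products, "\n")
```

Even the same two products in a different order end up as two unrelated items under the whole-line reading.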
The data you show does not look like a valid CSV file. Have a look at ?read.transactions
to learn how to read in files.
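With `format = "basket"` and `sep = ","`, `read.transactions` treats each line as one transaction and each comma-separated field as an item. The base R sketch below imitates that parsing on made-up lines and builds a dense logical incidence matrix (transactions x items). arules itself stores the result sparsely, with items as rows (`Dim: 169 9835` in the `str(Groceries)` output). This is an illustration only: it ignores quoting and whitespace handling, and it silently drops duplicate items within a line (arules exposes an `rm.duplicates` argument for that case).

```r
basket_lines <- c("citrus fruit,semi-finished bread,margarine",
                  "tropical fruit,yogurt,coffee",
                  "whole milk,yogurt")

# 1. Split each line into its items, dropping repeats within a line.
baskets <- lapply(strsplit(basket_lines, ",", fixed = TRUE), unique)

# 2. The item universe is the set of distinct products.
items <- sort(unique(unlist(baskets)))

# 3. Incidence matrix: one row per transaction, one column per item,
#    TRUE if the item occurs in that basket.
incidence <- t(vapply(baskets, function(b) items %in% b,
                      logical(length(items))))
colnames(incidence) <- items

print(dim(incidence))
print(colSums(incidence))  # how many transactions contain each item
```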
Hi mhahsler, I found the solution:
trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ',')
Thank you!