earowang/hts

Extension of CreateNodes() to work with levels specified by separators

panoptikum opened this issue · 6 comments

Hello,

Sorry for my novice behaviour regarding pull requests and so on.

I'm opening an issue for this as I've announced:

It would be great if nodes could be specified by a separator such as an underscore.

This way the function could handle node names of different lengths.
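As an illustration of the idea, base R's strsplit() can break each series name into its level components. This is only a minimal sketch, assuming every name contains the same number of components; split_levels is a hypothetical helper and not part of hts.

```r
# Hypothetical helper (not an hts function): split bottom-level series
# names into their level components using a separator.
split_levels <- function(bnames, sep = "_") {
  # fixed = TRUE treats `sep` as a literal string, not a regular expression
  strsplit(bnames, split = sep, fixed = TRUE)
}

bnames <- c("AA_100_A", "A_10_C", "B30_B_3")
parts <- split_levels(bnames)

# Each name yields one component per level, whatever the component length
lengths(parts)
```

With this representation the length of each component no longer matters, only its position between separators.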

My current solution for hts.R can be found in my fork repo. This time I only use functions from base R:

panoptikum@1d1ee22

I've tested with the examples:

abc <- ts(5 + matrix(sort(rnorm(1000)), ncol = 10, nrow = 100))
colnames(abc) <- c("AA_100_A_172", "AA_100_A_172", "A_10_C_A", "A_2_B_21", "A_2_B_DA","B30_A_1_H", "B30_B_3_Z", "B30_B_1_%", "B_40_A_2", "B_40_A_3")
y <- hts(abc, characters = c(1, 2, 1), sep="_")

and

abc <- ts(5 + matrix(sort(rnorm(1000)), ncol = 10, nrow = 100))
colnames(abc) <- c("AA_100_A", "AA_10_B1Z2", "A_10_C", "A_2_AB", "A_2_B","B30_A_1", "B30_B_3", "B30_CA_1", "B_40_A", "B_40_B")
y <- hts(abc, characters = c(1, 2, 1), sep="_")

These gave me the following for y$nodes:

$`Level 1`
[1] 4

$`Level 2`
 AA   A B30   B 
  2   3   3   2 

$`Level 3`
AA100   A10    A2  B30A  B30B   B40 
    2     1     2     1     2     2 

$`Level 4`
AA100A   A10C    A2B  B30A1  B30B3  B30B1   B40A 
     2      1      2      1      1      1      2 

and, for the second example:

$`Level 1`
[1] 4

$`Level 2`
 AA   A B30   B 
  2   3   3   2 

$`Level 3`
AA100  AA10   A10    A2  B30A  B30B B30CA   B40 
    1     1     1     2     1     1     1     2 

Best

@panoptikum Hello, I used your function CreateNodes() to create nodes, but ended up in debug mode. Then I tried your function hts() and also ended up in debug mode. Could you give a detailed example? Thanks.

@jnuvenus Thanks for trying out my function. I forgot to remove a browser() within the function which was there for debug purposes. It should work with the above example now, but I can highlight the crucial code changes as well.
You have to source the whole hts.R file to ensure functionality or load my fork of HTS.

Let me know whether it works or not.

@panoptikum In the end, I used Hive SQL to calculate the count for every level of the nodes, and then used R's arrange() function to sort them.

@jnuvenus Well, I'm sure others would like to stay within R and this package, but I'm pleased to hear that you've found a working solution for you.

@panoptikum Hello, I used CreateNodes() from your new hts() function:
cols <- c('yCN01_y755Y_y755AC','yCN01_y755Y_y755AG','yCN01_y023Y_y023Y00001','yCN02_y010A_y010AAC')
gtnode <- CreateNodes(bnames=cols, characters = c(1, 2, 1), sep="_")$nodes
Then I got the following result:
[[1]]
[1] 2

[[2]]
yCN01 yCN02
    3     1

[[3]]
yCN01y755Y yCN01y023Y yCN02y010A
         2          1          1
This result has one error: the count for yCN01 should be 2, since it has only two distinct children (y755Y and y023Y).
So I fixed the bug in the cnt counting part of your CreateNodes():
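cnt <- sapply(x_1, function(z) {
  # Build "parent_child" keys: levels 1..(x-1) concatenated, then "_", then level x
  vec1 <- sapply(bnames_split, function(i) {
    paste(c(paste(i[1:(x - 1)], collapse = ""), i[x]), collapse = "_")
  })
  # Count each parent/child pair only once
  vec1 <- unique(vec1)
  vec <- sapply(vec1, function(j) strsplit(j, "_")[[1]][1] == z)
  sum(vec, na.rm = TRUE)
})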
Then I got the following result:
[[1]]
[1] 2

[[2]]
yCN01 yCN02
    2     1

[[3]]
yCN01y755Y yCN01y023Y yCN02y010A
         2          1          1

Maybe you can try this fix on other examples.
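For anyone who wants to check the counting logic outside the package, here is a stand-alone sketch of the same idea: deduplicate the parent/child pairs at each level, then count children per parent. It is not the code from the fork; count_children is a hypothetical helper, and it assumes every name has the same number of separator-delimited components.

```r
# Hypothetical stand-alone helper; count_children is not an hts function.
count_children <- function(bnames, sep = "_") {
  parts <- strsplit(bnames, split = sep, fixed = TRUE)
  depth <- length(parts[[1]])
  stopifnot(all(lengths(parts) == depth))

  nodes <- vector("list", depth)
  # Level 1: number of distinct top-level groups under the total
  nodes[[1]] <- length(unique(vapply(parts, `[`, "", 1)))

  for (x in seq_len(depth)[-1]) {
    # Path down to the parent level and down to the child level
    parent <- vapply(parts, function(p) paste(p[seq_len(x - 1)], collapse = sep), "")
    child  <- vapply(parts, function(p) paste(p[seq_len(x)], collapse = sep), "")
    # Keep each parent/child pair once, so repeated children are not double counted
    keep <- !duplicated(child)
    # Fix the level order to first appearance so the counts follow the input order
    tab <- table(factor(parent[keep], levels = unique(parent)))
    nodes[[x]] <- setNames(as.integer(tab), names(tab))
  }
  nodes
}

cols <- c("yCN01_y755Y_y755AC", "yCN01_y755Y_y755AG",
          "yCN01_y023Y_y023Y00001", "yCN02_y010A_y010AAC")
count_children(cols)
```

On the four names above this gives 2 at the top, then 2 and 1 for yCN01 and yCN02, then 2, 1 and 1 at the last level. Unlike the output shown earlier, the node labels here keep the separator, which avoids ambiguity when different components concatenate to the same string.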

Is there any news in terms of this functionality?
I agree, it's so painful to prepare groups. I had to write some functions that add a new column to a data frame, holding a fixed-length key in which each group occupies a fixed number of characters.

If I have 3 groups, it finds the maximum string length in each group column and pads every value at the end with extra symbols so that it is as long as the longest value in that column. Then it concatenates the 3 columns, pivots by the concatenated key into columns, and passes the result to hts together with the length of each group (the maximum character length).
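A rough sketch of that padding workaround, in base R only. pad_key and the example column names are hypothetical, the pad symbol is assumed not to occur in the data, and the pivot step is left out.

```r
# Hypothetical helper; pad_key is not part of hts.
pad_key <- function(df, group_cols, pad = ".") {
  # Longest value (in characters) in each group column
  widths <- vapply(df[group_cols],
                   function(col) max(nchar(as.character(col))),
                   integer(1))
  # Right-pad every value to the width of its column
  padded <- Map(function(col, w) {
    col <- as.character(col)
    paste0(col, strrep(pad, w - nchar(col)))
  }, df[group_cols], widths)
  # Concatenate the padded columns into one fixed-length key
  list(key = do.call(paste0, unname(padded)),
       characters = unname(widths))
}

df <- data.frame(region = c("AA", "A", "B30"),
                 store  = c("100", "10", "B"),
                 item   = c("A", "C", "3"),
                 stringsAsFactors = FALSE)
res <- pad_key(df, c("region", "store", "item"))
res$key         # "AA.100A" "A..10.C" "B30B..3"
res$characters  # 3 3 1
```

The keys would become the column names of the pivoted series matrix, and the widths would go into the characters argument of hts(). The pad symbol must not appear in the data: padding "10" with "0", for example, would collide with an existing "100".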