jdurbin/wekaMine

FoldSets potentially match partial attribute names


When one attribute name is a substring of another, FoldSets can return the wrong fold set. For example, TP53 is a substring of TP53A, so if both are attributes, a search for the first may also match the second. The match needs to be more precise than "contains". The regex matching below may do the trick; this needs to be verified before making the change.

    FoldSets getFoldSetsForAttribute(Attribute a){
        err.println "allfoldsets: "+this.map

        // Get foldsets matching fold\d+
        def foldKeys = map.keySet().grep(~/fold\d+/)
        def foldMap = map.subMap(foldKeys)

        // Get foldsets matching attribute name
        def attributeName = a.name()

        // Previous match, superseded by the pattern below (kept for reference;
        // leaving it active would declare attributeKeys twice):
        // def attributeKeys = map.keySet().grep(attributeName)

        // Attribute folds can be named after the attribute plus _Rep01, _Rep02, etc.
        // The pattern keeps us from matching attributes whose names are
        // substrings of others, e.g. TP53Syn and TP53SynPlus: the second
        // should not match when the attribute is the first.
        def attributePattern = ~/($attributeName)(_Rep\d+)*/
        def allkeys = map.keySet()
        def attributeKeys = allkeys.grep(attributePattern)

        err.println "DEBUG allkeys = "+allkeys
        err.println "DEBUG attributeName: $attributeName"
        err.println "DEBUG grep out $attributeKeys"

        def attributeMap = map.subMap(attributeKeys)

        // Merge the two maps...
        def newMap = attributeMap + foldMap
        def newData = newMap.values() as ArrayList
        def newFoldSet = new FoldSets(newMap,newData)

        err.println "newfoldset: "+newFoldSet?.map

        return(newFoldSet)
    }
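
One way to do the verification mentioned above is a small standalone script that compares substring matching against the pattern-based `grep`. This is only a sketch: the key names are made up for illustration and do not come from real wekaMine data. It also wraps the attribute name in `Pattern.quote`, which is an addition not present in the snippet above, on the assumption that attribute names might contain regex metacharacters such as `.` or `+`.

```groovy
import java.util.regex.Pattern

// Hypothetical fold set keys, for illustration only.
def keys = ['fold1', 'fold2',
            'TP53Syn', 'TP53Syn_Rep01', 'TP53Syn_Rep02',
            'TP53SynPlus', 'TP53SynPlus_Rep01']

def attributeName = 'TP53Syn'

// Substring ("contains") matching also picks up the TP53SynPlus keys.
def containsKeys = keys.findAll { it.contains(attributeName) }
assert containsKeys.size() == 5

// grep with a Pattern keeps only the elements the pattern matches in full,
// so a trailing "Plus" prevents a match.
// Pattern.quote is an assumption added here: it makes the attribute name
// match literally even if it contains regex metacharacters.
def attributePattern = ~/(${Pattern.quote(attributeName)})(_Rep\d+)*/
def attributeKeys = keys.grep(attributePattern)

assert attributeKeys == ['TP53Syn', 'TP53Syn_Rep01', 'TP53Syn_Rep02']
```

If the assertions hold against real key sets as well, the pattern-based match would be enough to separate TP53Syn from TP53SynPlus while still collecting the `_Rep` variants.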