hgascon/pulsar

Pulsar fails on certain PCAP files

mlucas300 opened this issue · 2 comments

Hi,

I have tested Pulsar on two PCAP files: one of 3.9 GB (https://download.netresec.com/pcap/maccdc-2011/maccdc2011_00010_20110312194033.pcap.gz) and one of 1.4 GB (not publicly available). The smaller one runs to completion, but the larger one fails with the following error:

Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
index larger than maximal 0
Calls: loadPrismaData ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_rows -> intI
Execution halted
Error during clustering (not enough data?)
Cluster file not generated: ~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011.cluster
Exiting learning module...

Larger file output:

> # reading arguments
> cmd_args<- commandArgs(TRUE)
> prisma_dir<-cmd_args[1]
> capture_dir<-cmd_args[2]
> clusters_file<-cmd_args[3]
> nmf_ncomp<-cmd_args[4]
> print(cmd_args)
[1] "modules/PRISMA/R"                                                          
[2] "~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011"        
[3] "~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011.cluster"
[4] "0"                                                                         
> 
> # store the current directory
> initial_dir<-getwd()
> 
> # load necessary libraries
> # library(PRISMA)
> library(Matrix)
> 
> # change to prisma src dir and load scripts
> setwd(prisma_dir) 
> source("prisma.R")
> source("dimensionEstimation.R")
> source("matrixFactorization.R") 
> setwd(initial_dir)
> 
> # load the dataset
> data = loadPrismaData(capture_dir)
Reading data...
Splitting ngrams...
Calc indices...
Setup matrix...
to check: 2 
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : 
  index larger than maximal 0
Calls: loadPrismaData ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_rows -> intI
Execution halted
Error during clustering (not enough data?)
Cluster file not generated: ~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011.cluster
Exiting learning module...

Smaller file output:

> # reading arguments
> cmd_args<- commandArgs(TRUE)
> prisma_dir<-cmd_args[1]
> capture_dir<-cmd_args[2]
> clusters_file<-cmd_args[3]
> nmf_ncomp<-cmd_args[4]
> print(cmd_args)
[1] "modules/PRISMA/R"                                              
[2] "~/Documents/Fuzzing/pulsar/models/test/test"
[3] "~/Documents/Fuzzing/pulsar/models/test/test.cluster"
[4] "0"                                                             
> 
> # store the current directory
> initial_dir<-getwd()
> 
> # load necessary libraries
> # library(PRISMA)
> library(Matrix)
> 
> # change to prisma src dir and load scripts
> setwd(prisma_dir) 
> source("prisma.R")
> source("dimensionEstimation.R")
> source("matrixFactorization.R") 
> setwd(initial_dir)
> 
> # load the dataset
> data = loadPrismaData(capture_dir)
Reading data...
Splitting ngrams...
Calc indices...
Setup matrix...
to check: 551 
to check: 518 
to check: 480 
to check: 479 
to check: 478 
to check: 476 
to check: 455 
to check: 430 
to check: 406 
to check: 404 
to check: 366 
to check: 365 
to check: 346 
to check: 345 
to check: 320 
to check: 319 
to check: 317 
to check: 266 
to check: 264 
to check: 262 
to check: 261 
to check: 241 
to check: 240 
to check: 238 
to check: 221 
to check: 206 
to check: 200 
to check: 198 
to check: 191 
to check: 190 
to check: 171 
to check: 168 
to check: 162 
to check: 157 
to check: 156 
to check: 155 
to check: 154 
to check: 153 
to check: 149 
to check: 148 
to check: 147 
to check: 146 
to check: 145 
to check: 143 
to check: 141 
to check: 139 
to check: 138 
to check: 132 
to check: 130 
to check: 129 
to check: 120 
to check: 119 
to check: 118 
to check: 116 
to check: 114 
to check: 113 
to check: 111 
to check: 110 
to check: 109 
to check: 106 
to check: 105 
to check: 104 
to check: 103 
to check: 102 
to check: 100 
to check: 99 
to check: 98 
to check: 97 
to check: 96 
to check: 95 
to check: 93 
to check: 91 
to check: 87 
to check: 86 
to check: 83 
to check: 81 
to check: 79 
to check: 78 
to check: 76 
to check: 75 
to check: 74 
to check: 73 
to check: 72 
to check: 69 
to check: 68 
to check: 65 
to check: 63 
to check: 62 
to check: 60 
to check: 59 
to check: 58 
to check: 57 
to check: 56 
to check: 55 
to check: 54 
to check: 53 
to check: 52 
to check: 50 
to check: 48 
to check: 47 
to check: 46 
to check: 45 
to check: 44 
to check: 43 
to check: 42 
to check: 41 
to check: 40 
to check: 38 
to check: 37 
to check: 36 
to check: 35 
to check: 33 
to check: 32 
to check: 31 
to check: 30 
to check: 29 
to check: 28 
to check: 27 
to check: 26 
to check: 24 
to check: 23 
to check: 22 
to check: 21 
to check: 20 
to check: 19 
to check: 18 
to check: 15 
to check: 13 
to check: 12 
to check: 11 
to check: 10 
to check: 8 
to check: 7 
to check: 6 
to check: 5 
to check: 4 
to check: 3 
to check: 2 
to check: 1 
> 
> # estimate number of components
> #dim = calcEstimateDimension(data$unprocessed)
> #cat("Estimated dimension:", estimateDimension(dim), "\n")
> #ncomp = estimateDimension(dim)
> 
> dimension = estimateDimension(data)
> ncomp <- dimension[[2]]
> if (ncomp == 0) {
+     ncomp <- strtoi(nmf_ncomp)
+ }
> print (ncomp)
[1] 26
> # find NMF decomposition
> pmf = prismaNMF(data, ncomp)
Error: 769.1271 
Error: 732.2996 
Error: 728.6931 
Error: 703.4451 
Error: 703.8828 
Error: 705.781 
Error: 688.984 
Error: 688.8207 
> 
> #compute and write clusters to a file
> clusters = calcDatacluster(pmf)
> write.table(clusters, clusters_file, row.names=FALSE, col.names=FALSE)
> 
Colouring 21 states:
14.UAC|18.UAS
START|8.UAC
START|13.UAC
START|12.UAC
None.UAS|None.UAS
None.UAS|None.UAC
None.UAC|None.UAS
START|4.UAC
START|5.UAC
26.UAS|26.UAC
START|None.UAC
START|26.UAC
8.UAC|26.UAS
None.UAC|None.UAC
START|1.UAC
START|19.UAC
START|22.UAC
26.UAC|26.UAS
26.UAS|8.UAC
START|2.UAC
START|24.UAC

Hi @mlucas300, both files are certainly large enough, but do they contain traffic between only two unique IP:port endpoints? If not, use tcpdump to filter out the single communication channel you are interested in, and try again.
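The filtering step suggested above can be sketched as follows. This is a minimal sketch, not part of Pulsar: the helper names `channel_filter` and `tcpdump_command` are hypothetical, and the IP addresses, ports and file names are placeholders. It builds a pcap-filter expression that keeps both directions of one TCP (or UDP) conversation and wraps it in a `tcpdump -r <in> -w <out> <expression>` call.

```python
import shlex
from typing import List


def channel_filter(ip_a: str, port_a: int, ip_b: str, port_b: int, proto: str = "tcp") -> str:
    """Build a pcap-filter expression keeping both directions of one IP:port <-> IP:port channel."""
    forward = f"(src host {ip_a} and src port {port_a} and dst host {ip_b} and dst port {port_b})"
    backward = f"(src host {ip_b} and src port {port_b} and dst host {ip_a} and dst port {port_a})"
    return f"{proto} and ({forward} or {backward})"


def tcpdump_command(in_pcap: str, out_pcap: str, bpf: str) -> List[str]:
    """Argument list that reads in_pcap (-r) and writes only matching packets to out_pcap (-w)."""
    return ["tcpdump", "-r", in_pcap, "-w", out_pcap, bpf]


if __name__ == "__main__":
    # Placeholder endpoints: substitute the channel you want Pulsar to model.
    expr = channel_filter("192.168.1.10", 51515, "192.168.1.20", 80)
    print(shlex.join(tcpdump_command("maccdc2011.pcap", "maccdc2011_channel.pcap", expr)))
```

Passing the list straight to `subprocess.run(cmd, check=True)` avoids shell-quoting problems; the `shlex.join` form is only for copying into a terminal. The filtered file would then be given to Pulsar instead of the full capture.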

Hi @mlucas300, have you tried what @hgascon suggested, i.e. restricting the capture to a single source and destination IP address with a single pair of ports? Did you manage to get it working? I am stuck at this point in my own implementation.
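To decide which IP:port pair to keep, it can help to first count the packets per channel in the capture. The sketch below does this with the Python standard library only. It is hypothetical helper code, not part of Pulsar, and it assumes a classic libpcap file (pcapng is not handled) with Ethernet framing; only IPv4 TCP/UDP packets are counted, and both directions of a conversation are folded into one channel key.

```python
import collections
import ipaddress
import struct
import sys
from typing import BinaryIO, Optional, Tuple

# (protocol, (ip, port), (ip, port)); endpoints are stored in sorted order so
# that both directions of a conversation map to the same key.
Channel = Tuple[str, Tuple[str, int], Tuple[str, int]]

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond variants
LINKTYPE_ETHERNET = 1
ETHERTYPE_IPV4 = 0x0800
L4_NAMES = {6: "tcp", 17: "udp"}


def _channel_key(frame: bytes) -> Optional[Channel]:
    """Extract a direction-independent channel key from one Ethernet frame."""
    if len(frame) < 14 + 20:
        return None
    if struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
        return None  # IPv6, ARP and VLAN-tagged frames are skipped in this sketch
    ip = frame[14:]
    if ip[0] >> 4 != 4:
        return None
    ihl = (ip[0] & 0x0F) * 4
    proto = ip[9]
    frag_offset = struct.unpack("!H", ip[6:8])[0] & 0x1FFF
    if proto not in L4_NAMES or frag_offset != 0 or len(ip) < ihl + 4:
        return None  # not TCP/UDP, or a later fragment without port numbers
    src = str(ipaddress.IPv4Address(ip[12:16]))
    dst = str(ipaddress.IPv4Address(ip[16:20]))
    sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
    a, b = sorted([(src, sport), (dst, dport)])
    return (L4_NAMES[proto], a, b)


def count_channels(f: BinaryIO) -> "collections.Counter[Channel]":
    """Count packets per TCP/UDP channel in a classic libpcap stream."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    if struct.unpack("<I", header[:4])[0] in PCAP_MAGICS:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] in PCAP_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic libpcap file (pcapng is not handled)")
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    counts = collections.Counter()
    while True:
        record = f.read(16)  # ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            break
        incl_len = struct.unpack(endian + "IIII", record)[2]
        frame = f.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated capture
        key = _channel_key(frame)
        if key is not None:
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for (proto, a, b), n in count_channels(fh).most_common(20):
            print(f"{proto} {a[0]}:{a[1]} <-> {b[0]}:{b[1]}  {n} packets")
```

Because `count_channels` only needs a binary file object, a `.pcap.gz` download such as the MACCDC file could also be read through `gzip.open(path, "rb")` without unpacking it first. A busy channel from the printed list is then a candidate for the tcpdump filter.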