Pulsar fails on certain PCAP files
mlucas300 opened this issue · 2 comments
Hi,
I have tested Pulsar on two PCAP files: one of 3.9 GB (https://download.netresec.com/pcap/maccdc-2011/maccdc2011_00010_20110312194033.pcap.gz) and one of 1.4 GB (not publicly available). The smaller file runs to completion, but the larger one fails with the following error:
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
index larger than maximal 0
Calls: loadPrismaData ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_rows -> intI
Execution halted
Error during clustering (not enough data?)
Cluster file not generated: ~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011.cluster
Exiting learning module...
Larger file output:
> # reading arguments
> cmd_args<- commandArgs(TRUE)
> prisma_dir<-cmd_args[1]
> capture_dir<-cmd_args[2]
> clusters_file<-cmd_args[3]
> nmf_ncomp<-cmd_args[4]
> print(cmd_args)
[1] "modules/PRISMA/R"
[2] "~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011"
[3] "~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011.cluster"
[4] "0"
>
> # store the current directory
> initial_dir<-getwd()
>
> # load necessary libraries
> # library(PRISMA)
> library(Matrix)
>
> # change to prisma src dir and load scripts
> setwd(prisma_dir)
> source("prisma.R")
> source("dimensionEstimation.R")
> source("matrixFactorization.R")
> setwd(initial_dir)
>
> # load the dataset
> data = loadPrismaData(capture_dir)
Reading data...
Splitting ngrams...
Calc indices...
Setup matrix...
to check: 2
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
index larger than maximal 0
Calls: loadPrismaData ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_rows -> intI
Execution halted
Error during clustering (not enough data?)
Cluster file not generated: ~/Documents/Fuzzing/pulsar/models/maccdc2011/maccdc2011.cluster
Exiting learning module...
Smaller file output:
> # reading arguments
> cmd_args<- commandArgs(TRUE)
> prisma_dir<-cmd_args[1]
> capture_dir<-cmd_args[2]
> clusters_file<-cmd_args[3]
> nmf_ncomp<-cmd_args[4]
> print(cmd_args)
[1] "modules/PRISMA/R"
[2] "~/Documents/Fuzzing/pulsar/models/test/test"
[3] "~/Documents/Fuzzing/pulsar/models/test/test.cluster"
[4] "0"
>
> # store the current directory
> initial_dir<-getwd()
>
> # load necessary libraries
> # library(PRISMA)
> library(Matrix)
>
> # change to prisma src dir and load scripts
> setwd(prisma_dir)
> source("prisma.R")
> source("dimensionEstimation.R")
> source("matrixFactorization.R")
> setwd(initial_dir)
>
> # load the dataset
> data = loadPrismaData(capture_dir)
Reading data...
Splitting ngrams...
Calc indices...
Setup matrix...
to check: 551
to check: 518
to check: 480
to check: 479
to check: 478
to check: 476
to check: 455
to check: 430
to check: 406
to check: 404
to check: 366
to check: 365
to check: 346
to check: 345
to check: 320
to check: 319
to check: 317
to check: 266
to check: 264
to check: 262
to check: 261
to check: 241
to check: 240
to check: 238
to check: 221
to check: 206
to check: 200
to check: 198
to check: 191
to check: 190
to check: 171
to check: 168
to check: 162
to check: 157
to check: 156
to check: 155
to check: 154
to check: 153
to check: 149
to check: 148
to check: 147
to check: 146
to check: 145
to check: 143
to check: 141
to check: 139
to check: 138
to check: 132
to check: 130
to check: 129
to check: 120
to check: 119
to check: 118
to check: 116
to check: 114
to check: 113
to check: 111
to check: 110
to check: 109
to check: 106
to check: 105
to check: 104
to check: 103
to check: 102
to check: 100
to check: 99
to check: 98
to check: 97
to check: 96
to check: 95
to check: 93
to check: 91
to check: 87
to check: 86
to check: 83
to check: 81
to check: 79
to check: 78
to check: 76
to check: 75
to check: 74
to check: 73
to check: 72
to check: 69
to check: 68
to check: 65
to check: 63
to check: 62
to check: 60
to check: 59
to check: 58
to check: 57
to check: 56
to check: 55
to check: 54
to check: 53
to check: 52
to check: 50
to check: 48
to check: 47
to check: 46
to check: 45
to check: 44
to check: 43
to check: 42
to check: 41
to check: 40
to check: 38
to check: 37
to check: 36
to check: 35
to check: 33
to check: 32
to check: 31
to check: 30
to check: 29
to check: 28
to check: 27
to check: 26
to check: 24
to check: 23
to check: 22
to check: 21
to check: 20
to check: 19
to check: 18
to check: 15
to check: 13
to check: 12
to check: 11
to check: 10
to check: 8
to check: 7
to check: 6
to check: 5
to check: 4
to check: 3
to check: 2
to check: 1
>
> # estimate number of components
> #dim = calcEstimateDimension(data$unprocessed)
> #cat("Estimated dimension:", estimateDimension(dim), "\n")
> #ncomp = estimateDimension(dim)
>
> dimension = estimateDimension(data)
> ncomp <- dimension[[2]]
> if (ncomp == 0) {
+ ncomp <- strtoi(nmf_ncomp)
+ }
> print (ncomp)
[1] 26
> # find NMF decomposition
> pmf = prismaNMF(data, ncomp)
Error: 769.1271
Error: 732.2996
Error: 728.6931
Error: 703.4451
Error: 703.8828
Error: 705.781
Error: 688.984
Error: 688.8207
>
> #compute and write clusters to a file
> clusters = calcDatacluster(pmf)
> write.table(clusters, clusters_file, row.names=FALSE, col.names=FALSE)
>
Colouring 21 states:
14.UAC|18.UAS
START|8.UAC
START|13.UAC
START|12.UAC
None.UAS|None.UAS
None.UAS|None.UAC
None.UAC|None.UAS
START|4.UAC
START|5.UAC
26.UAS|26.UAC
START|None.UAC
START|26.UAC
8.UAC|26.UAS
None.UAC|None.UAC
START|1.UAC
START|19.UAC
START|22.UAC
26.UAC|26.UAS
26.UAS|8.UAC
START|2.UAC
START|24.UAC
Hi @mlucas300, both files are certainly large enough, but do they contain traffic only between two unique IP:PORT endpoints? If not, try filtering out the communication channel you are interested in with tcpdump, then run the learning module again on the filtered capture.
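As a rough sketch of that filtering step, something like the following could work. The IP addresses, port, and file names are placeholders I made up for illustration, not values taken from either capture; only the tcpdump options `-r` (read from file) and `-w` (write to file) and the BPF filter syntax are standard.

```shell
#!/bin/sh
# Hypothetical endpoints: replace with the channel you want Pulsar to learn.
CLIENT="192.168.1.10"
SERVER="192.168.1.20"
PORT="80"

# Hypothetical file names; can be overridden by the first two arguments.
INPUT="${1:-capture.pcap}"
OUTPUT="${2:-filtered.pcap}"

# Build a BPF filter that keeps only TCP packets exchanged between
# the two hosts where either the source or destination port is $PORT.
build_filter() {
    printf 'host %s and host %s and tcp port %s' "$1" "$2" "$3"
}

FILTER="$(build_filter "$CLIENT" "$SERVER" "$PORT")"
echo "$FILTER"

# Run tcpdump only if it is installed and the input capture exists.
if command -v tcpdump >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    tcpdump -r "$INPUT" -w "$OUTPUT" "$FILTER"
fi
```

The resulting filtered.pcap would then be the file passed to the learning module in place of the full capture. If the protocol of interest runs over UDP, `tcp port` in the filter would need to become `udp port`.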
Hi mlucas300, have you tried what @hgascon suggested, that is, restricting the capture to a single pair of unique source and destination IP addresses and ports? Did you manage to get it working? I am stuck at this same step in my own implementation.