Nitin Agrawal
nitina@cs.wisc.edu
Run make
to install, then run ./impress <inputfile>
. The default input file is ./inputfile
.
Additional software is required in extension_helpers
for certain file extensions:
- id3v2 for mp3
- gif for gif/jpeg files
Parent_Path: ./impress_home/ 1
Specifies where to create the test file system. 1/0 toggles whether this will be used or not.Actuallogfile: /root/impress/Desksearch/Results/Logs 0
Specifies where log files will be maintained. 1/0 toggles whether this will be used or not.Randseeds: Direct 10 10 20 30 40 50 60 70 80 90 100
Specify seeds for random number generators to ensure reproducibility. If more randseeds are needed, change the number right afterDirect
which specifies the number of random seeds.FScapacity: 100 GB
Total capacity of the disk; does not affect anything.FSused: 45 GB
Size of the desired file system. The actual created file system can be a little off from the desired value.Numfiles: NO K
Number of files desired. First arg is number, second arg is unit:N
for 10^0,K
for 10^3,M
for 10^6,B
for 10^9. If first arg isNO
, Impressions will calculate based onFSused
.Numdirs: NO K
Number of directories desired. First arg is number, second arg is unit:N
for 10^0,K
for 10^3,M
for 10^6,B
for 10^9. If first arg isNO
, Impressions will calculate based onFSused
.Filesperdir: 10.2 N
Mean number of files in a directory.FilesizeDistr: Direct 3 99994 29 0.91
File size distribution. 3 is number of params. First arg is bias for lognormal out of 100000; second arg is exponent for Pareto base (Xm); third arg is Pareto shape (alpha).FilecountDistr: Direct 2 9.48 2.45663283
File count/size lognormal distribution. First arg is mu; second arg is sigma.Dircountfiles: Direct 2 2 2.36
Directory sizes inverse polynomial distribution. First arg is degree, second arg is offset.DirsizesubdirDistr: Indir DirsizesubdirDistr.txt
Fileswithdepth: Direct 10
File depth Poisson distribution. Arg is lambda (mean).Layoutscore: 1.0
Desired layout score. 1.0 is perfectly laid out.Actualfilecreation: 1
If this is 0 then no files or dirs will be created. If this is 1 then files and dirs will be created. If this is 2 then only dirs will be created.
SpecialFlags: 10
: Number of flags there are.Flat 0
: If 1, create a flat tree.Deep 0
: If 1, create a deep tree.Ext -1
: If -1, Impressions select extensions; otherwise, the provided number selects from the list of extensions inextension.cpp
.Wordfreq 0
: The type of content inside files.Large2Small 0
: Not currently used.Small2Large 0
: Not currently used.Depthwithcare 1
: If 1, place files carefully according to depth distributions.Filedepthpoisson 1
: If 1, use the Poisson distribution choice for file depth.Dircountfiles 1
: If 1, use inverse polynomial distribution choice for number of files in directory.Constraint 0
: If 1, activate constraint solving forFSused
andNumfiles
.
Printwhat
is the number of printing flags there are; use 0/1 to toggle off/on.
Printwhat: 10
ext 0
sizebin 0
size 0
initial 0
final 0
depth 0
tree 0
subdirs 0
dircountfiles 0
constraint 0
SpecialDirBias
The shape of the file count and base 2 log bytes by file depth distributions are hard-coded into this file as arrays. The constants Total_depthcount_prob
, depthcount_prob
, and depth_meansize
have been modified to suit my own data; the original values are from a five-year study of Windows filesystems, so they're probably far more legit.
In the montecarlo
function in montecarlo.cpp
appears this nested loop:
list<dir> LD;
list<dir>::iterator ni;
...
int montecarlo(int numdirs) {
...
LD.push_front(Dirs[0]);
...
for (long i = 1; i < numdirs; i++) {
long token_uptil_now = 0, sum_childs_plus2 = 2;
long token = (rand() % sum_childs_plus2) + 1;
ni = LD.begin();
token_uptil_now += (*ni).subdirs+2;
while (token_uptil_now < token) {
ni++;
token_uptil_now+= (*ni).subdirs+2;
}
...
LD.push_back(Dirs[i]);
sum_childs_plus2+=2+1;
...
}
...
}
This is very slow when numdirs
is large, mostly due to the while
loop that walks the iterator. For now, I replace this with a direct access into the middle of the iterator. This requires an iterator capable of random access, but at the same time the container being iterated still needs the capability to have elements pushed onto its front and back, so I replace the list with a dequeue. There is probably a more proper fix to this, but likely requires actually understanding the Monte Carlo simulation code.
deque<dir> LD;
deque<dir>::iterator ni;
...
int montecarlo(int numdirs) {
...
for (long i = i; i < numdirs; i++) {
...
ni = LD.begin();
ni += token / 3;
...
}
...
}