seer-lab/ARC

Implement non-functional fitness score

kevinjalbert opened this issue · 2 comments

The non-functional fitness score measures the performance of the evolved software. Specifically, we are looking at measuring the time and resources used during the execution of the system under test.

We can acquire a large set of statistics using the Linux /usr/bin/time -v command. Unfortunately, this breaks the cross-platform nature of ARC to some extent (though we have not tried ARC on Windows or Mac yet).

The result of this command looks like the following:

    Command being timed: "python2 arc.py"
    User time (seconds): 299.15
    System time (seconds): 20.79
    Percent of CPU this job got: 111%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 4:46.96
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 89368
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1433714
    Voluntary context switches: 20243
    Involuntary context switches: 16198
    Swaps: 0
    File system inputs: 8
    File system outputs: 3072
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

From this we can extract the maximum memory used from the Maximum resident set size field, though I am not so sure memory is a concern. We can also calculate the real time the process spent running on the CPU using User time + System time. We also have the wall time, though other processes could affect this, and I believe the real time would be a better measure.
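As a rough sketch of how those fields could be scraped from the time -v report (this is illustrative only; the measure helper and its regular expression are not ARC code, and they assume the GNU time output shown above):

    import re
    import subprocess

    def measure(cmd):
        """Hypothetical helper: run cmd under /usr/bin/time -v and scrape the report."""
        # GNU time prints its report to stderr, so capture that stream.
        proc = subprocess.Popen(["/usr/bin/time", "-v"] + cmd,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        _, stderr = proc.communicate()
        report = stderr.decode("utf-8", "replace")

        def grab(label):
            # Pull the numeric value that follows a "Label:" entry in the report.
            match = re.search(re.escape(label) + r":\s*([\d.]+)", report)
            return float(match.group(1)) if match else 0.0

        user = grab("User time (seconds)")
        system = grab("System time (seconds)")
        return {
            "real_time": user + system,  # CPU time actually used by the process
            "max_memory": grab("Maximum resident set size (kbytes)"),
            "voluntary_switches": grab("Voluntary context switches"),
        }

The wall-clock line uses an h:mm:ss format, so it would need its own parsing if we decide we want it.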

We can then calculate the score like so (we could also take into account the avg/min/max values if we want):

non-functional score = (max_memory * m) + (real_time * n) + (wall_time * p)

Again, I am not sure whether we need wall time or max memory.
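As a toy example of that weighted sum (the weights m, n, p are placeholders that would need tuning, not values we have settled on):

    def non_functional_score(max_memory, real_time, wall_time, m=0.0, n=1.0, p=0.0):
        # Lower is better: a weighted sum of the candidate measures.
        # Setting m or p to 0 drops max memory or wall time from the score.
        return (max_memory * m) + (real_time * n) + (wall_time * p)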

Well, I had to revamp how tester.py works to correctly track the metrics we want. Currently we can get the min/avg/max of the real time and the percent of CPU. I figured these would be the most applicable, as max memory shouldn't vary and wall time might be inaccurate due to other testing processes.

The problem right now is that I am unsure how to assign the score (including the actual equation). If we just add the values together we will have to minimize the score (should we try to maximize a score instead, or does it seem more natural to minimize run time?).

We decided to only consider real time (actual time, 'wall time') and the number of voluntary context switches.

The worst score still persists, but we now calculate it using the average values instead of the max values (to avoid an edge case where we have strong outliers).

The actual calculation is performed using the following equation:

score = worst_score / [sig_of_t * (1 - unc_of_t) + sig_of_c * (1 - unc_of_c)]

where sig_of_{t,c} is the significance modifier of real time (t) or voluntary context switches (c). This is calculated as follows:

if t > c: sig_of_t = t/c and sig_of_c = c/t
if c > t: sig_of_t = c/t and sig_of_c = t/c

The unc_of_{t,c} term represents the uncertainty or variability of the measure:

unc_of_t = [max(t) - min(t)] / avg(t)
unc_of_c = [max(c) - min(c)] / avg(c)
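Putting the pieces together, a minimal sketch of the calculation (the function and variable names here are illustrative, not the actual tester.py code):

    def average(samples):
        return sum(samples) / float(len(samples))

    def uncertainty(samples):
        # unc = (max - min) / avg: the variability of a set of measurements.
        return (max(samples) - min(samples)) / average(samples)

    def significance(t, c):
        # Significance modifiers, following the equations above literally;
        # t and c are the average real time and average voluntary context switches.
        if t > c:
            return t / c, c / t
        return c / t, t / c  # also covers the c == t case

    def non_functional_fitness(worst_score, time_samples, switch_samples):
        t = average(time_samples)
        c = average(switch_samples)
        sig_t, sig_c = significance(t, c)
        unc_t = uncertainty(time_samples)
        unc_c = uncertainty(switch_samples)
        return worst_score / (sig_t * (1 - unc_t) + sig_c * (1 - unc_c))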