/churn

Memory churn application to test limits of OpenJDK garbage collectors

Primary LanguageJava

Building
--------

You build the churn application using mvn package.

Running
-------

There are scripts in the bin directory to run it with the various GCs
available in Hotspot/OpenJDK

  run-g1.sh
  run-cms.sh
  run-par.sh
  run-shenandoah.sh

The scripts dump a suitably labelled gc log and output file showing
the gc trace output and the program output e.g. if you execute
run-g1.sh the fiels will be

  gclog-g1
  outlog-g1

You can also run with the compressed oops optimization enabled using
the following scripts

  run-g1-nocoops.sh
  run-cms-nocoops.sh
  run-par-nocoops.sh
  run-shenandoah-nocoops.sh

Arguments
---------

The arguments to the program (and the run scripts) are optional in
which case the program uses suitable defaults

  -blocks B [default 4] how many data blocks to allocate per work item
  -items I [default 4] how many million local/global work items in the work set
  -threads T [default 8] how many threads to use to do the processing
  -iterations N [default 200] how many times to update the local/gobal work set
  -duration D [default off] how long in seconds should churn run. overwrites -iterations
  -computations C [default 32] how many computes/write operations are
   performed on each allocated object 
  -slices S [default 100] number of allocate/compute operations per timed task
  -yield Y [default -1] yield (Y = 0) or sleep (for Y msecs) at end of slice
  

where B, I, T, N, D, C and S need to be supplied as positive integers
and Y may be 0 or a positive.

Output
------

The program output includes a total time for execution of the test
program and a histogram output showing the distribution of task
execution times for each nominal 'task' run by the program. A task
comprises a certain, configurable number of allocate and compute
operations on a working set of objects.

Operation
---------

The test program repeatedly updates a local (i.e. short lived) work
set by allocating and inserting work items. So, this work set is
continually being updated and most of the items are dropped with a
relatively short life time.

1 in 10 work items are also inserted into the global (long lived) work
set as well as the lcoal set. These items therefore tend to have a
lifetime approximately 10x > than the normal lifetime.

The local and global work sets (eventually) contain I million items
are each divided into T chunks updated independently by each of the T
threads. So, modifying the thread count does not affect the total work
load or data set size.

Each thread loops repeating updates to the work set N times. So, the
total number of allocations and insertions into the local work set
accumulated across all threads is (I * N) million and each thread
creates (I / T) * N items. Alternatively, when D is provided, new
iterations are started, with same rules mentioned, until running
time exceeds D seconds.

Each time a chunk of I / T items is updated there is a 1 in 100 chance
that the thread will purge all its entries from the global work set,
simulating an application phase change.

Items normally hold B 32 byte blocks. 1 in 100 items stores 2 1Kb
blocks and 1 in 200 items stores 1 32kb block. 1 in 1000 items holds
a single 1Mb block.

Items are also linked randomly with an average chain length of ~ 1.3.

Each allocated work item is modified by computing and writing C byte
values to the allocated byte blocks, cycling round to the start of the
block if necessary. So, by increasing C you can vary the allocation to
computation ratio.

The current msec time is measured after every S work item allocate &
compute operations i.e. this parameter groups a fixed number of work
item operations as a standard task whose time is measured repeatedly
over the run. Differences between sucessive timings are recorded as a
per task execution time. These timings will be fairly consistent
*except* where GC pauses thread execution.

By configuring S and C you can ensure that the sample interval is no
greater than a few milliseconds (by dfeault it will probably be much
less than this). Variation in the readings will be displayed in the
output histograms (per thread and accumulated across threads)
providing an indication of how often and how long new gen and old gen
GCs have interrupted execution of tasks.

If you are running with a lot more mutator threads than cores you may
want to simulate yields or pauses during task execution using the
-yields parameter (running with no pauses and a low compute time
merely serves to thrash the memory system). If Y is 0 then a thread
yields at the end of each slice. If Y is > 0 then it sleeps for Y
msecs. Note that yield and sleep times are not included in the next
task's time measurement.