Building -------- You build the churn application using mvn package. However javac can do just fine, and if no maven is around, the top level run.sh will use it. Running ------- There are scripts in the bin directory to run it with the various GCs available in Hotspot/OpenJDK rung1.sh runcms.sh runpar.sh runparold.sh runshenandoah.sh runzgc.sh runzgcgen.sh The scripts dump a suitably labelled gc log and output file showing the gc trace output and the program output e.g. if you execute run-g1.sh the fiels will be gclog-g1 outlog-g1 You can also run with the compressed oops optimization enabled using the following scripts rung1-nocoops.sh runcms-nocoops.sh runpar-nocoops.sh runparold-nocoops.sh runshenandoah-nocoops.sh runzgc-nocoops.sh runzgcgen-nocoops.sh Next to it, there is also top level run.sh Which is wrapping above bin/* by more reasonable defaults, and/or allows you to run them ALL/defaultgc The top level run.sh can be called as: run.sh <somegc> <someseconds> run.sh "<gc1 gc2...>" (note the quotes) which then iterates through selectedGCs or set ALL to try all GCs eg: run.sh "zgc shenandoah" 36000 which will run 10 hours zgc and then 10 hours of shenandoah. If you JVM do notsupport any of them, it will fail It accepts: HEAPSIZE=3g ITEMS=250 THREADS=2 DURATION=18000 # 5 hours (in seconds)# COMPUTATIONS=64 BLOCKS=16 OTOOL_garbageCollector and JAVA_HOME env variables The variables have priority over arguments The top level run.sh can generate junit-like xml and tapfile at the end, and is compressing all the logs to single archive (they can be huge) Note, that if more then one gc is part of the argument/OTOOL_garbageCollector final enumeration, the DURATION applied to each of them. if you use ALL, the DURATION is split among final set (as you never know how much you will actually run) The top level run.sh is the only runner which can run from custom directory. Arguments --------- The arguments to the program (and the run scripts) are optional in which case the program uses suitable defaults -blocks B [default 4] how many data blocks to allocate per work item -items I [default 4000] how many thousand local/global work items in the work set -threads T [default 8] how many threads to use to do the processing -iterations N [default 200] how many times to update the local/gobal work set -duration D [default off] how long in seconds should churn run. overwrites -iterations -computations C [default 32] how many computes/write operations are performed on each allocated object -slices S [default 100] number of allocate/compute operations per timed task -yield Y [default -1] yield (Y = 0) or sleep (for Y msecs) at end of slice where B, I, T, N, D, C and S need to be supplied as positive integers and Y may be 0 or a positive. Output ------ The program output includes a total time for execution of the test program and a histogram output showing the distribution of task execution times for each nominal 'task' run by the program. A task comprises a certain, configurable number of allocate and compute operations on a working set of objects. Operation --------- The test program repeatedly updates a local (i.e. short lived) work set by allocating and inserting work items. So, this work set is continually being updated and most of the items are dropped with a relatively short life time. 1 in 10 work items are also inserted into the global (long lived) work set as well as the lcoal set. These items therefore tend to have a lifetime approximately 10x > than the normal lifetime. The local and global work sets (eventually) contain I million items are each divided into T chunks updated independently by each of the T threads. So, modifying the thread count does not affect the total work load or data set size. Each thread loops repeating updates to the work set N times. So, the total number of allocations and insertions into the local work set accumulated across all threads is (I * N) million and each thread creates (I / T) * N items. Alternatively, when D is provided, new iterations are started, with same rules mentioned, until running time exceeds D seconds. Each time a chunk of I / T items is updated there is a 1 in 100 chance that the thread will purge all its entries from the global work set, simulating an application phase change. Items normally hold B 32 byte blocks. 1 in 100 items stores 2 1Kb blocks and 1 in 200 items stores 1 32kb block. 1 in 1000 items holds a single 1Mb block. Items are also linked randomly with an average chain length of ~ 1.3. Each allocated work item is modified by computing and writing C byte values to the allocated byte blocks, cycling round to the start of the block if necessary. So, by increasing C you can vary the allocation to computation ratio. The current msec time is measured after every S work item allocate & compute operations i.e. this parameter groups a fixed number of work item operations as a standard task whose time is measured repeatedly over the run. Differences between sucessive timings are recorded as a per task execution time. These timings will be fairly consistent *except* where GC pauses thread execution. By configuring S and C you can ensure that the sample interval is no greater than a few milliseconds (by dfeault it will probably be much less than this). Variation in the readings will be displayed in the output histograms (per thread and accumulated across threads) providing an indication of how often and how long new gen and old gen GCs have interrupted execution of tasks. If you are running with a lot more mutator threads than cores you may want to simulate yields or pauses during task execution using the -yields parameter (running with no pauses and a low compute time merely serves to thrash the memory system). If Y is 0 then a thread yields at the end of each slice. If Y is > 0 then it sleeps for Y msecs. Note that yield and sleep times are not included in the next task's time measurement.