The deadline for this assignment is
22:00 Mon 4 Mar 2019
This repository contains a simple framework for describing and executing
small computational puzzles. Each puzzle is derived from puzzler::Puzzle
,
which provides a number of methods for creating inputs, executing puzzles,
and checking outputs. Of particular interest to you is the puzzler::Puzzle::Execute
method, which is the method that you need to accelerate, and is the function
that is timed for performance purposes. Each input problem has an associated
"scale", which is a measure of computational complexity - in general, as the
scale number goes up, the problem will take longer.
There are four puzzles that you need to work on as a pair:
ising
rank
decompose
integral
Each puzzle represents the computational core of some common calulation, though all have been simplified to just a few lines of code. It's not too difficult to work out what the underlying application is, and you may find it interesting to look at existing solutions in the literature. However, no specialised knowledge of each puzzles application domain is needed, and the expectation is that only the standard techniques from the lectures and earlier courseworks are needed.
The repository has a number of different directories:
-
The directories under
include
should not be modified.- They are owned by the framework, and will be over-written in assessment environments to ensure they have not been modified.
- If there are any changes to these files in the spec then they will be done in an additive way wherever possible (no changes are expected).
-
You own the files in the
provider
directory- You'll be replacing the implementation of
XXXXProvider::Execute
inprovider/xxxx.hpp
with something (hopefully) faster. - A good starting point is to replace the implementation of
XXXXProvider::Execute
with a copy of the body ofXXXXPuzzle::ReferenceExecute
, and check that it still does the same thing. - The reason for the indirection is to force people to have an unmodified reference version available at all times, as it tends to encourage testing.
- You can have multiple live alternative engines for each puzzle compiled into the executable, allowing you to progressively refine solutions while keeping previous known-good versions around. Each version should be registered under a different engine name.
- You'll be replacing the implementation of
-
The public entry point to your code is via
puzzler::PuzzleRegistrar::UserRegisterPuzzles
, which must be compiled into the static librarylib/libpuzzler.a
.- Clients will not directly include your code, they will only
#include "puzzler/puzzles.h
, then access puzzles via the registrar. They will get access to the registrar implementation by linking againstlib/libpuzzler.a
. - Note: If you do something complicated in your building of libpuzzler, it should still be possible to build it by going into
provider
and callingmake all
. - The current working directory during execution will be the root of the repository. So it will be executed as if by typing
bin/execute_puzzle
, and an opencl kernel should be loaded using the relative pathprovider/my_kernel.cl
. - Programs which link against and use
lib/libpuzzler.a
will also link in TBB and OpenCL. - You can register a given implementation with the registrar under a chosen engine name. This allows you to have multiple versions of a puzzle solution available, and allows you to expose specific solutions for different deliverable components.
- Clients will not directly include your code, they will only
-
The programs in
src
have no special meaning or status, they are just example programs
The reason for all this strange indirection is that I want to give maximum freedom for you to do strange things within your implementation (example definitions of "strange" include CMake) while still having a clean abstraction layer between your code and the client code.
The deliverable sections are:
-
TBB Competence (20%)
-
For all puzzles, you must deliver a version which correctly uses TBB to solve the problem. The majority of the compute must be performed under TBB, and it should use multiple cores. However, performance is not considered for this part, only correctness.
-
The engines demonstrating competence should be exposed via the name
${puzzle}.tbb
, where<puzzle>
is the puzzle name. For exampleising.tbb
orrank.tbb
.
-
-
OpenCL Competence (20%)
-
For two puzzles you must deliver a version which correctly uses OpenCL to solve the problem. The majority of the compute must be performed in kernels, and it should use multiple work-items in parallel. Again, performance is not considered here, only correctness.
-
The puzzles demonstrating OpenCL competence should be exposed via the name
${puzzle}.opencl
. -
If there are more than two engines listed of the form
${puzzle}.opencl
then two will be selected in a deterministic way.
-
-
Performance optimisation (40%)
-
Each puzzle has two target performance metrics (see below), defined for a competent and expert person, which were based on a person at the given level spending three hours per puzzle. For each puzzle you should provide an optimised version, and the improvement relative to the reference will be mapped to a mark:
- [reference,competent) : 0..40%
- [competent,expert*0.9) : 40..100%
Matching (or exceeding) expert*0.9 is equivalent to getting full marks, though you are not time constrained.
-
The puzzles to be assessed for performance should have a name of the form
${puzzle}.opt
. -
The target environment is a
g3.4xlarge
instance, running theHPCE-2018
AMI. -
There is no requirement on how the optimised engines are implemented. You could use OpenCL, TBB, a combination of OpenCL and TBB, or even straight sequential code.
-
The engines selected here might be aliases for engines from the competence sections, or they could be completely different.
-
-
Results presentation (20%)
-
Produce an A4 pdf called
question/results.pdf
which uses empirical data to explore and answer a question that arose while working with the puzzles. The pdf should contain no more than 300 words of text, and should use empirical data such as graphs and/or tables to support the conclusion.-
Quality of question
- Is it clear what question is being asked?
- Is the question non-trivial?
- Is the question specific to a puzzle (or puzzles)?
- Would a reader find the answer interesting?
-
Appropriateness of method
- Is the method aligned with the question: can it give an answer?
- Are metrics and independent/dependent variables clear?
- Is the tool or measurement method appropriate for the accuracy/precision needed?
-
Quality of data
- Is the data clearly presented?
- Are any important properties of the experimental environment described?
- Are there sufficient data points to allow conclusions to be drawn?
-
Analysis and insight
- Is the answer or result of the experiment clearly stated (or equally is the lack of a clear answer identified)?
- Does the conclusion draw upon the data presented?
- Does the conclusion show insight beyond simply reporting results?
-
-
The word limit is to avoid people spending too much time on writing long reports, and also to enforce concision and clarity of communication.
-
This section is deliberately open-ended, and is intended to assess your ability to formulate your own questions and design experiments, rather than just following instructions (this is a masters level course, after all).
-
It is worth thinking of this part as preparation for final year project evaluation sections, as your project evaluation should be addressing all these components too.
-
Performance is defined in terms of the number of puzzles of increasing scale
executed within a fixed time budget. The time budget is fixed at Puzzle::Execute
.
The sequence of scales is determined according to a schedule for each puzzle,
which maps an index
To create a continuous metric, the puzzle which is executing when the timer expires
will be added pro-rata based on "how far" it had got through. If we have a sequence
of scales
n+(t_B-t_{n})/(t_{n+1}-t_{n})
In order to manage execution time in assessment (e.g. in the case of an infinite loop),
if
Puzzle | Sequence | Reference | Competent | Expert |
---|---|---|---|---|
rank | 50+(i*50+U*20)^2 |
41.8 | 44.6 | 60.2 |
decompose | 20+(i*30+U*10)^1.2 |
15.2 | 26.8 | 38.8 |
ising | 20+(i*20+U*5) |
14.7 | 14.7 | 31.7 |
integral | 20+(i*20+U*5) |
15.8 | 59.0 | 172.6 |
Note that in the case of ising
our competent person was able to correctly
parallelise using TBB in a number of ways and also tried some OpenCL, but did
not manage to make it any faster despite using more cores. According to the
marks schema that means that if you manage to make ising faster, then you
start from a base level of 40%. Note that ising is not specifically a harder
problem, but it requires more thought and analysis to determine where to
start.
The expert observed that they weren't at the limits of performance on any puzzle, only of implementation time, so if you end up with higher performance don't assume you've done something wrong.
Drop into the root of the repo and do make all
. This should produce a number of executables:
-
bin/run_puzzle engine_name scale [logLevel]
: General purpose driver, which does the following steps:- Creates an instance of
engine_name
. - Creates an input for the chosen puzzle type using the given
scale
. - Executes the engine on the puzzle input and captures the output.
- Executes the reference implementation on the same puzzle input and captures the output.
- Verifies that the two solutions are the same.
- Creates an instance of
-
bin/create_input puzzle_name scale [logLevel]
: Creates an input for the given puzzle and scale, and then writes it (as binary!) to stdout. -
bin/execute_puzzle engine|"ref" [logLevel]
: Reads a problem from stdin, uses the chosen engine to solve it, then writes the solution to stdout. Selecting "ref" as the engine will use the reference solver. -
bin/compare_puzzle_output inputFile refFile gotFile [logLevel]
: Takes in an input problem, a reference solution, and an alternative solution, and checks that the two solutions are the same. The exit code of the program is 0 iff the two solutions match. Not that for some puzzles the answers do not have to be bit-exact equal, so two different outputs may both be correct.
The different programs are intended to make it easy to write infrastructure for testing performance and correctness. For example, if one wished to test an engine called "ising.tbb" at a number of different scales, one could script something along these lines:
#!/bin/bash
ENGINE="ising.tbb"
PUZZLE=${ENGINE%%.*}
SCALES="100 150 200"
WORKING=.tmp
mkdir ${WORKING}
for SCALE in $SCALES ; do
bin/create_puzzle_input ${PUZZLE} ${SCALE} > ${WORKING}/${PUZZLE}.${SCALE}.input
cat ${WORKING}/${PUZZLE}.${SCALE}.input | bin/execute_puzzle "ref" > ${WORKING}/${PUZZLE}.${SCALE}.ref
cat ${WORKING}/${PUZZLE}.${SCALE}.input | bin/execute_puzzle ${ENGINE} > ${WORKING}/${PUZZLE}.${SCALE}.${ENGINE}.got
bin/compare_puzzle_output ${WORKING}/${PUZZLE}.${SCALE}.input ${WORKING}/${PUZZLE}.${SCALE}.ref ${WORKING}/${PUZZLE}.${SCALE}.${ENGINE}.got
if [[ $? -ne 0 ]] ; then
>&2 echo "FAIL"
exit 1
fi
done
The code in github forms your submission, though you must submit your final hash via blackboard for time-keeping and non-repudiation purposes. Pushes to github after the deadline will not be treated as submissions, unless the new hash is also submitted after the deadline. All members of each group must submit the same hash to show agreement.