You will find here all the necessary materials for the labs of the High Performance Programming course.
For each session of the course, a notion will be introduced (Data Structures, Algorithms, Architecture) and applied in the following lab.
The general framework of the labs is a Maven project that processes data from the DEBS 2015 Grand Challenge. This challenge contains data from taxi trips in NYC.
You will be asked to answer queries on the data. Each query will reflect the notions seen during the course. The goal is to answer these queries as fast as possible.
First of all, fork this project into your own account: click on the Fork icon on this page. Clone the forked project on your computer. Import the project in Eclipse via Import->Maven Project.
Two main classes are at your disposal. The first one, MainNoNStreaming, loads all the data in memory and then sends it to each query processor. The second one, MainStreaming, streams the data to the query processors.
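The difference between the two modes can be pictured with a minimal standalone sketch. It is not the framework's actual implementation: the class DispatchSketch, its two methods, and the use of plain text lines as records are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical sketch: these names are not part of the lab framework.
final class DispatchSketch {

    /** Non-streaming: read every record into memory first, then dispatch. */
    static int loadThenDispatch(BufferedReader in, Consumer<String> processor) {
        List<String> all = in.lines().collect(Collectors.toList());
        for (String record : all) {
            processor.accept(record);
        }
        return all.size();
    }

    /** Streaming: dispatch each record as soon as it is read; nothing is retained. */
    static int streamDispatch(BufferedReader in, Consumer<String> processor) {
        int count = 0;
        Iterator<String> it = in.lines().iterator();
        while (it.hasNext()) {
            processor.accept(it.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "trip1\ntrip2\ntrip3\n";
        loadThenDispatch(new BufferedReader(new StringReader(data)),
                r -> System.out.println("batch: " + r));
        streamDispatch(new BufferedReader(new StringReader(data)),
                r -> System.out.println("stream: " + r));
    }
}
```

Both methods hand the same records to the processor in the same order. The first one needs memory proportional to the whole file, while the second one only holds the current record.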
The repository contains a small data file with 1000 records. This file is sufficient for testing purposes but is too limited for large-scale processing. You need to download the 2 million records file from here (130 MB). Unzip it in src/main/resources/data.
To create a new query processor, create a new class in the package fr.tse.fi2.hpp.labs.queries.impl. Your class must extend AbstractQueryProcessor.
An example of an empty class:
public class SampleQueryProcessor extends AbstractQueryProcessor {

    public SampleQueryProcessor(QueryProcessorMeasure measure) {
        super(measure);
    }

    @Override
    protected void process(DebsRecord record) {
        // Process the record
    }
}
You must complete the process method to implement the queries. This method is called for each DebsRecord sent by the framework. A DebsRecord contains the information for one taxi trip: pickup and dropoff coordinates, price paid, tip, etc. The full list is available in the file as well as here (Data Section).
To be executed, your query processor must be registered in one (or both) main classes. Edit the files to add your own query processor:
List<AbstractQueryProcessor> processors = new ArrayList<>();
// Add your query processor here
processors.add(new SimpleQuerySumEvent(measure));
To add a result to the output file, simply use the writeLine(String line) method. It automatically appends a line to the results/queryN.txt file, where N is the identifier of your query processor (automatically generated).
The framework includes a basic measurement system. Global execution time, per-query execution time, and throughput are automatically written to results/result.txt.
For some labs, specific instructions will be given to produce measurements with JMH.
Follow the installation instructions. Verify that everything is OK with mvn install. Install the extra data in your project. Modify the main classes to parse the sorted_data.csv file.
Remove the existing query that counts the events.
To compare the performance of two implementations of the same feature, create the following queries:
StupidAveragePrice, which puts every new trip price into a list and computes the average from every number in the list.
IncrementalAveragePrice, which uses the previous result to incrementally compute the average.
Execute both queries and measure the difference in running time and throughput, for both the streaming and non-streaming cases.
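The two strategies can be sketched independently of the framework, as below. This is only one possible reading of the exercise: the classes AveragePriceSketch, StoredAverage and RunningAverage are hypothetical, prices are assumed to be plain float values, and the incremental update shown (correcting the previous mean by the new value's deviation) is one common formulation among several.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch: it does not use the lab framework classes.
final class AveragePriceSketch {

    /** Naive approach: store every price and rescan the whole list on each update. */
    static final class StoredAverage {
        private final List<Float> prices = new ArrayList<>();

        float add(float price) {
            prices.add(price);
            double sum = 0.0;
            for (float p : prices) {
                sum += p;
            }
            return (float) (sum / prices.size());
        }
    }

    /** Incremental approach: keep only the previous mean and the record count. */
    static final class RunningAverage {
        private double mean = 0.0;
        private long count = 0;

        float add(float price) {
            count++;
            // New mean = previous mean + (deviation of the new value) / count
            mean += (price - mean) / count;
            return (float) mean;
        }
    }

    public static void main(String[] args) {
        float[] prices = {10.0f, 20.0f, 30.0f};
        StoredAverage stored = new StoredAverage();
        RunningAverage running = new RunningAverage();
        for (float p : prices) {
            System.out.println(stored.add(p) + " " + running.add(p));
        }
    }
}
```

With the naive version, each new record costs time proportional to the number of records already seen and memory grows with the input. The incremental version does constant work per record and stores only two values, which is the difference the measurements should expose.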
TBD
TBD
Evaluation will be made based on the code available on your forked version of this project. No additional material will be accepted.