Project-K23a

Team Members

Katsouli Ioanna (1115201400067)
Konomi Marina (1115201700054)
Leonidas-Panagiotis Papadakos (1115201700117)

Hashtable is used, as it helps to form smaller lists and makes the search procedure faster.
Faster, because it has average time complexity O(1) for search, insertion and deletion
(and worst time complexity O(n)).
Single-linked lists has O(1) for insertion and deletion and O(n) for search procedure.

Build

Run make, to build the executables and unit tests
To run the testing executables, (found in tests/), run make check.

Usage

./specs [-e buckets (default: 10007)] -x dataset_x -w dataset_w [-o FILE (default: stdout)]

Notes/Assumptions

The unit testing library used is acutest.
It is assumed that within the directory of Dataset X, there will be only site folders (e.g. www.ebay.com) and .json files.
The way we print the cliques as distinct pairs of specs is outlined in print_pairs(), within hashtable.c. There was a discussion on this in this Piazza question and verbally, in the weekly Zoom meetings.
The program outputs to stdout by default, which is also suggested in the aforementioned Piazza discussion. Use -o somefile.csv or redirection, to point the data elsewhere.
Stopword list: english.txt from Alir3z4/stop-words.
Using the log10() function, because the numeric results in the exercise explanation point to it instead of log9) or log2().
It is assumed that the sets (training, testing e.t.c.) are at least as big at the BATCH_SIZE.

Validation