/NUMA_Black-Box

Black-box Concurrent Data Structures for NUMA Architectures

Primary LanguageCMIT LicenseMIT

Black-box Concurrent Data Structures for NUMA Architectures

Based on: http://cs.brown.edu/~irina/papers/asplos2017-final.pdf
By: Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera

             Node #1                              Node #2             
╔═══════╦═══════╦═══════╦═══════╗    ╔═══════╦═══════╦═══════╦═══════╗
║ Core  ║ Core  ║ Core  ║ Core  ┣━━━━┫ Core  ║ Core  ║ Core  ║ Core  ║
╟───────╫───────╫───────╫───────╊━━━━╉───────╫───────╫───────╫───────╢
║ Cache ║ Cache ║ Cache ║ Cache ┣━━━━┫ Cache ║ Cache ║ Cache ║ Cache ║
╟───────╨───────╨───────╨───────╢    ╟───────╨───────╨───────╨───────╢
║             Cache             ║    ║             Cache             ║
╚═════════════┳━┳━┳═════════════╝    ╚═════════════┳━┳━┳═════════════╝
              ┃ ┃ ┃                                ┃ ┃ ┃              
┌─────────────┺━┻━┹─────────────┐    ┌─────────────┺━┻━┹─────────────┐
│            Memory             │    │            Memory             │
└───────────────────────────────┘    └───────────────────────────────┘

/************************************************************/
/************	My implementation	*********************/
/************************************************************/
This is the general scheme of my implementation:
                                              ┌────────────────────────┐
                                              │      libgenericDS      │
                                              ├────────────────────────┤
                                              │ genericDataStructure.h │
                                              │ genericDataStructure.c │
                                              └────────────────────────┘
                                                          ▲             
                                                          ║             
                               Shared Memory              ▼             
┌──────────────────────┐    ┌──────────────────┐    ┌────────────┐      
│ libgenerateCommands  │═══▶│ gds_commands.bin │═══▶│ libNUMAbb  │      
├──────────────────────┤    └──────────────────┘    ├────────────┤      
│  generateCommands.c  │                            │  NUMAbb.h  │      
│                      │                            │  NUMAbb.c  │      
└──────────────────────┘                            └────────────┘      
            ▲                                             ▲             
            ║                  ┌────────────┐             ║             
            ╚══════════════════│ myprogram  │═════════════╝             
                               ├────────────┤                           
                               │   main.c   │                           
                               └────────────┘                           
The code was designed to work on unix system and was written in C.
In order to run the program there is only one file that need to be run,
But the files are still acting like a separate libraries.
There is more work that need to be done, Feel free to fork my project.

Run the program:
	Build and run the program by running the command: "sh makefile.sh"

Limitation from the current implementation:
	1) The mmap size is a limitaion on the amount of commands.
	2) Variable MAX_CORES limit the program to work with up-to 512 cores.
	3) Maximum 'op' and 'args' size is 1024 chars each.

Problems:
	Unfortunately, My code isn't perfect yet, 

More upgrades that can be done:
	Section 5.2
		To avoid small inefficient batches, the combiner in NR waits
		if the batch size is smaller than a parameter MIN BATCH. Rather
		than idle waiting, the combiner refreshes the local replica from
		the log, though it might need to refresh again after finally
		adding the batch to the log
	Section 5.5 - Better readers-writer lock
		The distributed readers-writer lock uses a per-reader
		lock to reduce reader overhead; the writer must acquire the
		locks from all readers. We modify this algorithm to reduce
		writer overhead as well, by adding an additional writer lock.
		To enter the critical section, the writer must acquire the
		writer lock and wait for all the readers locks to be released,
		without acquiring them; to exit, it releases its lock. A reader
		waits if the writer lock is taken, then acquires its local lock,
		and checks the writer lock again; if this lock is taken, the
		reader releases its local lock and restarts; otherwise, it enters
		the critical section; to exit, it releases the local lock. With
		this scheme, the writer and readers incur just one atomic
		write each on distinct cache lines to enter the critical sec-
		tion. Readers may starve if writers keep coming, but this is
		unlikely with NR , as often only one thread wishes to be a
		writer at a time (the combiner) and that thread has signif-
		icant work outside the critical section.
	Section 5.7
		Padded the data and aligned the cache to avoid false sharing.
	General
		What prevent two combiners (on separate nodes) to write the log at the same time?
		I think there is need to create something that will prevent it.

Build errors:
	In case of compilation errors, Make sure you have all the dependency packages on your computer,
	Use "apt-get install" / "yum install" / "dnf install" To install: (On Ubunto devel => dev)
		1) libnuma-devel	Development files for libnuma
		2) numactl		Library for tuning for Non Uniform Memory Access machines
		3) numactl-libs		libnuma libraries
		4) numactl-devel	Development package for building Applications that use numa
		5) glibc-devel		Object files for development using standard C libraries.
		6) glibc-static		C library static libraries for -static linking.
		7) libstdc++-devel	Header files and libraries for C++ development
		8) libstdc++-static	Static libraries for the GNU standard C++ library