generic-c-hashmap should be the easiest to use hash map for C possible.
It only knows how to find, put, remove, and iterate over entries.
You don't have make any adjustments at all to the elements to store. If you can find a hash function and an equality comparator for your data type, you can store it in a generic-c-hashmap.
See examples folder.
Especially stringCount.c is an example for a blazing fast (and I mean it!) word counter.
generic-c-hashmap consists of two macros to set up your hashmap type:
DEFINE_HASHMAP(NAME, TYPE)
DECLARE_HASHMAP(NAME, CMP, GET_HASH, FREE, REALLOC)
DEFINE_HASHMAP(NAME, TYPE) defines the function prototypes and the type
NAME:
typedef struct {
size_t size;
} NAME;
void NAMENew(NAME *map);
void NAMEDestroy(NAME *map);
bool NAMEEnsureSize(NAME *map, size_t capacity);
bool NAMEFind(const NAME *map, TYPE **entry);
HashMapPutResult NAMEPut(NAME *map, TYPE **entry, HashMapDuplicateResolution dr);
bool NAMERemove(NAME *map, TYPE *entry);
You should use DEFINE_HASHMAP in a header file (.h). You may put multiple
hashmap definitions in one header file.
DECLARE_HASHMAP(NAME, CMP, GET_HASH, FREE, REALLOC) declares the actual
function and should be put in a code file (.c):
NAMEis the same name as in DEFINE_HASHMAPCMPis your comparator function/macroGET_HASHis your comparator function/macroFREEis the free function to use, e.g.freeorg_freeREALLOCis the realloc function to use, e.g.reallocorg_realloc
SOME_INT_TYPE hash(TYPE *entry);
Function to calculate/retrieve entry's hash. E.g.
static uint64_t djb2(char **entry) {
unsigned long hash = 5381;
for(char *c = *entry; *c; ++c) {
hash = ((hash << 5) + hash) + *c;
}
return hash;
}
for TYPE = char* (C strings) or identity if you store integers in your
hashmap:
#define INT_HASH(entry) *entry
SOME_INT_TYPE compare(TYPE *left, TYPE *right)
Must return 0 if left equals right, and some other value otherwise. E.g.
#define STRING_CMP(left, right) strcmp(*left, *right)
#define INT_CMP(left, right) *left==*right ? 0 : 1
void NAMENew(NAME *map);
sets up a new empty hashmap.
void NAMEDestroy(NAME *map);
removes every element from the map and sets its capacity to zero.
bool NAMEEnsureSize(NAME *map, size_t capacity);
ensures that map can hold at least capacity elements without need to grow.
So if you want to add n elements at batch, you may call
NAMEEnsureSize(&map, map.size + n) before.
Returns false if your memory is exhausted.
bool NAMEFind(const NAME *map, TYPE **entry);
Find element *entry an returns it pointer in *entry.
Returns false if *entry was not found. In latter case *entry will contain
a random value.
You may change value of the retrieved *entrybut you must not change its hash!
TYPE *iter;
HASHMAP_FOR_EACH(NAME, iter, map) {
do_something_with(iter);
} HASHMAP_FOR_EACH_END
Iterates over all elements stored in the map. iter is the iterator.
You may use break and continue and in usual loops.
While iterating you must not add to or remove from the map!
If your map stores pointers, you may want to iterate over all elements and the
free(*iter) before destroying the hashmap to avoid memory leaks.
HashMapPutResult NAMEPut(NAME *map, TYPE **entry, HashMapDuplicateResolution dr);
Puts a new value *entry in the map. What to do if value already existed
depends on dr:
HMDR_FAIL: NAMEPut() will returnHMPR_FAILED=0,*entrycontains a pointer to the found element.HMDR_FIND: NAMEPut() will returnHMPR_FOUND,*entrycontains a pointer to the found element.HMDR_REPLACE: NAMEPut() will returnHMPR_REPLACED, will overwrite the stored element,*entrycontains a pointer to the newly stored element in the map.HMDR_SWAP: NAMEPut() will returnHMPR_SWAPPED, and exchange**entryand the old element.HMDR_STACK: NAMEPut() will returnHMPR_STACKED, put*entrya second time, and return a pointer to the old element in*entry. NAMEFind() will only find the firstly put element!
If your memory is exhausted, NAMEPut() will return HMDR_FAIL. If *entry did
not exist in the map and was put in to it, HMPR_PUT will be returned.
bool NAMERemove(NAME *map, TYPE *entry);
Removes *entry form the map. Returns false if it did not exist.
The maps capacity will never shrink.
generic-c-hashmap uses C99 syntax. Use --std=c99 or --std=gnu99 with gcc.
I was to lazy to come up with a name … Maybe I will click “Random article” in Wikipedia until I find a cool name. :-)
Seems to perform in same league as widely used uthash. See speed tests.
Execution time (ms):
-O0 -O1 -O2 -O3 -Os gen.-c-hashmap 258 211 212 202 218 uthash 278 214 219 216 201
Compiling time (ms):
-O0 -O1 -O2 -O3 -Os gen.-c-hashmap 40 63 84 108 69 uthash 941 953 971 978 966
- René Kijewski: original author
- Stefan Friesel: bug spotted
- Pete Diemert: bugs spotted (1, 2)
- noderblade: bug spotted (3)
MIT license, see LICENSE. In short:
- you may use generic-c-hashmap in your commercial applications,
- you don't have give to credit to generic-c-hashmap, if you use it in a binary distribution, but you may if you want to,
- you have to put the license file in your source code distribution and must attribute the contributers to generic-c-hashmap in the respective source code files.