jackaduma/SpeakerRecognition-ResNet-GhostVLAD

Utterance-level Aggregation for Speaker Recognition in the Wild: a "thin-ResNet" trunk architecture with a dictionary-based NetVLAD or GhostVLAD layer that aggregates features across time. The model can be trained end-to-end.

Language: Python · License: MIT

SpeakerRecognition-ResNet-GhostVLAD
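
The description above mentions a GhostVLAD layer that aggregates frame-level features across time. As a rough illustration of that idea, here is a minimal NumPy sketch of GhostVLAD-style pooling. It is not taken from this repository: the function name `ghostvlad_pool`, its parameters, and the random placeholder weights are all assumptions made for the example. In a trained model the cluster centers and assignment weights would be learned, and the frame-level features would come from the thin-ResNet trunk.

```python
import numpy as np


def ghostvlad_pool(features, centers, assign_w, assign_b, num_ghost):
    """Hypothetical GhostVLAD-style pooling (illustrative sketch only).

    features : (T, D) frame-level descriptors for one utterance
    centers  : (K + G, D) cluster centers, ghost clusters last
    assign_w : (D, K + G) weights of the soft-assignment layer
    assign_b : (K + G,) bias of the soft-assignment layer
    num_ghost: G, number of ghost clusters to discard
    Returns a fixed-length, L2-normalised vector of size K * D.
    """
    # Soft-assign every frame to every cluster (real and ghost) via softmax.
    logits = features @ assign_w + assign_b              # (T, K + G)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign = assign / assign.sum(axis=1, keepdims=True)

    # Residual of each frame to each center, weighted by its assignment.
    residuals = features[:, None, :] - centers[None, :, :]  # (T, K + G, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)     # (K + G, D)

    # Drop the ghost clusters: they absorb frames but add nothing to the output.
    num_real = centers.shape[0] - num_ghost
    vlad = vlad[:num_real]                                   # (K, D)

    # Intra-normalise each cluster, then flatten and L2-normalise the result.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K, G = 50, 8, 4, 2                 # placeholder sizes, not the repo's
    feats = rng.normal(size=(T, D))          # stand-in for trunk output
    centers = rng.normal(size=(K + G, D))    # random, untrained placeholders
    w = rng.normal(size=(D, K + G))
    b = np.zeros(K + G)
    emb = ghostvlad_pool(feats, centers, w, b, num_ghost=G)
    print(emb.shape)
```

Because the sum runs over the time axis, utterances of different lengths map to vectors of the same size, which is what makes this kind of layer usable for utterance-level embeddings. Setting `num_ghost=0` reduces the sketch to plain NetVLAD-style pooling.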