serkansulun's Stars
google-research/google-research
Google Research
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
cmhungsteve/Awesome-Transformer-Attention
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
open-mmlab/mmocr
OpenMMLab Text Detection, Recognition and Understanding Toolbox
karan/Projects-Solutions
Links to others' solutions to Projects (https://github.com/karan/Projects/)
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
CSAILVision/places365
The Places365-CNNs for Scene Classification
rmokady/CLIP_prefix_caption
Simple image captioning model
piergiaj/pytorch-i3d
jeffreyyihuang/two-stream-action-recognition
Uses a two-stream architecture to implement a classic action recognition method on the UCF101 dataset
AndreyGuzhov/AudioCLIP
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
hassony2/kinetics_i3d_pytorch
Inflated I3D network with Inception backbone, weights transferred from TensorFlow
minzwon/sota-music-tagging-models
YuanGongND/ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
OpenGVLab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
tbmoon/facenet
FaceNet for face recognition using PyTorch
juansgomez87/datasets_emotion
This repository collects information about different data sets for Music Emotion Recognition.
lucidrains/bidirectional-cross-attention
A simple cross attention that updates both the source and target in one step
sergiooramas/tartarus
Deep Learning for audio and text
Dsqvival/hierarchical-structure-analysis
Algorithm and Data for paper "Automatic Detection of Hierarchical Structure and Influence of Structure on Melody, Harmony and Rhythm in Popular Music"
Xeaver/EmotionCLIP
[CVPR 2023] Code for "Learning Emotion Representations from Verbal and Nonverbal Communication"
nku-zhichengzhang/CTEN
[CVPR 2023] This is the official implementation of "Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network"
Irurnnen/Songsterr-saver
xiaobai1217/DomainAdaptation
CVPR2022
ekazakos/MTCN
Implementation of "With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition, BMVC, 2021" in PyTorch
m-bain/CondensedMovies-chall
Condensed Movies Challenge 2021
jalexander1/Python_Course_Slideware
Intro to Python Course
dkrst/Multi_Label_Confusion_Matrix