mustansarfiaz
He is a Staff Research Scientist at IBM Research, Abu Dhabi, UAE.
IBM Research, Abu Dhabi
Pinned Repositories
Awesome-Transformer-Attention
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
DDAM-PS
DDAM-PS: Diligent Domain Adaptive Mixer for Person Search -- WACV2024
ga2net
IRCA-Siam
IRCA-Siam: Improving Object Tracking by Added Noise and Channel Attention
Large-Selective-Kernel-Network
PS-ARM
Abstract. Person search is a challenging problem with various real-world applications that aims at joint person detection and re-identification of a query person from uncropped gallery images. Although previous studies focus on learning rich feature information, it is still hard to retrieve the query person due to appearance deformations and background distractors. In this paper, we propose a novel attention-aware relation mixer (ARM) module for person search, which exploits the global relation between different local regions within the RoI of a person and makes it robust against various appearance deformations and occlusion. The proposed ARM is composed of a relation mixer block and a spatio-channel attention layer. The relation mixer block introduces a spatially attended spatial mixing and a channel-wise attended channel mixing for effectively capturing discriminative relation features within an RoI. These discriminative relation features are further enriched by introducing a spatio-channel attention where the foreground and background discriminability is empowered in a joint spatio-channel space. Our ARM module is generic and does not rely on fine-grained supervision or topological assumptions, and hence can be easily integrated into any Faster R-CNN based person search method. Comprehensive experiments are performed on two challenging benchmark datasets: CUHK-SYSU [1] and PRW [2]. Our PS-ARM achieves state-of-the-art performance on both datasets. On the challenging PRW dataset, our PS-ARM achieves an absolute gain of 5% in the mAP score over SeqNet, while operating at a comparable speed.
SA2-Net
SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation (BMVC'23 -- Oral)
SAT
SAT: Scale-Augmented Transformer for Person Search
ScratchFormer
ScratchFormer: Remote Sensing Change Detection With Transformers Trained from Scratch
SCS-Siam
SCS-Siam: Learning Soft Mask Based Feature Fusion with Channel and Spatial Attention for Robust Visual Object Tracking
mustansarfiaz's Repositories
mustansarfiaz/ScratchFormer
ScratchFormer: Remote Sensing Change Detection With Transformers Trained from Scratch
mustansarfiaz/SA2-Net
SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation (BMVC'23 -- Oral)
mustansarfiaz/PS-ARM
Abstract. Person search is a challenging problem with various real-world applications that aims at joint person detection and re-identification of a query person from uncropped gallery images. Although previous studies focus on learning rich feature information, it is still hard to retrieve the query person due to appearance deformations and background distractors. In this paper, we propose a novel attention-aware relation mixer (ARM) module for person search, which exploits the global relation between different local regions within the RoI of a person and makes it robust against various appearance deformations and occlusion. The proposed ARM is composed of a relation mixer block and a spatio-channel attention layer. The relation mixer block introduces a spatially attended spatial mixing and a channel-wise attended channel mixing for effectively capturing discriminative relation features within an RoI. These discriminative relation features are further enriched by introducing a spatio-channel attention where the foreground and background discriminability is empowered in a joint spatio-channel space. Our ARM module is generic and does not rely on fine-grained supervision or topological assumptions, and hence can be easily integrated into any Faster R-CNN based person search method. Comprehensive experiments are performed on two challenging benchmark datasets: CUHK-SYSU [1] and PRW [2]. Our PS-ARM achieves state-of-the-art performance on both datasets. On the challenging PRW dataset, our PS-ARM achieves an absolute gain of 5% in the mAP score over SeqNet, while operating at a comparable speed.
mustansarfiaz/DDAM-PS
DDAM-PS: Diligent Domain Adaptive Mixer for Person Search -- WACV2024
mustansarfiaz/ga2net
mustansarfiaz/SAT
SAT: Scale-Augmented Transformer for Person Search
mustansarfiaz/SCS-Siam
SCS-Siam: Learning Soft Mask Based Feature Fusion with Channel and Spatial Attention for Robust Visual Object Tracking
mustansarfiaz/Awesome-Transformer-Attention
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
mustansarfiaz/IRCA-Siam
IRCA-Siam: Improving Object Tracking by Added Noise and Channel Attention
mustansarfiaz/Large-Selective-Kernel-Network
mustansarfiaz/SiamTrackers
(2020) The PyTorch version of Siamese, SiamFC, SiamRPN, DaSiamRPN, UpdateNet, SiamDW, SiamRPN++, SiamMask, and SiamFC++; visual object tracking based on deep learning
mustansarfiaz/AFS-Siam
AFS-Siam: Adaptive Feature Selection Siamese Networks for Visual Tracking
mustansarfiaz/benchmark_results
Visual Tracking Paper List
mustansarfiaz/COAT
Official Code for CVPR 2022 paper Cascade Transformers for End-to-End Person Search
mustansarfiaz/Directional-Deep-Embedding-and-Appearance-Learning-for-Fast-Video-Object-Segmentation
We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS. DDEAL achieves a J & F mean score of 74.8% on the DAVIS 2017 dataset and an overall score G of 71.3% on the large-scale YouTube-VOS dataset, while retaining a speed of 25 fps on a single NVIDIA TITAN Xp GPU. Furthermore, our faster version runs at 31 fps with only a small loss in accuracy.
mustansarfiaz/ffcv
FFCV: Fast Forward Computer Vision (and other ML workloads!)
mustansarfiaz/changebind
mustansarfiaz/Computer-Vision-Video-Lectures
A curated list of free, high-quality, university-level courses with video lectures related to the field of Computer Vision.
mustansarfiaz/elgcnet
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection
mustansarfiaz/fanet
FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background (ICIP 2024)
mustansarfiaz/hover_net
Simultaneous Nuclear Instance Segmentation and Classification in H&E Histology Images.
mustansarfiaz/HyRect-Change
HyRet-Change: A Hybrid Retentive Network for Remote Sensing Change Detection
mustansarfiaz/MyApps
my test app
mustansarfiaz/OTTC
Object Tracking and Temple Color Benchmark
mustansarfiaz/SeqNet
[AAAI 2021] Sequential End-to-end Network for Efficient Person Search
mustansarfiaz/ThirdParty
Modifications to third party software used by UE4