Welcome to the Pill Recognition Research Repository! This repository hosts code and resources related to three conference and a journal paper focusing on advancing pill recognition techniques. Each paper corresponds to a separate branch in this repository, and they are briefly introduced below:
As of 2024.11.29.:
Branch Name | Status | Description |
---|---|---|
idaacs2023 |
Up-to-date | Multi-Stream Pill Recognition with Attention |
kdir2023 |
Up-to-data | Pill Metrics Learning with Multihead Attention |
visapp2024 |
Up-to-date | Word and Image Embeddings in Pill Recognition |
wiley |
Under revision | Metric-based pill recognition with the help of textual and visual cues |
We tackle the pill recognition challenge through a groundbreaking approach that employs a multi-stream network with EfficientNet-B0 and a self-attention mechanism. To eliminate the explicit training of printed or embossed patterns, Local Binary Pattern (LBP) features are utilized. Evaluation is performed on two datasets, demonstrating that our proposed model surpasses previous models in Top-1 and Top-5 accuracy. Notably, the model also outperforms the YOLOv7 network in a reference-quality use-case.
Highlights:
- Multi-stream network with EfficientNet-B0
- Self-attention for improved feature capture
- Outperforms YOLOv7 in specific use cases
Branch: idaacs2023
In the realm of object recognition, especially where new classes can emerge dynamically, few-shot learning holds significant importance. Our article focuses on metrics learning, a fundamental technique for few-shot object recognition, successfully applied to pill recognition. We employ multi-stream metrics learning networks and explore the integration of multihead attention layers at various points in the network. The model's performance is evaluated on two datasets, showcasing superior results compared to a state-of-the-art multi-stream pill recognition network.
Highlights:
- Few-shot learning with metric learning
- Multihead attention at various network stages
- Superior accuracy compared to previous multi-stream approaches
Branch: kdir2023
Addressing the crucial task of improving pill recognition accuracy within a metrics learning framework, our study introduces a multi-stream visual feature extraction and processing architecture. Leveraging multi-head attention layers, we estimate pill similarity. An innovative enhancement to the triplet loss function incorporates word embeddings, injecting textual pill similarity into the visual model. This refinement operates on a finer scale than conventional triplet loss models, resulting in enhanced visual model accuracy. Experiments and evaluations are conducted on a new, freely available pill dataset.
Highlights:
- Multi-stream architecture with visual and text embeddings
- Enhanced triplet loss for better visual model performance
- Freely accessible pill dataset for further experimentation
Branch: visapp2024
Pill image recognition by machine vision can reduce the risk of taking the wrong medications, a severe healthcare problem. Automated dispensing machines or home applications both need reliable image processing techniques to compete with the problem of changing viewing conditions, large number of classes, and the similarity in pill appearance. We attack the problem with a multi-stream, two-phase metric embedding neural model. To enhance the metric learning procedure, we introduce dynamic margin setting into the loss function. Moreover, we show that besides the visual features of drug samples, even free text of drug leaflets (processed with a natural language model) can be used to set the value of the margin in the triplet loss and thus increase the recognition accuracy of testing. Thus, besides using the conventional metric learning approach, the given discriminating features can be explicitly injected into the metric model using the NLP of the free text of pill leaflets or descriptors of images of selected pills. We analyse the performance on two datasets and report a 1.6% (two-sided) and 2.89% (one-sided) increase in Top-1 accuracy on the CURE dataset compared to existing best results. The inference time on CPU and GPU makes the proposed model suitable for different kinds of applications in medical pill verification; moreover, the approach applies to other areas of object recognition where few-shot problems arise. The proposed high-level feature injection method (into a low-level metric learning model) can also be exploited in other cases, where class features can be well described with textual or visual cues.
Highlights:
- Multi-stream neural model with dynamic margin setting
- Combines visual cues with NLP-processed textual data
Branch: journal
Each branch contains detailed instructions for reproducing experiments, including pre-processing steps, model training, and evaluation scripts. Please refer to individual branch documentation for usage examples and dataset preparation.