ASL Translator
Dataset
Original Data on Kaggle
In this project, we build our own dataset, Dataset-A, by combining the following datasets from Kaggle.com:
Dataset-A
We built this hybrid dataset to test whether our model generalizes well and is robust. Download links:
Structure of Dataset-A
Both the training data and the test data contain 26 class folders, one per letter from A to Z, each holding right-hand image instances.
Training Data
- All images are a subset of dataset1
- We removed the images in dataset1 that could not pass our image-pipeline
- Each letter has approximately 2220 images.
Here's the chart of our image count distribution in Dataset-A training set:
Testing Data
- For each letter, we select 555 image instances from dataset2, dataset3 and dataset4 (2220 * 0.25 = 555, so the test set is 20% of the combined 2775 images per letter)
Here's the chart of our image count distribution in Dataset-A testing set:
Methodology
a. Data Preprocessing
The demo images are in the folder pipeline-demo; the file name prefix of each image indicates the pipeline function type.
The image-pipeline comes in two types:
I. General Pipeline:
This pipeline can be used in any preprocessing stage.
// work flow
1. roi normalization (by mediapipe)
2. background normalization (by rembg)
3. skin normalization
4. channel normalization
5. resolution normalization
II. Training Pipeline:
This pipeline can only be used in the training preprocessing stage.
// work flow
1. background normalization (by rembg)
2. roi normalization (by mediapipe)
3. skin normalization
4. channel normalization
5. resolution normalization
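The last two steps of both pipelines (channel and resolution normalization) can be illustrated with NumPy alone. This is a hedged sketch, not the project's image-pipeline: the function names, the luma weights, the nearest-neighbour resize, and the scaling to [0, 1] are all assumptions. Only the 28 x 28 x 1 target shape comes from the model definition.

```python
import numpy as np

TARGET_SIZE = 28  # matches input_shape=(28, 28, 1) of the normal model


def channel_normalize(image_rgb):
    """Collapse an RGB image (H, W, 3) to a single grayscale channel (H, W)."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights (assumption)
    return image_rgb.astype(np.float32) @ weights


def resolution_normalize(gray, size=TARGET_SIZE):
    """Resize (H, W) to (size, size) with nearest-neighbour sampling."""
    height, width = gray.shape
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return gray[np.ix_(rows, cols)]


def to_model_input(image_rgb, size=TARGET_SIZE):
    """Run both steps and add the trailing channel axis expected by Conv2D."""
    gray = channel_normalize(image_rgb)
    small = resolution_normalize(gray, size)
    scaled = small / 255.0  # assumes 8-bit input; the scaling itself is an assumption
    return scaled[..., np.newaxis].astype(np.float32)
```

The roi, background, and skin normalization steps depend on mediapipe, rembg, and project code, so they are left out of this sketch.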
Data Augmentation
- Implemented with keras.ImageDataGenerator, using zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.1
- Implemented with keras.model.Engine: we create our own Spatial Transformer Layer, stn().
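The project's stn() layer is not reproduced here. As background, the core of a generic spatial transformer is an affine sampling grid followed by bilinear interpolation, which the NumPy sketch below illustrates. All names are hypothetical, and in a real layer the 2 x 3 matrix theta is predicted by a small localization network rather than passed in by hand.

```python
import numpy as np


def affine_grid(theta, height, width):
    """Map every output pixel to a source coordinate in normalized [-1, 1] space."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])  # (3, H*W)
    src = theta @ coords                                                  # (2, H*W)
    return src[0].reshape(height, width), src[1].reshape(height, width)


def bilinear_sample(image, src_x, src_y):
    """Read `image` (H, W) at fractional positions, replicating the border."""
    height, width = image.shape
    x = (src_x + 1.0) * (width - 1) / 2.0   # normalized -> pixel coordinates
    y = (src_y + 1.0) * (height - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, width - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, height - 1)
    x1 = np.clip(x0 + 1, 0, width - 1)
    y1 = np.clip(y0 + 1, 0, height - 1)
    wx = np.clip(x - x0, 0.0, 1.0)          # horizontal interpolation weight
    wy = np.clip(y - y0, 0.0, 1.0)          # vertical interpolation weight
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def spatial_transform(image, theta):
    """Warp a single-channel image with a 2x3 affine matrix."""
    src_x, src_y = affine_grid(theta, *image.shape)
    return bilinear_sample(image, src_x, src_y)
```

An identity matrix [[1, 0, 0], [0, 1, 0]] leaves the image unchanged, while [[-1, 0, 0], [0, 1, 0]] mirrors it horizontally. Because the sampling is differentiable with respect to theta, such a layer can be trained together with the classifier.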
b. Model Building
You can download our models here: saved_models_v1
Normal Model - Pure CNN Structure without Spatial Transform Layers:
The implementation is in asl_model/models.py - get_model_1()
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), strides=(1, 1), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), strides=(1, 1), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D((2, 2)))
# finish feature extraction
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.25))
model.add(layers.Dense(26, activation='softmax'))
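To see how the 28 x 28 input shrinks through the feature extractor, the layer arithmetic can be traced by hand. The sketch below assumes the Keras defaults that the listing relies on (padding='valid' for Conv2D, and a MaxPool2D stride equal to the pool size); the helper names are made up for illustration.

```python
def conv_out(size, kernel=3, stride=1):
    """Output width of a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output width of max pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


def trace_normal_model(size=28):
    """Feature-map width after each Conv2D and MaxPool2D in get_model_1()."""
    sizes = [size]
    for _ in range(3):  # three Conv2D + MaxPool2D stages
        sizes.append(conv_out(sizes[-1]))
        sizes.append(pool_out(sizes[-1]))
    return sizes


sizes = trace_normal_model()
flatten_units = sizes[-1] ** 2 * 128  # 128 filters in the last Conv2D
print(sizes, flatten_units)
# → [28, 26, 13, 11, 5, 3, 1] 128
```

BatchNormalization and Dropout do not change the shape, so Flatten hands a 128-value vector to the Dense(512) layer.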
STL Model - Spatial Transform Layer with CNN Structure
The implementation is in asl_model/stl/struct_a/model.py - get_stn_a_model_8()
import tensorflow as tf
from tensorflow.keras import initializers, layers

# size, stn and STN_SEED are defined elsewhere in the project
input_layers = layers.Input((size, size, 1))
x = stn(input_layers)
x = layers.Conv2D(32, (3, 3), strides=(1, 1), activation='relu',
                  kernel_initializer=initializers.glorot_uniform(seed=STN_SEED))(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu',
                  kernel_initializer=initializers.glorot_uniform(seed=STN_SEED))(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(128, (3, 3), strides=(1, 1), activation='relu',
                  kernel_initializer=initializers.glorot_uniform(seed=STN_SEED))(x)
x = layers.BatchNormalization()(x)
x = layers.Flatten()(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.25)(x)
output_layers = layers.Dense(26, activation="softmax")(x)
model = tf.keras.Model(input_layers, output_layers)
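Unlike the normal model, this structure has no pooling layers, so the flattened vector is much larger. The rough estimate below rests on two assumptions that the listing does not confirm: that stn() returns a tensor with the same height and width as its input, and that size is 28 (the value is not shown in the excerpt). The helper names are hypothetical.

```python
def stl_flatten_units(size, filters=128, conv_layers=3, kernel=3):
    """Length of the Flatten output after three unpadded 3x3 convolutions."""
    side = size - conv_layers * (kernel - 1)  # each 'valid' conv trims kernel - 1 pixels
    if side <= 0:
        raise ValueError("input is too small for three 3x3 convolutions")
    return side * side * filters


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


size = 28  # assumption: same resolution as the normal model
flat = stl_flatten_units(size)
print(flat, dense_params(flat, 512))
```

Under these assumptions the first Dense(512) layer receives 61952 values, against 128 in the normal model, so it holds most of the STL model's weights.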
c. Model Training
Basic
- 57717 training images, 20% of which are used as validation data
- 14430 test images, 555 for each letter
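These totals can be cross-checked against the per-letter counts of Dataset-A. The snippet is plain arithmetic; the exact validation count depends on how the split rounds, so it is only rounded to the nearest image here.

```python
LETTERS = 26
TRAIN_IMAGES = 57717
TEST_PER_LETTER = 555
VALIDATION_SHARE = 0.2

test_images = LETTERS * TEST_PER_LETTER                          # total test images
avg_train_per_letter = TRAIN_IMAGES / LETTERS                    # close to 2220
validation_images = round(TRAIN_IMAGES * VALIDATION_SHARE)       # approximate
fit_images = TRAIN_IMAGES - validation_images                    # left for weight updates

print(test_images, round(avg_train_per_letter, 2), validation_images, fit_images)
```

The test total matches 26 x 555 exactly, and the training average of just under 2220 per letter agrees with the "approximately 2220" stated for Dataset-A.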
First, data-structure-selection
Select the best structure for the normal-model and the stl-model, with the following hyper-parameters:
lr = 0.001
epoch = 10
batch_size = 128
optimizer=tf.keras.optimizers.Adam(learning_rate = lr),
loss='categorical_crossentropy',
metrics=['accuracy']
Second, use callback functions to train the best model of each type, with the following settings:
BATCH = 128
EPOCH = 100 # max epoch
# call back functions
es_callback = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau()
loss="categorical_crossentropy"
optimizer="adam"
metrics=["accuracy"]
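The settings above could be wired into a training call roughly as follows. This is a sketch under assumptions, not the project's training script: the function names are hypothetical, the labels are assumed to be one-hot encoded, and validation_split=0.2 is only one possible way to hold out the 20% validation data.

```python
import tensorflow as tf

BATCH = 128
EPOCH = 100  # upper bound; early stopping usually ends training sooner


def build_callbacks():
    """Callbacks with the settings listed above (both monitor val_loss by default)."""
    es_callback = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
    reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau()
    return [es_callback, reduce_lr_callback]


def train(model, x_train, y_train, epochs=EPOCH, batch_size=BATCH):
    """Compile and fit `model`; y_train is assumed to be one-hot with 26 columns."""
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model.fit(x_train, y_train,
                     validation_split=0.2,  # assumption: hold out the last 20% of the arrays
                     epochs=epochs,
                     batch_size=batch_size,
                     callbacks=build_callbacks())
```

Note that validation_split takes the held-out samples from the end of the arrays before shuffling, so the data should be shuffled beforehand if it is stored letter by letter. Also, ReduceLROnPlateau waits 10 epochs by default, so with an early-stopping patience of 5 it will rarely get the chance to lower the learning rate unless its patience is reduced.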
d. Model Evaluation
Normal Model
Validation Data - Epoch Accuracy
Validation Data - Epoch Loss
Testing Data - Total Accuracy : 89.4%
Testing Data - F1-Score Report:
STL Model
Validation Data - Epoch Accuracy
Validation Data - Epoch Loss
Testing Data - Total Accuracy : 90.6%
Testing Data - F1-Score Report:
There are more evaluation charts in the folder charts
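For readers who want to recompute the per-letter scores, the quantities behind an F1-score report can be sketched in plain Python. The source does not say which tool produced its reports, so the function names below are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_report(y_true, y_pred):
    """Per-label precision, recall and F1 from two label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

For example, with true labels A, A, B, B and predictions A, B, B, B, the accuracy is 0.75, the F1-score of B is 0.8, and the F1-score of A is about 0.67.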
Getting Started
Environment
conda create --name aslt python=3.8 -y
(Optional) If you want to use Jupyter, run these commands:
// activate venv
pip install ipykernel
python -m ipykernel install --user --name aslt --display-name "ASLT"
Install PyTorch for the rembg package. Get the PyTorch install instructions on pytorch.org. For example:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch