yalecyu/crnn.caffe

How to train the 36 Alpha by "0~9 + A~Z"?

samylee opened this issue · 8 comments

Hi @yalecyu ,
Thanks for sharing your code! I have trained on the digits 0~9, and the test results are very good.
But I cannot train on the 36-character alphabet "0~9 + A~Z".
The maximum label length is 19, and each character is either a digit 0~9 or a letter A~Z.
How can I train with 36 characters?
I modified generate_train_id.py to handle the 36 characters. The modified code is as follows:

#########################################################################
#!/usr/bin/env python

# coding=utf-8

import pdb
import os
import numpy as np
from multiprocessing import Process
import sys
sys.path.insert(0,'python')
import caffe
import h5py

CAFFE_ROOT = os.getcwd() # assume you are in $CAFFE_ROOT$ dir
img_path = os.path.join(CAFFE_ROOT, 'data/AlphaNum/train/')
IMAGE_WIDTH, IMAGE_HEIGHT = 128, 32
LABEL_SEQ_LEN = 19

# captcha images list

images = filter(lambda x: os.path.splitext(x)[1] == '.jpg', os.listdir(img_path))

print '[+] total image number: {}'.format(len(images))

np.random.shuffle(images)

def write_image_info_into_hdf5(file_name, images, phase):
    total_size = len(images)
    print '[+] total image for {0} is {1}'.format(file_name, len(images))

    single_size = 500
    groups = total_size / single_size
    if total_size % single_size:
        groups += 1
    def process(file_name, images):

        #####################Alpha support################
        Alpha = ['0','1','2','3','4','5','6','7','8','9',\
                 'A','B','C','D','E','F','G','H','I','J',\
                 'K','L','M','N','O','P','Q','R','S','T',\
                 'U','V','W','X','Y','Z']
        ##################################################

        img_data = np.zeros((len(images), 3, IMAGE_HEIGHT, IMAGE_WIDTH), dtype = np.float32)
        label_seq = 10*np.ones((len(images), LABEL_SEQ_LEN), dtype = np.float32)
        for i, image in enumerate(images):
            img_name = os.path.splitext(image)[0]
            newNumStr = img_name[0:]

            ############Alpha support##############
            numbers_str = range(len(newNumStr))
            for i in range(0, len(newNumStr)):
                numbers_str[i] = Alpha.index(newNumStr[i])
            #######################################

            numbers = np.array(map(lambda x: float(x), numbers_str))
            label_seq[i, :len(numbers)] = numbers
            img = caffe.io.load_image(os.path.join(img_path, image))
            img = caffe.io.resize(img, (IMAGE_HEIGHT, IMAGE_WIDTH, 3))
            img = np.transpose(img, (2, 0, 1))
            img_data[i] = img
            """
            if (i+1) % 100 == 0:
                print '[+] name: {}'.format(image)
                print '[+] number: {}'.format(','.join(map(lambda x: str(x), numbers)))
                print '[+] label: {}'.format(','.join(map(lambda x: str(x), label_seq[i])))
            """
        with h5py.File(file_name, 'w') as f:
            f.create_dataset('data', data = img_data)
            f.create_dataset('label', data = label_seq)
    with open(file_name, 'w') as f:
        workspace = os.path.split(file_name)[0]
        process_pool = []
        for g in xrange(groups):
            h5_file_name = os.path.join(workspace, '%s_%d.h5' %(phase, g))
            f.write(h5_file_name + '\n')
            start_idx = g*single_size
            end_idx = start_idx + single_size
            if g == groups - 1:
                end_idx = len(images)
            p = Process(target = process, args = (h5_file_name, images[start_idx:end_idx]))
            p.start()
            process_pool.append(p)
        for p in process_pool:
            p.join()

trainning_size = 2789 # number of images for trainning
trainning_images = images[:trainning_size]

write_image_info_into_hdf5(os.path.join('data/AlphaNum/train_datasets/', 'trainning.list'), trainning_images, 'train')
##############################################################################

But I don't know if I've modified it correctly. Could you give me some suggestions?
And what form should "crnn.prototxt" be changed into?

Any suggestions would be a great help to me!
I am looking forward to your reply!
Thank you!

Sincerely,
Samylee

I too am looking to train on my own dataset, of 109 ASCII characters, and am trying to implement the CRNN for strings (as in the original paper). The label for each image is the ASCII code of each character in the original string label, so in my HDF5 dataset each image has an array of ints as its label.
My question now is: where are all the places in the code where I need to change the alphabet size from 11 (as in your implementation) to 128 for mine?

Also, if I start training after changing alphabet_number in crnn.prototxt to 109 (I've mapped my ASCII characters to 1-109, including symbols, as in the original CRNN implementation), I get this error:

.
.
.
.
I0913 17:55:12.587182 14556 solver.cpp:273] Solving crnn
I0913 17:55:12.587183 14556 solver.cpp:274] Learning Rate Policy: step
I0913 17:55:12.589298 14556 solver.cpp:331] Iteration 0, Testing net (#0)
F0913 17:55:12.701380 14556 ctc_loss_layer.cu:36] Check failed: status == CTC_STATUS_SUCCESS (1 vs. 0) cuda memcpy or memset failed
*** Check failure stack trace: ***
    @     0x7f1b402e4daa  (unknown)
    @     0x7f1b402e4ce4  (unknown)
    @     0x7f1b402e46e6  (unknown)
    @     0x7f1b402e7687  (unknown)
    @     0x7f1b40a71de1  caffe::CtcLossLayer<>::Forward_gpu()
    @     0x7f1b40a197b3  caffe::Net<>::ForwardFromTo()
    @     0x7f1b40a19b77  caffe::Net<>::Forward()
    @     0x7f1b40a320b2  caffe::Solver<>::Test()
    @     0x7f1b40a3283e  caffe::Solver<>::TestAll()
    @     0x7f1b40a34a39  caffe::Solver<>::Step()
    @     0x7f1b40a34c5a  caffe::Solver<>::Solve()
    @           0x408085  train()
    @           0x4059ac  main
    @     0x7f1b3f2def45  (unknown)
    @           0x40620b  (unknown)
    @              (nil)  (unknown)

Is this really a CUDA error (since it says memcpy or memset failed), or is the CTC failing because of the number of characters in my labels?
@yalecyu , please let me know, thanks!

Can you send your prototxt to my email?

Thanks for responding, @yalecyu ! But I found out where I still needed to change the number of outputs: in one of the inner product layers in crnn.prototxt, I changed the num_output parameter to the size of my new alphabet, and now I am able to train successfully! :)

@samylee , have you been able to figure it out? I can tell you what I did to get training working on my dataset.

You'll need to change the alphabet_size parameter in crnn.prototxt to the size of your character dictionary + 1 (for the blank label), and change the blank_label parameter to alphabet_size-1. The Caffe classifier needs the classes (here, the characters) to be indexed from 0 to n-1 for n classes.
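As a small worked example of that arithmetic, here is a sketch for the 36-character case from the original question. The helper name `ctc_settings` is hypothetical and only illustrates the rule described above; the values still have to be entered in crnn.prototxt by hand.

```python
# Hypothetical helper: derive the two CTC settings from a character set.
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def ctc_settings(charset):
    """Return (alphabet_size, blank_label) for a set of distinct characters."""
    if len(set(charset)) != len(charset):
        raise ValueError("charset contains duplicate characters")
    alphabet_size = len(charset) + 1  # real characters plus one blank
    blank_label = alphabet_size - 1   # blank takes the last index
    return alphabet_size, blank_label


alphabet_size, blank_label = ctc_settings(CHARSET)
print(alphabet_size, blank_label)
```

With 36 characters this gives an alphabet size of 37 and a blank label of 36, leaving indices 0 to 35 for the characters themselves.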

In your generate_dataset.py file, make sure the label_seq array is np.ones multiplied by the blank_label number you set in the prototxt. Also, since the dataset is stored in HDF5 format, you can only store numbers (Caffe will throw an error during training for label data of any type other than float or double), so store a numerical conversion of your letters, such as their ASCII values, rather than strings. An additional step is then to map these characters to [0, n) to fit Caffe's classification requirements.
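A minimal sketch of that label encoding, assuming the 36-character alphabet and the maximum length of 19 from the original question. `encode_labels` and `CHAR_TO_INDEX` are illustrative names, not part of the repository, and the code is written for Python 3.

```python
import numpy as np

CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BLANK_LABEL = len(CHARSET)  # 36, must match blank_label in the prototxt
LABEL_SEQ_LEN = 19          # maximum label length from the question
CHAR_TO_INDEX = {ch: idx for idx, ch in enumerate(CHARSET)}


def encode_labels(texts, seq_len=LABEL_SEQ_LEN, blank=BLANK_LABEL):
    """Map each string to class indices in [0, n), padded with the blank."""
    # Start from an array filled with the blank label, stored as float32.
    label_seq = blank * np.ones((len(texts), seq_len), dtype=np.float32)
    for row, text in enumerate(texts):
        if len(text) > seq_len:
            raise ValueError("label %r is longer than %d" % (text, seq_len))
        # A KeyError here means the text has a character outside CHARSET.
        indices = [CHAR_TO_INDEX[ch] for ch in text]
        label_seq[row, :len(indices)] = indices
    return label_seq


print(encode_labels(["A1", "Z"])[:, :4])
```

Two details matter here. The padding value is the blank label (36 for this alphabet), not the 10 that was appropriate for digits only, since 10 is now the index of 'A'. The row index and the per-character loop also use different names, so the inner loop cannot overwrite the row being filled.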

@stalagmite7 Hi, if I want to train Chinese character recognition, how should I prepare the training dataset?

You'll need word-level labels, and you'll need to change the generate dataset file to reflect the number of characters in your alphabet dictionary and map the characters to [0, num_alpha), as I mentioned in the previous comment. During prediction, you'll have to inverse-map the predicted integer vector back to the Chinese characters you've used.
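The inverse mapping at prediction time might look like the sketch below. It assumes the network yields one class index per time step, so it applies the usual greedy CTC rule (merge repeated indices, then drop blanks); if your output is already collapsed, only the blank removal matters. The five-character `CHARSET` is a made-up example and `ctc_greedy_decode` is a hypothetical helper.

```python
# Made-up example dictionary; a real one would list every character used.
CHARSET = u"的一是不了"
BLANK_LABEL = len(CHARSET)  # 5, the index reserved for the CTC blank


def ctc_greedy_decode(indices, charset, blank):
    """Merge consecutive repeats, drop blanks, map indices to characters."""
    chars = []
    previous = None
    for idx in indices:
        idx = int(idx)  # predictions may arrive as floats
        if idx != previous and idx != blank:
            chars.append(charset[idx])
        previous = idx
    return u"".join(chars)


# Per-time-step indices: 的 的 blank 一 blank 一 一
print(ctc_greedy_decode([0, 0, 5, 1, 5, 1, 1], CHARSET, BLANK_LABEL))
```

The blank between the two occurrences of index 1 is what lets a genuinely doubled character survive the merge step.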

@stalagmite7 Hi, bro.
I am having some trouble with my own data. Because it contains letters such as 'a', 'b', 'c', I cannot just turn the labels into an np.array, and I don't know how to change generate_dataset.py. Can you give me a template?
Looking forward to your reply.
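One possible starting point, offered as a sketch rather than the repository's own method: build the character dictionary from the image file names, assuming (as the script earlier in this thread does) that each file name without its extension is the label. `build_charset` is a hypothetical helper and the file names are made up.

```python
import os


def build_charset(image_names):
    """Collect every distinct character used in the labels, in sorted order."""
    chars = set()
    for name in image_names:
        label = os.path.splitext(name)[0]  # file name without extension
        chars.update(label)
    return sorted(chars)


names = ["ab3c.jpg", "c9a.jpg"]  # made-up example file names
charset = build_charset(names)
char_to_index = {ch: idx for idx, ch in enumerate(charset)}
blank_label = len(charset)       # the index after the last real character

print(charset)
print([char_to_index[ch] for ch in "ab3c"])
```

Once every letter has an integer index, the labels are plain numbers and can be stored in a float array; the same `charset` list has to be kept for mapping predictions back to characters.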