Attempting to train on own corpus
garrett-yoon opened this issue · 4 comments
Hi, I'm unsure why the loss is diverging when I train on my own small corpus: every iteration reports a cost of nan, and the resulting vector.txt is filled with 'nan'. I adjusted eta (the learning rate) and am still having the same problem. The full build and training log is below.
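(For reference, I'm invoking the trainer roughly as in the repo's demo.sh; -eta is the flag I changed when adjusting the learning rate. The paths and the 0.01 value here are illustrative, not my exact command:)

build/glove -input-file cooccurrence.shuf.bin -vocab-file vocab.txt -save-file vectors -vector-size 500 -iter 15 -x-max 10 -eta 0.01 -binary 2 -threads 8 -verbose 2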
gcc src/glove.c -o build/glove -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/shuffle.c -o build/shuffle -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/cooccur.c -o build/cooccur -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
gcc src/vocab_count.c -o build/vocab_count -lm -pthread -Ofast -march=native -funroll-loops -Wno-unused-result
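(Side note, purely a guess on my part as a debugging step: -Ofast implies -ffast-math, which lets the compiler assume no NaNs or infinities ever occur. If that optimization were interfering, rebuilding the trainer with plain -O3 would look like this:)

gcc src/glove.c -o build/glove -lm -pthread -O3 -march=native -funroll-loops -Wno-unused-result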
BUILDING VOCABULARY
Processed 146823 tokens.
Counted 2957 unique words.
Truncating vocabulary at min count 5.
Using vocabulary of size 1284.
COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 1284 words.
Building lookup table...table contains 1648657 elements.
Processed 146823 tokens.
Writing cooccurrences to disk......2 files in total.
Merging cooccurrence files: processed 325221 lines.
SHUFFLING COOCCURRENCES
array size: 255013683
Shuffling by chunks: processed 325221 lines.
Wrote 1 temporary file(s).
Merging temp files: processed 325221 lines.
TRAINING MODEL
Read 325221 lines.
Initializing parameters...done.
vector size: 500
vocab size: 1284
x_max: 10.000000
alpha: 0.750000
iter: 001, cost: nan
iter: 002, cost: nan
iter: 003, cost: nan
iter: 004, cost: nan
iter: 005, cost: nan
iter: 006, cost: nan
iter: 007, cost: nan
iter: 008, cost: nan
iter: 009, cost: nan
iter: 010, cost: nan
iter: 011, cost: nan
iter: 012, cost: nan
iter: 013, cost: nan
iter: 014, cost: nan
iter: 015, cost: nan
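For context on what is being reported: the cost is GloVe's weighted least-squares objective from Pennington et al. (2014), with the x_max and alpha printed above parameterizing the weighting function f. Once any vector or bias becomes non-finite, the squared-error term, and with it the whole sum, stays nan on every later iteration:

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

where f(x) = (x / x_max)^alpha for x < x_max, and f(x) = 1 otherwise.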
Hi there,
No worries about this one; I think I figured out the issue.