This document is dedicated to providing a word2vec model developed for Farsi poems of 48 poets. The complete list of poets and their documents are available here.
Traditional Farsi poems are a set of sentence rows knowen as beyts
. Each beyt is split into two mesra
s that could be thought of as clauses that complete each other and convey a single concept.
Most of the existing Farsi literature are composed in this framework of mesras and beyts. Here is an example of a Farsi poem by Rumi:
(beyt1 - mesra1)
Out beyond ideas of wrongdoing and right-doing,
(beyt1 - mesra2)
there is a field. I'll meet you there.
(beyt2 - mesra1)
When the soul lies down in that grass,
(beyt2 - mesra2)
the world is too full to talk about.
(beyt3 - mesra1)
Ideas, language, even the phrase each other
(beyt3 - mesra2)
doesn't make any sense.
The corpus that we developed the word2vec model on consists of 1,216,286 mesras of Farsi poems that can be thought of as 608,143 sentences. Moreover, this document consists of 8,102,119 words from which 148,588 are unique.
The developed word2vec model is accessible here.
Here are some examples of the results of this model:
گوش به چشم شبیه شنیده است: دیده
Which means:
Ear to eye is similar to heard to: Saw
مرد به زن شبیه آدم است: حوا
Which means:
Man to woman is similar to Adam to: Eve
Please note that the model capability is limited due to its data input size which is only 8 million words. Moreover, due to different format and vocabulary between Farsi poems and nowadays Farsi conversation, please be carful in generalizing the model.