Question: Can I find a signature in a text with this code?
mrmrn opened this issue · 4 comments
I have texts from several authors. Each author has their own signature or link somewhere in their texts.
For example, author1:
text1:
sdsadsad daSDA DDASd asd aSD Sd dA SD ASD sadasdasds sadasd
@jhsad.sadas.com sdsdADSA sada
text2:
KDJKLFFD GFDGFDHGF GFHGFDHGFH GFHFGH Lklfgfd gdfsgfdsg df gfdhgf g
hfghghjh jhg @jhsad.sadas.com sfgff fsdfdsf
text3:
jhjkfsdg fdgdf sfds hgfj j kkjjfghgkjf hdkjtkj lfdjfg hkgfl
@jhsad.sadas.com dsfjdshflkds kg lsfdkg;fdgl
How can I find @jhsad.sadas.com in the text?
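If the signature is already known, a literal search is enough. A minimal Python sketch, assuming the exact signature string is given; the helper name `find_signature` is hypothetical:

```python
import re

def find_signature(text, signature):
    """Return the start offset of every literal occurrence of `signature` in `text`."""
    # re.escape makes '.' match a literal dot instead of "any character"
    return [m.start() for m in re.finditer(re.escape(signature), text)]

text1 = ("sdsadsad daSDA DDASd asd aSD Sd dA SD ASD sadasdasds sadasd\n"
         "@jhsad.sadas.com sdsdADSA sada")
print(find_signature(text1, "@jhsad.sadas.com"))
```

If you only need a yes/no answer, `"@jhsad.sadas.com" in text1` does the same check without regular expressions.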
EDIT:
@jhsad.sadas.com is just an example signature. I don't know what the authors' real signatures are! It also has no fixed format: it could be @jhsad.sadas.com, or "visit my blog in fsfsd.sfsf.dfssd", or something else.
What I have is some texts from each author, and I know that each author's texts contain a unique signature from that author.
IDEA:
I think that by converting words to vectors and measuring the similarity between the texts, we could use cosine similarity to find the signatures. I think the solution must be something along these lines.
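A minimal sketch of the cosine-similarity idea, assuming plain bag-of-words counts instead of learned word vectors (a simpler, swapped-in representation) and whitespace tokenization. The texts are shortened versions of the examples above, and the function names are hypothetical. Note that the similarity score only says how much two texts overlap; the tokens shared by every text are the actual signature candidates.

```python
import math
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; '@' and '.' stay inside tokens, so
    # "@jhsad.sadas.com" is kept as one token (an assumption about the data)
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

texts = [
    "sdsadsad daSDA DDASd asd aSD @jhsad.sadas.com sdsdADSA sada",
    "KDJKLFFD GFDGFDHGF hfghghjh jhg @jhsad.sadas.com sfgff fsdfdsf",
    "jhjkfsdg fdgdf sfds hgfj @jhsad.sadas.com dsfjdshflkds kg",
]
vectors = [bag_of_words(t) for t in texts]
print(cosine_similarity(vectors[0], vectors[1]))
# Tokens present in every text are the signature candidates
print(set(vectors[0]).intersection(*vectors[1:]))
```

On real posts, common words such as "the" will also be shared by every text, so they would need to be filtered out (for example with a stop-word list).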
So the author's signature will appear throughout the text in formats like these:
- @jhsad.sadas.com
- fsfsd.sfsf.dfssd
Are there any other possible formats to consider?
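Both listed formats are dot-separated, domain-like tokens, optionally prefixed with @, so a single regular expression can flag them. A sketch in Python; the pattern and the `find_candidates` name are assumptions, and the pattern will also match ordinary URLs or version numbers such as 1.2.3:

```python
import re

# Assumed pattern: an optional '@', then two or more dot-separated labels of
# letters, digits or hyphens, e.g. "@jhsad.sadas.com" or "fsfsd.sfsf.dfssd"
SIGNATURE_PATTERN = re.compile(r"@?[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_candidates(text):
    """Return every domain-like token in `text`, with or without a leading '@'."""
    return SIGNATURE_PATTERN.findall(text)

print(find_candidates("sadasd\n@jhsad.sadas.com sdsdADSA sada"))
print(find_candidates("or visit my blog in fsfsd.sfsf.dfssd, or..."))
```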
Yes. For example, we have a set of posts from social media channels.
Each channel has its own signature on its posts.
How can we find a channel's signature by evaluating 100 of its posts?
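One way to approach this without knowing the format in advance: count in how many posts each word n-gram occurs, and keep the ones that occur in nearly all of them. A minimal sketch, assuming whitespace tokenization; `candidate_signatures`, `max_n`, and the 0.8 threshold are hypothetical choices, not part of this repository. The three example texts from the question stand in for a channel's posts.

```python
from collections import Counter

def candidate_signatures(posts, max_n=3, min_share=0.8):
    """Return (n-gram, share of posts) for word n-grams found in >= min_share of posts."""
    doc_freq = Counter()
    for post in posts:
        words = post.split()
        # Use a set so each n-gram is counted once per post (document frequency)
        grams = set()
        for size in range(1, max_n + 1):
            for i in range(len(words) - size + 1):
                grams.add(" ".join(words[i:i + size]))
        doc_freq.update(grams)
    threshold = min_share * len(posts)
    hits = [(gram, count / len(posts)) for gram, count in doc_freq.items() if count >= threshold]
    # Most widespread first; among equals, longer phrases first, then alphabetical
    return sorted(hits, key=lambda item: (-item[1], -len(item[0].split()), item[0]))

posts = [
    "sdsadsad daSDA DDASd asd aSD Sd dA SD ASD sadasdasds sadasd @jhsad.sadas.com sdsdADSA sada",
    "KDJKLFFD GFDGFDHGF GFHGFDHGFH GFHFGH Lklfgfd gdfsgfdsg df gfdhgf g hfghghjh jhg @jhsad.sadas.com sfgff fsdfdsf",
    "jhjkfsdg fdgdf sfds hgfj j kkjjfghgkjf hdkjtkj lfdjfg hkgfl @jhsad.sadas.com dsfjdshflkds kg lsfdkg;fdgl",
]
print(candidate_signatures(posts))
```

With 100 real posts, very common words will also pass the threshold. Comparing against posts from other channels helps: an n-gram that is frequent in one channel but rare elsewhere is more likely to be that channel's signature.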
There is no magic way to extract the signature. Even with ML, you need to classify the contents of the message and identify which part is the signature.
I would recommend writing a set of rules or regexes, based on the data you have available, to extract the signatures.
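Following that recommendation, a sketch of a small rule set, assuming the two formats mentioned earlier in this thread. Each rule (the names and patterns are hypothetical) is a regex with one capturing group; applying the rules to every post of a channel and keeping the most frequent match gives a best guess for that channel's signature:

```python
import re
from collections import Counter

# Hypothetical rules; each regex has exactly one capturing group holding the signature
RULES = [
    ("at_handle", re.compile(r"(@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")),
    ("blog_link", re.compile(r"visit my blog (?:in|at) ([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)",
                             re.IGNORECASE)),
]

def extract(post):
    """Apply every rule to one post; return (rule name, matched signature) pairs."""
    found = []
    for name, pattern in RULES:
        for match in pattern.finditer(post):
            found.append((name, match.group(1)))
    return found

def channel_signature(posts):
    """Return the candidate extracted from the most posts, or None if no rule fired."""
    counts = Counter()
    for post in posts:
        # A set, so a signature repeated inside one post is counted once
        counts.update({sig for _, sig in extract(post)})
    if not counts:
        return None
    return counts.most_common(1)[0][0]

posts = [
    "Great news today @jhsad.sadas.com",
    "More updates @jhsad.sadas.com",
    "visit my blog in fsfsd.sfsf.dfssd, or...",
]
print(channel_signature(posts))
```

New formats are handled by appending another `(name, pattern)` pair to `RULES` once you spot them in the data.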
Thank you very much.