text / language processing
1xch opened this issue · 3 comments
I found this project through neovintage's blog articles while searching for ways to understand machine learning and what I can do with it in Ruby.
I'm interested in doing text processing, and one article specifically recommended this project for it. However, it seems to accept only libsvm-format data (which I'm still trying to learn), and text processing requires something more along the lines of sally (http://www.mlsec.org/sally), which I've also stumbled across and am integrating into what I'm trying to accomplish.
So, two questions:
1 - Is there some sort of text processing feature I'm unaware of? From what I saw there is not, but if there is, more examples would be appreciated.
2 - What is the exact data format(s)? I know it is libsvm data, but the example is sparse at best.
Hi there!
I'll do my best to answer your questions.
1 - Unfortunately, no. If you're trying to do text classification, you'll need to first convert the text into some kind of vector that can be represented by numerical values. Then from that point you can take the numerical vector and use rb-libsvm to do the classification.
2 - Libsvm does have a data format that you can use to import data from a file, but this project doesn't have a convenience method that reads such a file and creates the model.
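For reference, each line of a libsvm-format file holds one example: a label followed by space-separated `index:value` pairs, with 1-based indices and zero-valued features left out. Here is a rough sketch in plain Ruby of writing and reading such lines. The helper names `to_libsvm_line` and `parse_libsvm_line` are made up for illustration and are not part of rb-libsvm.

```ruby
# Hypothetical helper (not part of rb-libsvm): format one example
# as a libsvm-format line. Zero-valued features are omitted and
# feature indices start at 1.
def to_libsvm_line(label, vector)
  pairs = vector.each_with_index
                .reject { |value, _index| value.zero? }
                .map { |value, index| "#{index + 1}:#{value}" }
  ([label] + pairs).join(" ")
end

# Hypothetical helper (not part of rb-libsvm): parse one libsvm-format
# line into [label, { index => value }].
def parse_libsvm_line(line)
  label, *pairs = line.split
  features = pairs.map do |pair|
    index, value = pair.split(":")
    [Integer(index), Float(value)]
  end.to_h
  [Float(label), features]
end

puts to_libsvm_line(1, [2, 0, 0, 0, 0, 0, 1])
label, features = parse_libsvm_line("-1 3:0.5 10:2")
puts label
puts features.inspect
```

The parsed label and sparse features would still need to be converted into whatever structure you train on, so treat this only as a starting point.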
As I suspected, thank you. I'm still learning.
To help out, I'll create an example doing text classification. I hope that will suffice.
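In the meantime, here is a rough sketch of the conversion step described above: turning text into numerical vectors with a simple bag-of-words count. This is only one possible approach and not the project's official example. The helpers `tokenize`, `build_vocabulary`, and `vectorize` are hypothetical names written in plain Ruby, and the sample documents are invented.

```ruby
# Hypothetical helpers for illustration; none of these are part of rb-libsvm.

# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Build a word => column index dictionary from the training documents.
def build_vocabulary(documents)
  vocab = {}
  documents.each do |doc|
    tokenize(doc).each { |word| vocab[word] ||= vocab.size }
  end
  vocab
end

# Turn one document into a dense array of word counts.
# Words that are not in the vocabulary are ignored.
def vectorize(text, vocab)
  vector = Array.new(vocab.size, 0)
  tokenize(text).each do |word|
    index = vocab[word]
    vector[index] += 1 if index
  end
  vector
end

# Invented sample data.
documents = ["free money now", "meeting at noon", "free free offer"]
vocab = build_vocabulary(documents)
vectors = documents.map { |doc| vectorize(doc, vocab) }
vectors.each { |vector| puts vector.inspect }
```

Each resulting array could then be wrapped with `Libsvm::Node.features(array)` and handed to `Libsvm::Problem#set_examples` together with the labels, as in the rb-libsvm README. Real text usually benefits from more than raw counts (stop-word removal, TF-IDF weighting, and so on), so this is a minimal starting point.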