MultiSafepay/.Net

Suggestions requested for a document parser


Hi everyone, I have a requirement to programmatically search for a string of text in all kinds of documents, including PDF, DOC, DOCX, XML, TXT, RTF, ODT, and Excel files. Many third-party tools for parsing these files already exist, such as Aspose, Lucene, and so on.

I would like to develop a tool of my own, and I would welcome any suggestions from anyone who has come across a similar requirement.

Any help from anyone will be appreciated.

Thanks in advance,
Srujan Panda.

You could build a tool that reads PDF/DOC/XML files and converts their contents to a plain string. Then parse that string with a simple package that uses `text.Split` (for example, to break it into lines or words). Depending on your requirements, you can extract whatever information you need from the file.
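A minimal sketch of that extract-then-split idea, written in Python for illustration (here `str.splitlines` plays the role of `text.Split`). The function names `extract_text` and `search` are hypothetical. It only covers formats readable with the standard library: TXT, XML, DOCX (a ZIP archive whose body is in `word/document.xml`), and ODT (a ZIP archive whose body is in `content.xml`). PDF, DOC, RTF, and Excel are assumed to need a dedicated parser plugged in at the marked spot.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text run).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# ODF text namespace used by <text:p> (paragraph) and <text:h> (heading).
ODF_TEXT_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def extract_text(path: str) -> str:
    """Return the plain text of a file, picking an extractor by extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        with open(path, encoding="utf-8") as f:
            return f.read()
    if ext == ".xml":
        # Join every text node (including tails) of the XML tree.
        return " ".join(ET.parse(path).getroot().itertext())
    if ext == ".docx":
        # A .docx file is a ZIP archive; body text lives in word/document.xml.
        with zipfile.ZipFile(path) as z:
            root = ET.fromstring(z.read("word/document.xml"))
        # One output line per <w:p>, concatenating its <w:t> runs.
        # Simplification: tabs and line breaks (<w:tab/>, <w:br/>) are ignored.
        return "\n".join(
            "".join(t.text or "" for t in para.iter(W_NS + "t"))
            for para in root.iter(W_NS + "p")
        )
    if ext == ".odt":
        # An .odt file is a ZIP archive; body text lives in content.xml.
        with zipfile.ZipFile(path) as z:
            root = ET.fromstring(z.read("content.xml"))
        # One output line per paragraph or heading, in document order.
        # Simplification: <text:s/> repeated-space markers are ignored.
        blocks = [el for el in root.iter()
                  if el.tag in (ODF_TEXT_NS + "p", ODF_TEXT_NS + "h")]
        return "\n".join("".join(b.itertext()) for b in blocks)
    # Assumption: PDF, DOC, RTF and Excel would be handled by a third-party parser here.
    raise ValueError(f"unsupported format: {ext}")


def search(path: str, term: str) -> list[tuple[int, str]]:
    """Split the extracted text into lines and return (line_number, line)
    pairs for lines containing term, compared case-insensitively."""
    needle = term.lower()
    lines = extract_text(path).splitlines()
    return [(n, line) for n, line in enumerate(lines, start=1)
            if needle in line.lower()]
```

For example, `search("report.docx", "invoice")` would return the numbered paragraphs mentioning "invoice". Keeping extraction and searching separate means support for another format only requires a new branch in `extract_text`; the split-based search stays the same.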

Closing old issue