This project provides a simple way to map free-form text onto a specific model class.
The parser receives the text to parse through the ITextSource interface.
class TextSource : ITextSource
{
    public IEnumerable<string> GetPagesText()
    {
        var page1String = "random text containing property: 5";
        yield return page1String;
    }
}
Parsing is done through the TextDocumentParser<>.Parse method. The generic type argument defines the model type into which the text will be parsed.
var parsedDocument = new TextDocumentParser<MyModel>().Parse(new TextSource());
Console.WriteLine(parsedDocument.Property);
In this example, we want to extract the integer from the sample text above into our model class.
[DeserializeByRegex("property: (?<Property>[0-9]+)")]
class MyModel
{
    public int Property { get; set; }
}
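To illustrate the idea behind the attribute, here is a minimal sketch of how a named regex group can be copied into a property of the same name. It uses only System.Text.RegularExpressions and reflection. RegexMapper and its Map method are hypothetical names invented for this sketch; they are not part of the library, and the library's actual implementation may differ.

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Text.RegularExpressions;

// Demo: apply the pattern from the attribute to the sample page text.
var model = RegexMapper.Map<MyModel>(
    "random text containing property: 5",
    "property: (?<Property>[0-9]+)");
Console.WriteLine(model.Property); // prints 5

class MyModel
{
    public int Property { get; set; }
}

// Hypothetical helper for illustration only; not the library's API.
static class RegexMapper
{
    public static T Map<T>(string text, string pattern) where T : new()
    {
        var result = new T();
        Match match = Regex.Match(text, pattern);
        if (!match.Success)
        {
            return result; // nothing matched, keep default property values
        }

        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            // Look up a named group with the same name as the property.
            Group group = match.Groups[property.Name];
            if (!group.Success)
            {
                continue;
            }

            // Convert the captured text to the property's type (int here).
            object value = Convert.ChangeType(
                group.Value, property.PropertyType, CultureInfo.InvariantCulture);
            property.SetValue(result, value);
        }

        return result;
    }
}
```

The group name in the pattern (Property) has to match the property name exactly, which is why the sample regex uses (?<Property>...).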
The library can also parse tables in the text. Consider the following input:
Name | Value
property: 1
property: 2
property: 3
@Copyright xxx
class CollectionMyModel
{
    [DeserializeCollectionByRegex("property", "Name | Value(.+)@Copyright")]
    public MyModel[] MyModel { get; set; }
}
In this example, the DeserializeCollectionByRegex attribute has two parameters.
- The first parameter is a regex that detects a row in the table (whenever the string property is found, a new row starts).
- The second parameter is a regex that detects the table within the text. Everything within the first group is treated as the table's text.
For parsing to succeed, the MyModel class must also be annotated with the DeserializeByRegex attribute, which determines how each row is parsed.
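The three steps described above (find the table, split it into rows, parse each row) can be sketched with plain .NET regex calls. This is an illustration under stated assumptions, not the library's implementation: TableSketch and ParseValues are hypothetical names, the sketch escapes the | in the table pattern because an unescaped | means alternation in .NET regex, and it uses RegexOptions.Singleline so that . can span line breaks. Whether the library applies similar handling internally is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

string input = string.Join("\n",
    "Name | Value",
    "property: 1",
    "property: 2",
    "property: 3",
    "@Copyright xxx");

List<int> values = TableSketch.ParseValues(input);
Console.WriteLine(string.Join(", ", values)); // prints 1, 2, 3

// Hypothetical helper for illustration only; not the library's API.
static class TableSketch
{
    public static List<int> ParseValues(string text)
    {
        var values = new List<int>();

        // Step 1: the first group of the table regex is the table's text.
        // Assumption: '|' is escaped and Singleline lets '.' match newlines.
        Match table = Regex.Match(
            text, @"Name \| Value(.+)@Copyright", RegexOptions.Singleline);
        if (!table.Success)
        {
            return values;
        }
        string tableText = table.Groups[1].Value;

        // Step 2: every occurrence of the row marker starts a new row.
        MatchCollection rowStarts = Regex.Matches(tableText, "property");
        for (int i = 0; i < rowStarts.Count; i++)
        {
            int start = rowStarts[i].Index;
            int end = i + 1 < rowStarts.Count
                ? rowStarts[i + 1].Index
                : tableText.Length;
            string row = tableText.Substring(start, end - start);

            // Step 3: parse the row with the pattern from the row model.
            Match cell = Regex.Match(row, "property: (?<Property>[0-9]+)");
            if (cell.Success)
            {
                values.Add(int.Parse(
                    cell.Groups["Property"].Value, CultureInfo.InvariantCulture));
            }
        }

        return values;
    }
}
```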