A Suixin text consists of one or more items. Each item is formed by an ID and Content. Nothing fancy, just treat ID as Key and Content as Value.
Suixin use back quote (`) as the Mark char.
- A line that begins with 2 back quote = Line Comment.
- A line that begins with 1 back quote = ID.
- A line that begins with other char = Content.
``Line comment
`ID
Content
An ID is declared by putting the Mark char at the beginning of a line. The ID begins at the first character that is not a "Space" nor "Tab" after the Mark char. As such, the following are equivalent to each other. The ID ends when the line ends.
`ID
` ID
` ID
An ID can contain virtually any character, except the one that breaks the line of course.
` Macro$oft corporation
` !! ERROR !! : yes this is valid ID
` 結城友奈は勇者である
` ○△○ 1+1+1 > 3
If you decide to begin an ID with back quote somehow, you must put a space or "Tab" after the Mark char, otherwise it will be treat as comment.
` `weird but legal ID
``WRONG: Not an ID, but comment
It is legal to declare an ID without Content. To declare such ID, declare another ID right after it without anything that can be considered as Content. It is legal to have a Suixin text that contains ID only.
`` foo, foobar, barbar are ID without Content
` foo
` bar
hmm
` foobar
` barbar
A Content can have multiple lines. The Content begins at the first character after the ID. The Content ends at the last character before the line of next ID begins.
` Foo
foobar like to foobar
foobar is smart
be like foobar
` Bar
barfoo like to barbar
barbar is futile
avoid barbar
Suixin allows the use of blank lines to leave out some space between ID and content. Therefore, some blank lines will be ignored. The blank lines within the Content will be preserved.
`` Even before the first ID, you can have as much blank lines as you want.
` Foo
Foobar
Barfoo
` Bar
Foobar
Barfoo
`` The Content of "Foo" and "Bar" are same. Both are 3 lines.
This parser is designed to be a solid and flexible foundation for other framework to be built on. This means that it is fairly low level that it can hardly do anything useful out of the box, because it aims to let you build any kind of framework that operates on Suixin text on it.
First it will ask for a data pointer and byte count to operate on. Every time it encounters something interesting like an ID, it will pass the relevant pointer and byte count to the user-defined callback. Then the callback will return a value telling the parser what to do next. Continue? Abort? Or skip the Content that will be served soon? This is pretty much what does this parser do, to give the maximum control to user-defined callback on what to do with the data. As such...
This parser does not do any encoding conversion.
Whatever goes in, whatever comes out. If user passed in UTF-8 data pointer, user callback would get UTF-8 back. The parser will not try to do what it is not specialized in. It gives maximum flexibility to the user. User can choose to use the data as-is, or some external lib, or MultiByteToWideChar() in Windows.
This parser does not provide zero-terminated string to user-defined callback.
For efficiency, the parser reuse back the data pointer that user passed in, to avoid memory allocation that might be unnecessary. For example, the raw data is in UTF-8 and you want UTF-16 for calling Windows API. In this case, why would you want to allocate memory for UTF-8 zero-terminated string?
This parser does not detect if there is duplicated ID.
Keeping track of visited ID requires memory allocation, and not everyone want to block duplicated ID. Some would prefer to override the content of previous ID with the later one.
All of these can be easily implemented, as shown in the following example.
void SuixinParse(SuixinParseStat *, SuixinParseSpec const *);
Both parameter must be specified (non-NULL).
SuixinParseStat::error
Pretty straightforward. Refer theenum SuixinParseErr
in header file.
SuixinParseSpec::(iRow, iCol, iByte)
Zero-based index of row/col/byte that the parser worked on, beforeSuixinParse()
returned.
SuixinParseSpec::pData
Pointer to data for parser to parse on.
SuixinParseSpec::cbData
Byte count of data that parser should parse.
SuixinParseSpec::charIs16bit
By default, parser assumes multibyte encoding (like UTF-8) and operates on the basis ofuint8_t
.
To correctly handle Windows Unicode (UCS-2 or UTF-16) string, set this totrue
so the parser operates on the basis ofuint16_t
instead.
SuixinParseSpec::user
User-defined value that will be passed to user-defined callback.
SuixinParseSpec::pfnOnID
Must not NULL. User-defined callback.
(uint8_t const *pID, int cbID, void *user, void **ppItem)
pID
non-zero-terminated ID stringcbID
byte count of ID stringuser
user-defined value passed inSuixinParseSpec
ppItem
pointer to a void pointer that will be passed topfnOnContent
It is guaranteed that when this callback is called,
pID
will never be NULL andcbID
will always > 0.User should set
*ppItem
to non-zero value if current ID is in interest, otherwise the Content of current ID will be skipped.Return
ContinueSuixinParse
to continue, orAbortSuixinParse
to abort.
SuixinParseSpec::pfnOnContent
Must not NULL. User-defined callback.
(uint8_t const *pCtnt, int cbCtnt, void *user, void *pItem)
pCtnt
non-zero-terminated Content stringcbCtnt
byte count of Content stringuser
user-defined value passed inSuixinParseSpec
pItem
void pointer specified by user inpfnOnID
When this callback is called,
cbCtnt
will be zero if the item of current ID has no Content. Always checkcbCtnt
before attempting to accesspCtnt
.Return
ContinueSuixinParse
to continue, orAbortSuixinParse
to abort.