Aspose format conversion library

The library converts data items between different file formats. It was initially built for XML (schema is here) and a custom binary file format (see BinaryFormatSerializationData).

Architecture

The general idea is to hide the implementation from the user as much as possible and to work mostly with abstractions, which eases development and keeps the library extensible.

All format processors, which implement the data manipulation features, are created by the main CommonFormatConverter class through its CreateFormatProcessor method, which returns an IFormatProcessor. The class also has a high-level Convert method that transforms data between formats when the user doesn't need any intermediate manipulation. CommonFormatConverter implements the ICommonFormatConverter interface, which is hidden from the user and used for development only. Format processors are initialized via reflection in the static constructor of CommonFormatConverter, which keeps the converter decoupled from their implementations during development.
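The reflection-based initialization described above could look roughly like the following. This is a minimal sketch, not the library's actual code: ConverterSketch, the two processor classes and the FormatName property are hypothetical stand-ins, and the real IFormatProcessor has more members than shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced stand-in for the real interface.
public interface IFormatProcessor
{
    string FormatName { get; } // assumed member, used here only as a lookup key
}

public sealed class XmlProcessorSketch : IFormatProcessor
{
    public string FormatName => "xml";
}

public sealed class BinaryProcessorSketch : IFormatProcessor
{
    public string FormatName => "binary";
}

public static class ConverterSketch
{
    // Maps a format name to the concrete processor type found by reflection.
    private static readonly Dictionary<string, Type> ProcessorTypes;

    static ConverterSketch()
    {
        // Scan the assembly once for concrete IFormatProcessor implementations
        // that have a public parameterless constructor.
        ProcessorTypes = typeof(ConverterSketch).Assembly
            .GetTypes()
            .Where(t => typeof(IFormatProcessor).IsAssignableFrom(t)
                        && t.IsClass
                        && !t.IsAbstract
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .ToDictionary(
                t => ((IFormatProcessor)Activator.CreateInstance(t)!).FormatName,
                t => t,
                StringComparer.OrdinalIgnoreCase);
    }

    // Returns a fresh processor for the requested format.
    public static IFormatProcessor CreateFormatProcessor(string format)
    {
        if (!ProcessorTypes.TryGetValue(format, out var type))
        {
            throw new NotSupportedException("Unknown format: " + format);
        }
        return (IFormatProcessor)Activator.CreateInstance(type)!;
    }
}
```

With this kind of registration the converter never names a concrete processor class, so a new processor is picked up simply by being present in the scanned assembly.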

Each format processor implements the IFormatProcessor interface, through which users interact with an instance to manipulate data and process files. Data items implement the IFormatDataItem interface and can be accessed through the indexer, IEnumerable or the Data collection; the collection is modified with the SetData, AddNewDataItem, AddItem, RemoveDataItem and ClearData methods. FormatProcessorBase is the base class that contains the general features of a typical format processor.

Individual data items are manipulated through the IFormatDataItem interface, which lets the user set the date, brand name and price. The FormatDataItem class implements IFormatDataItem and is used in FormatProcessorBase.
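To make the data model more concrete, here is a minimal sketch of how a data item and its collection might be shaped. The method names come from the description above, but every signature, property name and type (for example decimal for the price) is an assumption, and DataItemSketch and ProcessorSketch are hypothetical classes rather than the library's own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Assumed shape: the README only says that date, brand name and price can be set.
public interface IFormatDataItem
{
    DateTime Date { get; set; }
    string BrandName { get; set; }
    decimal Price { get; set; }
}

public sealed class DataItemSketch : IFormatDataItem
{
    public DateTime Date { get; set; }
    public string BrandName { get; set; } = "";
    public decimal Price { get; set; }
}

// Hypothetical processor exposing items via indexer, IEnumerable and Data.
public class ProcessorSketch : IEnumerable<IFormatDataItem>
{
    private readonly List<IFormatDataItem> _items = new List<IFormatDataItem>();

    public IReadOnlyList<IFormatDataItem> Data => _items;

    public IFormatDataItem this[int index] => _items[index];

    public void AddItem(IFormatDataItem item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        _items.Add(item);
    }

    // Creates the item internally so callers never touch the concrete class.
    public IFormatDataItem AddNewDataItem(DateTime date, string brandName, decimal price)
    {
        var item = new DataItemSketch { Date = date, BrandName = brandName, Price = price };
        _items.Add(item);
        return item;
    }

    public bool RemoveDataItem(IFormatDataItem item) => _items.Remove(item);

    public void ClearData() => _items.Clear();

    public IEnumerator<IFormatDataItem> GetEnumerator() => _items.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Returning the interface from AddNewDataItem keeps the concrete item class out of user code, which matches the goal of exposing only abstractions.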

Pros

  1. Extensibility. A new format processor can be added by inheriting from FormatProcessorBase, which already has most of the needed functionality; you only need to implement the format-specific features.
  2. Abstractions are properly separated from the implementation.
  3. Encapsulation. The previous point lets the developer give public access only to CommonFormatConverter and expose everything else through interfaces, which is good for class libraries.
  4. Unit testing. The tests cover almost 90% of the code and exercise all basic format processor features by iterating over the valid formats in test cases, so adding another format processor won't significantly decrease code coverage.
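The extensibility point can be illustrated with a small sketch in which a base class owns the shared collection handling and a derived class supplies only the format-specific part. Everything here is hypothetical: ProcessorBaseSketch is not FormatProcessorBase, its abstract members are assumed, and the semicolon-separated text format exists only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public sealed class ItemSketch
{
    public DateTime Date { get; set; }
    public string BrandName { get; set; } = "";
    public decimal Price { get; set; }
}

// Shared behavior lives in the base class (hypothetical stand-in).
public abstract class ProcessorBaseSketch
{
    protected List<ItemSketch> Items { get; } = new List<ItemSketch>();

    public IReadOnlyList<ItemSketch> Data => Items;

    public void AddItem(ItemSketch item) => Items.Add(item);

    public void ClearData() => Items.Clear();

    // Writes one line per item; the line layout is decided by the subclass.
    public void Write(TextWriter writer)
    {
        foreach (var item in Items)
        {
            writer.WriteLine(FormatItem(item));
        }
    }

    // Replaces the current items with the ones parsed from the reader.
    public void Read(TextReader reader)
    {
        Items.Clear();
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            if (line.Length > 0) Items.Add(ParseItem(line));
        }
    }

    protected abstract string FormatItem(ItemSketch item);

    protected abstract ItemSketch ParseItem(string line);
}

// A new format only implements the two format-specific methods.
// Note: brand names containing ';' are not escaped in this sketch.
public sealed class SemicolonProcessorSketch : ProcessorBaseSketch
{
    private const string DateFormat = "yyyy-MM-dd";

    protected override string FormatItem(ItemSketch item) =>
        string.Join(";",
            item.Date.ToString(DateFormat, CultureInfo.InvariantCulture),
            item.BrandName,
            item.Price.ToString(CultureInfo.InvariantCulture));

    protected override ItemSketch ParseItem(string line)
    {
        var parts = line.Split(';');
        if (parts.Length != 3) throw new FormatException("Expected 3 fields: " + line);
        return new ItemSketch
        {
            Date = DateTime.ParseExact(parts[0], DateFormat, CultureInfo.InvariantCulture),
            BrandName = parts[1],
            Price = decimal.Parse(parts[2], CultureInfo.InvariantCulture)
        };
    }
}
```

Because the shared behavior is tested once against the base type, a test suite that iterates over all registered formats would cover a new processor like this one without format-specific test code for the common features.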

Cons

  1. All format processor implementations live in the same project, together with the base classes and abstractions, which may become messy if there are dozens of formats.

Sample code

A simple usage demonstration can be found here.

Future development

  1. Separate the base classes and abstractions from the format processors by moving them into different projects, so that each format processor has its own project and development is more convenient.
  2. Find a way to validate and parse binary files with some sort of schema (since they are widely used and heavily customized). BeeSchema currently has bugs but is still under development, so it may be used in the future.
  3. Implement features that allow the library to work not only with file paths but also with streams etc. for reading and writing data on the user side (internally, writing is already performed to a stream).
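One common way to approach the third item is to make the stream-based method the primary one and let the file-path method delegate to it. The sketch below shows that pattern on plain text only; StreamOverloadSketch and its Save and Load methods are hypothetical and say nothing about how the library's processors would actually expose it.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamOverloadSketch
{
    // Primary overload: writes to any stream and leaves it open for the caller.
    public static void Save(string text, Stream output)
    {
        using (var writer = new StreamWriter(output, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
        }
    }

    // Convenience overload: opens the file and delegates to the stream version.
    public static void Save(string text, string filePath)
    {
        using (var file = File.Create(filePath))
        {
            Save(text, file);
        }
    }

    // Primary overload: reads from the stream's current position to its end.
    public static string Load(Stream input)
    {
        using (var reader = new StreamReader(input, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }

    // Convenience overload: opens the file and delegates to the stream version.
    public static string Load(string filePath)
    {
        using (var file = File.OpenRead(filePath))
        {
            return Load(file);
        }
    }
}
```

Leaving the stream open is a deliberate choice in this sketch: the caller created the stream, so the caller decides when to dispose it.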