The Document Transformation Application (DTA) is a microservice-based web service application provided by the research group QDS of TU Berlin.
The DTA serves as a playground for different research topics and provides a collection of document transformation functions. A DocumentTransformationFunction (DTF) transforms a document into another document. Transformation is meant here in a very broad sense: one document, represented as a byte array, is passed as an argument to the DTF, and another byte array is returned.
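The bytes-in, bytes-out semantics described above can be sketched in Go as follows. This is only an illustration: the type name `DTF`, the function names, and the `Upper` example are assumptions made for this sketch and are not taken from the project's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// DTF is a hypothetical Go type for a Document Transformation Function:
// one document goes in as bytes, another document comes out as bytes.
type DTF func(doc []byte) ([]byte, error)

// Echo is the null transformation: it returns an unmodified copy of the input.
func Echo(doc []byte) ([]byte, error) {
	return append([]byte(nil), doc...), nil
}

// Upper is an illustrative transformation (not part of the project)
// that upper-cases the document.
func Upper(doc []byte) ([]byte, error) {
	return bytes.ToUpper(doc), nil
}

func main() {
	doc := []byte("hello, dta\n")
	// A slice keeps the iteration order stable.
	functions := []struct {
		name string
		fn   DTF
	}{
		{"echo", Echo},
		{"upper", Upper},
	}
	for _, f := range functions {
		out, err := f.fn(doc)
		if err != nil {
			fmt.Println(f.name, "failed:", err)
			continue
		}
		fmt.Printf("%s: %q\n", f.name, out)
	}
}
```

Because every DTF has the same shape, functions can be stored, listed, and chained without knowing what they do to the document.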
The protocol between a DTA user and the DTA Server is defined using gRPC and we call it DTA Server Protocol. A DTA worker might also act as a gateway to
The server protocol is defined using gRPC and consists of three operations:
- TransformDocument
- ListServices
- TransformPipe
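To make the three operations more concrete, here is a minimal in-memory sketch in Go. It does not use gRPC and does not reproduce the project's actual message types: the `Server` type, the method signatures, and the assumption that TransformPipe applies several services one after another are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// DTF is a hypothetical type for a document transformation function.
type DTF func(doc []byte) ([]byte, error)

// Server is an in-memory stand-in for a DTA server (assumption, no gRPC).
type Server struct {
	services map[string]DTF
}

// NewServer creates a server offering the given named services.
func NewServer(services map[string]DTF) *Server {
	return &Server{services: services}
}

// ListServices returns the names of all known services in sorted order.
func (s *Server) ListServices() []string {
	names := make([]string, 0, len(s.services))
	for name := range s.services {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// TransformDocument applies a single named service to the document.
func (s *Server) TransformDocument(service string, doc []byte) ([]byte, error) {
	f, ok := s.services[service]
	if !ok {
		return nil, fmt.Errorf("unknown service %q", service)
	}
	return f(doc)
}

// TransformPipe applies the named services in order, feeding each output
// into the next service (assumed semantics).
func (s *Server) TransformPipe(services []string, doc []byte) ([]byte, error) {
	for _, name := range services {
		out, err := s.TransformDocument(name, doc)
		if err != nil {
			return nil, fmt.Errorf("pipe stopped at %q: %w", name, err)
		}
		doc = out
	}
	return doc, nil
}

// trim and upper are illustrative services, not part of the project.
func trim(doc []byte) ([]byte, error)  { return bytes.TrimSpace(doc), nil }
func upper(doc []byte) ([]byte, error) { return bytes.ToUpper(doc), nil }

func main() {
	s := NewServer(map[string]DTF{"trim": trim, "upper": upper})
	fmt.Println(s.ListServices())
	out, err := s.TransformPipe([]string{"trim", "upper"}, []byte("  hello  "))
	fmt.Printf("%q %v\n", out, err)
}
```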
We have built the protocol using protobuf v3.9.1.
In addition, we have defined a compatible RESTful/JSON-based API according to the following REST specification
The architecture is pretty simple as shown in
A DTA client communicates with a DTA server via gRPC or RESTful/JSON. The DTA server can communicate with other DTA servers and returns the result to the DTA client.
If a DTA server simply enables communication with other, potentially private DTA servers, we call it a DTA gateway.
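The gateway role can be sketched as a server that performs no transformation itself and only forwards requests. The `Transformer` interface, the `Worker` and `Gateway` types, and the routing by service name are assumptions for this sketch; the project's gateway may route differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Transformer is a hypothetical interface standing in for the DTA server API.
type Transformer interface {
	Transform(service string, doc []byte) ([]byte, error)
}

// Worker provides exactly one transformation function.
type Worker struct {
	Name string
	Fn   func([]byte) ([]byte, error)
}

// Transform runs the worker's function if the requested service matches.
func (w Worker) Transform(service string, doc []byte) ([]byte, error) {
	if service != w.Name {
		return nil, errors.New("service not offered by this worker")
	}
	return w.Fn(doc)
}

// Gateway does no transformation itself; it forwards to private backends.
type Gateway struct {
	backends map[string]Transformer
}

// Transform looks up the backend for the service and forwards the request.
func (g Gateway) Transform(service string, doc []byte) ([]byte, error) {
	b, ok := g.backends[service]
	if !ok {
		return nil, fmt.Errorf("no backend for service %q", service)
	}
	return b.Transform(service, doc)
}

func main() {
	echo := Worker{Name: "echo", Fn: func(d []byte) ([]byte, error) { return d, nil }}
	// The client only sees a Transformer; it cannot tell gateway from worker.
	var server Transformer = Gateway{backends: map[string]Transformer{"echo": echo}}
	out, err := server.Transform("echo", []byte("ping"))
	fmt.Printf("%q %v\n", out, err)
}
```

Since both types satisfy the same interface, a client does not need to know whether it talks to a worker directly or to a gateway in front of it.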
Currently the project provides implementations for the following elements:
- Gateway - A simple, straightforward gateway implementation
- Count - Counts lines, words and bytes in a document
- Echo - Just echoes the provided document
- Html2text - Extracts the text from an HTML document in markdown form. Preserves table structures.
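As an illustration of what a Count-style transformation might compute, the following Go sketch counts lines, words and bytes in the manner of `wc`. It is not the project's implementation; the `Counts` type, the counting rules (newline characters for lines, whitespace-separated fields for words), and the output format are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
)

// Counts holds the result of counting a document (hypothetical type).
type Counts struct {
	Lines, Words, Bytes int
}

// Count counts newline characters, whitespace-separated words and bytes.
func Count(doc []byte) Counts {
	return Counts{
		Lines: bytes.Count(doc, []byte{'\n'}),
		Words: len(bytes.Fields(doc)),
		Bytes: len(doc),
	}
}

// CountDTF wraps Count so that it fits the bytes-in, bytes-out shape of a DTF.
// The textual output format is an assumption made for this sketch.
func CountDTF(doc []byte) ([]byte, error) {
	c := Count(doc)
	return []byte(fmt.Sprintf("%d %d %d\n", c.Lines, c.Words, c.Bytes)), nil
}

func main() {
	out, _ := CountDTF([]byte("hello world\nfoo\n"))
	fmt.Print(string(out))
}
```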
Installation is straightforward if you have a working Go environment.
go get github.com/theovassiliou/doctrans
go build ./...
go test ./...
If the output looks reasonable, you are ready to go.
To test-run a client/server pair, try the following:
go run services/qds_echo/echo.go
and in another terminal on the same host
go run clients/client.go test/testDoc.txt
The client sends the file testDoc.txt to the echo server started before. Addressing is hardcoded via the default parameters.
You can start an additional server in a third terminal on the same host with
go run services/qds_count/count.go
This starts the count server, listening on the next available port. To use this service, you can now run
go run clients/client.go -g :50052 test/testDoc.txt
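One way a server could pick "the next available port" is to try successive ports until listening succeeds. The Go sketch below shows that idea only; the function name `listenOnNextFree`, the start port 50051 (inferred from the `:50052` used above, not stated explicitly), and the retry limit are assumptions, and the project may select its port differently.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnNextFree tries ports start, start+1, ... and returns the first
// listener that could be opened, together with its port.
func listenOnNextFree(start, attempts int) (net.Listener, int, error) {
	var lastErr error
	for port := start; port < start+attempts; port++ {
		l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err == nil {
			return l, port, nil
		}
		lastErr = err // port is busy or not permitted; try the next one
	}
	return nil, 0, fmt.Errorf("no free port in [%d, %d): %v", start, start+attempts, lastErr)
}

func main() {
	// 50051 is an assumed default start port for this sketch.
	l, port, err := listenOnNextFree(50051, 20)
	if err != nil {
		fmt.Println("could not listen:", err)
		return
	}
	defer l.Close()
	fmt.Println("listening on port", port)
}
```

With a first server already bound to the start port, a second call starting at the same port skips it and binds to a later one, which matches the behaviour seen when starting echo and count on the same host.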
For detailed configuration, consult the respective client and server READMEs.
- DTA - Document Transformation Application
- DTA client - Synonym for DTA user.
- DTA gateway - A DTA gateway offers access to non-publicly available DTA servers via the DTA server API.
- DTA server - The DTA server provides an API for document transformation. The DTA server might use DTA workers to perform the task, or other means. See also DTA gateway.
- DTA server protocol - The protocol between DTA server and DTA user.
- DTA user - An entity that uses the DTA server API to transform a document. Also called DTA client.
- DTA worker - A microservice providing one transformation function, potentially parametrised. A DTA worker is also a DTA server.
- DTF - A Document Transformation Function is a function that transforms a document into another document. Simple examples include ECHO (the null transformation) and COUNT (counting lines, words and/or characters), while more sophisticated functions might convert a PDF to a text document.