RUCAIBox/TextBox

Custom Dataset Preprocessing

Closed this issue · 5 comments

Does TextBox support custom preprocessing for a dataset?

I work with source code data, which may require custom preprocessing, for example extracting an abstract syntax tree or a dataflow graph.

Sorry, we don't support it at the moment. You can preprocess your data into a text-to-text format before feeding it to TextBox.
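To illustrate what such offline preprocessing might look like, here is a minimal sketch (not an official TextBox utility). It assumes the source programs are Python, so the standard-library `ast` module can linearize them, and that the target language is whitespace-insensitive. The helper names (`linearize_ast`, `flatten`, `write_split`) and the `train.src` / `train.tgt` file names are hypothetical; check the TextBox documentation for the file layout it actually expects.

```python
import ast
from pathlib import Path


def linearize_ast(code: str) -> str:
    """Parse Python source and return its AST as a single-line string."""
    return ast.dump(ast.parse(code))


def flatten(text: str) -> str:
    """Collapse all whitespace so that one example fits on one line."""
    return " ".join(text.split())


def write_split(pairs, out_dir, split):
    """Write (source, target) pairs as parallel text files, one example per line.

    The <split>.src / <split>.tgt naming is an assumption for illustration.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / f"{split}.src", "w", encoding="utf-8") as src_f, \
         open(out / f"{split}.tgt", "w", encoding="utf-8") as tgt_f:
        for source_code, target_code in pairs:
            # Source side: linearized AST instead of raw code.
            src_f.write(flatten(linearize_ast(source_code)) + "\n")
            # Target side: raw code with whitespace collapsed
            # (only safe for whitespace-insensitive target languages).
            tgt_f.write(flatten(target_code) + "\n")
```

Calling `write_split([("x = 1", "int x = 1;")], "my_dataset", "train")` would produce two aligned one-line-per-example files that can then be treated as an ordinary text-to-text dataset. A dataflow graph would need a different linearization step, but the same write-out pattern applies.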

Thanks for the prompt response.

It would be a nice feature in the future if we could plug custom preprocessing into the pipeline :)

Thanks for your understanding. Could you tell us which task you are working on, or describe your dataflow pipeline in detail?

My task is translating source code to source code.

The example use case is this.

Got it!