
General-purpose data extractor for pre-processing large datasets

Primary Language: Python

DataProcessor

A general-purpose, Python-based data pre-processing class that uses multiprocessing.
The class walks through a nested directory tree, picks out every file that meets the set criteria, and passes each file_path to the data_extract function. Both data_extract and save_data can be customised to suit your needs. For each file, the class creates a "new_file_path" by replicating the source directory tree inside the specified save directory.
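The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the class's actual code: only data_extract, save_data, file_path and new_file_path are names taken from the description, while find_files, make_new_file_path, process_file, run, the extension-based criterion and the line-counting extraction are assumptions made for the example.

```python
import os
from multiprocessing import Pool


def find_files(source_dir, extension):
    """Walk source_dir recursively and return files ending in extension.

    Matching on a file extension is an assumed criterion; any test could be used.
    """
    matches = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def make_new_file_path(file_path, source_dir, save_dir):
    """Mirror the file's position in the source tree under save_dir."""
    relative = os.path.relpath(file_path, source_dir)
    return os.path.join(save_dir, relative)


def data_extract(file_path):
    """Placeholder extraction (assumption): count the lines in the file."""
    with open(file_path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def save_data(data, new_file_path):
    """Placeholder save (assumption): write the result as text."""
    os.makedirs(os.path.dirname(new_file_path), exist_ok=True)
    with open(new_file_path, "w", encoding="utf-8") as handle:
        handle.write(str(data))


def process_file(job):
    """Extract one file and save the result at the mirrored path."""
    file_path, source_dir, save_dir = job
    new_file_path = make_new_file_path(file_path, source_dir, save_dir)
    save_data(data_extract(file_path), new_file_path)
    return new_file_path


def run(source_dir, save_dir, extension=".txt", workers=2):
    """Process every matching file using a pool of worker processes."""
    jobs = [(path, source_dir, save_dir)
            for path in find_files(source_dir, extension)]
    with Pool(processes=workers) as pool:
        return pool.map(process_file, jobs)
```

In this sketch, customising the behaviour means replacing the bodies of data_extract and save_data. On platforms that start worker processes by spawning a fresh interpreter (Windows, and macOS by default), run should be called from under an `if __name__ == "__main__":` guard.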