This repository reproduces the experiments described in the paper:
Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference (PDF)
by W. García.
An application of the methodology described in the paper can be found in the CSV interface repository.
The results of the research can be reproduced by running the RunTests method from the macro-enabled Excel workbook CSVsniffer.xlsm. To review the results for CleverCSV, run the scripts contained in the clevercsv_test.py file.
The CSV folder contains the files copied from the Pollock framework and other collected test files. The expected configuration for each tested CSV file is saved in the DialectConf.txt file. New files can be added.
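To illustrate how a detected dialect can be checked against an expected one, here is a minimal Python sketch. It is not the code used in the experiments. The names `sniff_with_stdlib` and `score`, and the `(delimiter, quotechar)` tuple representation, are assumptions made for this example. The sketch does not assume any particular layout for DialectConf.txt. The default detector is the standard library's `csv.Sniffer`, used only as a stand-in; a CleverCSV-based detector could be passed through the `detect` parameter instead.

```python
import csv
from typing import Callable, Dict, Tuple

# Hypothetical representation of a dialect: (delimiter, quotechar).
Dialect = Tuple[str, str]


def sniff_with_stdlib(sample: str) -> Dialect:
    """Stand-in detector built on the standard library's csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample)
    return dialect.delimiter, dialect.quotechar or ""


def score(
    samples: Dict[str, str],
    expected: Dict[str, Dialect],
    detect: Callable[[str], Dialect] = sniff_with_stdlib,
) -> Dict[str, bool]:
    """Map each file name to True when the detected dialect matches the expected one."""
    results = {}
    for name, text in samples.items():
        try:
            results[name] = detect(text) == expected[name]
        except csv.Error:
            # The detector could not determine a dialect: count it as a failure.
            results[name] = False
    return results
```

Calling `score` with a dictionary of file contents and a dictionary of expected dialects returns one boolean per file, from which an overall success ratio can be computed.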
Below are the requirements for reproducing the experiments.
- Microsoft Office Excel.
- CleverCSV and all its dependencies.