CSVsniffer

This repository contains the material needed to reproduce the experiments described in the paper:

Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference (PDF)

by W. García.

An application of the methodology described in the paper can be found in the CSV interface repository.

Introduction

The results of the research can be reproduced by running the RunTests method in the macro-enabled Excel workbook CSVsniffer.xlsm. To review the results for CleverCSV, run the scripts contained in the clevercsv_test.py file.

Data

The CSV folder contains the files copied from the Pollock framework, along with other collected test files. The expected configuration for each tested CSV file is saved in the DialectConf.txt file; new files can be added.
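The idea of checking a detected dialect against an expected configuration can be illustrated with a minimal sketch. It is not the repository's test code: it uses Python's standard-library csv.Sniffer as a stand-in detector, and the EXPECTED dictionary, the sample text, and the helper names detect_dialect and matches_expected are hypothetical. The actual layout of DialectConf.txt is not assumed here.

```python
import csv

# Hypothetical expected dialect for one test file. The real DialectConf.txt
# format is not reproduced here; this dictionary is only an illustration.
EXPECTED = {"delimiter": ";", "quotechar": '"'}


def detect_dialect(sample, candidates=",;\t|"):
    """Guess the dialect of a CSV sample with the standard-library sniffer."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return {"delimiter": dialect.delimiter, "quotechar": dialect.quotechar}


def matches_expected(detected, expected):
    """Return True when every expected field equals the detected value."""
    return all(detected.get(key) == value for key, value in expected.items())


# Small in-memory sample standing in for a file from the CSV folder.
sample = "name;age;city\nAna;30;Lima\nLuis;25;Quito\n"
detected = detect_dialect(sample)
print(detected, matches_expected(detected, EXPECTED))
```

A detection counts as correct in this sketch only when all expected fields match, so a right delimiter paired with a wrong quote character is still reported as a failure.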

Requirements

Below are the requirements for reproducing the experiments.

  • Microsoft Office Excel.
  • CleverCSV and all its dependencies.