/columns-matcher

Ruby gem: Given an hash of possibles header label find the correct position of a column in the real header. Useful when you don't know the structure of a spreadsheet.

Primary LanguageRuby

Column Matcher Build Status

When you work with spreadsheets and casual users things do not always go as you want.

I worked on a project where users have to upload a spreadsheet with a lot of data about clients purchases. After 3 upload they call me because the system "did not work well".

The files look like these followiong.

File 1:

Name Surname Emails
John Smith john.smith@gmail.com
John Doe john.doe@hotmail.com

File 2:

Surname Name E-mail
Smith John john.smith@gmail.com
Doe John john.doe@hotmail.com

File 3:

Mail Nombre Apellido
john.smith@gmail.com John Smith
john.doe@hotmail.com John Doe

3 files, 3 different structures. 3 different headers. WTF! How can i guess the position of the columns i'm looking for?

This gem try to solve the problem

Install

Add to your Gemfile and run the bundle command to install it.

gem "columns-matcher"

N.B. Requires Ruby 1.9.2 or later.

Use

require "columns-matcher"

@matcher = ColumnsMatcher::Matcher.new

# the column that contains the name can be lebeled with "NAME", "NOME" or "NOMBRE"
@matcher.add_column("name", ["NAME", "NOME", "NOMBRE"])

# the column that contains the surname can be lebeled with "SURNAME", "COGNOME" or "APELLIDOS"
@matcher.add_column("cognome", ["SURNAME", "COGNOME", "APELLIDOS"])

# We suppose the header is ["COGNOME", "NOME", "INDIRIZZO"]
@matcher.set_header(header_loaded_from_spreadsheet)

@matcher.column_of("name") # return 1
@matcher.column_of("cognome") # return 0

First try is as exact match. If does not work it try with different case:

require "columns-matcher"

@matcher = ColumnsMatcher::Matcher.new

# the column that contains the name can be lebeled with "NAME", "NOME" or "NOMBRE"
@matcher.add_column("name", ["name", "nome", "nombre"])

# We suppose the header is ["APELLIDO", "NOMBRE", "ADDRESS"]
@matcher.set_header(header_loaded_from_spreadsheet)

@matcher.column_of("name") # return 1

If I can't find the column with exact match or different case match i can also use reg exp

require "columns-matcher"

@matcher = ColumnsMatcher::Matcher.new

# the column that contains the name can be lebeled with "NAME", "NOME" or "NOMBRE"
@matcher.add_column("email", ["[Ee]?[\-]*mail[s]*"])

# We suppose the header is ["Surname", "Name", "Emails"]
@matcher.set_header(header_loaded_from_spreadsheet)

@matcher.column_of("email") # return 2