Conformist

Bend CSVs to your will with declarative schemas. Map one or many columns, preprocess cells and lazily enumerate. Declarative schemas are easier to understand, quicker to set up and independent of I/O. Use CSV (formerly FasterCSV), Spreadsheet or any array-of-arrays-like data structure.

Quick and Dirty Examples

Open a CSV file and declare a schema. A schema comprises columns. A column takes an arbitrary name followed by its position in the input. A column may be derived from multiple positions.

require 'conformist'
require 'csv'

csv    = CSV.open '~/transmitters.csv'
schema = Conformist.new do
  column :callsign, 1
  column :latitude, 1, 2, 3
  column :longitude, 3, 4, 5
  column :name, 0 do |value|
    value.upcase
  end
end

Insert the transmitters into a SQLite database.

require 'sqlite3'

db = SQLite3::Database.new 'transmitters.db'
schema.conform(csv).each do |transmitter|
  db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end

Only insert the transmitters with the name "Mount Coot-tha" using ActiveRecord or DataMapper.

transmitters = schema.conform(csv).select do |transmitter|
  # The schema above upcases :name, so compare against the upcased form.
  transmitter.name == 'MOUNT COOT-THA'
end
transmitters.each do |transmitter|
  Transmitter.create! transmitter.attributes
end

Source from multiple, different input files and insert transmitters together into a single database.

require 'conformist'
require 'csv'
require 'sqlite3'

au_schema = Conformist.new do
  column :callsign, 8
  column :latitude, 10
end
us_schema = Conformist.new do
  column :callsign, 1
  column :latitude, 1, 2, 3
end

au_csv = CSV.open '~/au/transmitters.csv'
us_csv = CSV.open '~/us/transmitters.csv'

db = SQLite3::Database.new 'transmitters.db'

[au_schema.conform(au_csv), us_schema.conform(us_csv)].each do |schema|
  schema.each do |transmitter|
    db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
  end
end

Open a Microsoft Excel spreadsheet and declare a schema.

require 'conformist'
require 'spreadsheet'

book   = Spreadsheet.open '~/states.xls'
sheet  = book.worksheet 0
schema = Conformist.new do
  column :state, 0, 1 do |values|
    "#{values.first}, #{values.last}"
  end
  column :capital, 2
end

Print each state's attributes to standard out.

schema.conform(sheet).each do |state|
  $stdout.puts state.attributes
end

For more examples see test/fixtures, test/schemas and test/unit/integration_test.rb.

Installation

Conformist is available as a gem. Install it at the command line.

$ [sudo] gem install conformist

Or add it to your Gemfile and run $ bundle install.

gem 'conformist'

Usage

Anonymous Schema

Anonymous schemas are quick to declare and don't have the overhead of creating an explicit class.

citizen = Conformist.new do
  column :name, 0, 1
  column :email, 2
end

citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]

Class Schema

Class schemas are explicit. Class schemas were the only type available in earlier versions of Conformist.

class Citizen
  extend Conformist

  column :name, 0, 1
  column :email, 2
end

Citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]

Implicit Indexing

Column indexes are implicitly incremented when the index argument is omitted. Implicit indexing is all or nothing.

column :account_number                                  # => 0
column(:date) { |v| Time.new(*v.split('/').reverse) }   # => 1
column :description                                     # => 2
column :debit                                           # => 3
column :credit                                          # => 4

Conform

Conform is the principal method for lazily applying a schema to the given input.

enumerator = schema.conform CSV.open('~/file.csv')
enumerator.each do |row|
  puts row.attributes
end

Input

#conform accepts any object that responds to #each and yields array-like objects, one per row.

CSV.open('~/file.csv').respond_to? :each # => true
[[], [], []].respond_to? :each           # => true
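
As a rough illustration of that contract, the sketch below defines a hypothetical PipeDelimited class (it is not part of Conformist) whose #each yields one array per line. Going by the description above, an object shaped like this should be usable as input to #conform, but that is an assumption drawn from this section rather than something checked against the gem.

```ruby
# Hypothetical input source: wraps pipe-delimited text and yields one array per line.
class PipeDelimited
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Yields each line as an array of cells.
  def each
    return enum_for(:each) unless block_given?

    @text.each_line do |line|
      yield line.chomp.split('|')
    end
  end
end

source = PipeDelimited.new("Tate|Johnson\nJane|Doe\n")
source.respond_to?(:each) # => true
rows = source.to_a        # => [["Tate", "Johnson"], ["Jane", "Doe"]]
```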

Header Row

#conform takes an option to skip the first row of input. Given a typical CSV document, the first row is the header row and irrelevant for enumeration.

schema.conform CSV.open('~/file_with_headers.csv'), :skip_first => true

Named Columns

Strings can be used as column indexes instead of integers. These strings will be matched against the first row to determine the appropriate numerical index.

citizen = Conformist.new do
  column :email, 'EM'
  column :name, 'FN', 'LN'
end

citizen.conform [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']], :skip_first => true
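
To make the matching step concrete, here is a plain-Ruby sketch of how header names could be resolved to numerical indexes. The resolve_indexes helper is hypothetical and is not Conformist's own code; it only illustrates the idea of looking names up in the first row.

```ruby
# Plain-Ruby sketch of name-to-index resolution; not Conformist's implementation.
header = ['FN', 'LN', 'EM']
row    = ['Tate', 'Johnson', 'tate@tatey.com']

# Hypothetical helper: find each column name's position in the header row.
def resolve_indexes(header, names)
  names.map do |name|
    header.index(name) || raise(ArgumentError, "unknown column #{name.inspect}")
  end
end

email_indexes = resolve_indexes(header, ['EM'])       # => [2]
name_indexes  = resolve_indexes(header, ['FN', 'LN']) # => [0, 1]

email = row.values_at(*email_indexes).first    # => "tate@tatey.com"
name  = row.values_at(*name_indexes).join(' ') # => "Tate Johnson"
```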

Enumerator

#conform is lazy, returning an Enumerator. Input is not parsed until you call #each, #map or any method defined in Enumerable. That means schemas can be assigned now and evaluated later. #each has the lowest memory footprint because it does not build a collection.
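
The effect of laziness can be seen in plain Ruby without Conformist at all. The sketch below uses Ruby's own Enumerator and Enumerator::Lazy as a stand-in for a conformed schema; the input and the rows_read counter are made up for illustration. It shows that building the pipeline reads nothing, and that only the rows actually requested are pulled from the input.

```ruby
# Plain-Ruby sketch of lazy row processing; it does not use Conformist.
rows_read = 0

# Hypothetical input: an Enumerator that counts how many rows it has produced.
input = Enumerator.new do |yielder|
  [['tate', 'johnson'], ['jane', 'doe'], ['john', 'smith']].each do |row|
    rows_read += 1
    yielder << row
  end
end

# Building the pipeline does not read any rows.
conformed   = input.lazy.map { |first, last| "#{first} #{last}".upcase }
rows_before = rows_read # => 0

# Only the rows that are actually requested get read.
first_two  = conformed.first(2) # => ["TATE JOHNSON", "JANE DOE"]
rows_after = rows_read          # => 2
```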

Struct

The argument passed into the block is a struct-like object. You can access columns as methods or keys. Columns were only accessible as keys in earlier versions of Conformist. Methods are now the preferred syntax.

citizen[:name] # => "Tate Johnson"
citizen.name   # => "Tate Johnson"

For convenience the #attributes method returns a hash of key-value pairs suitable for creating ActiveRecord or DataMapper records.

citizen.attributes # => {:name => "Tate Johnson", :email => "tate@tatey.com"}
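
For readers unfamiliar with struct-like objects, Ruby's built-in Struct behaves in a similar way and makes a reasonable mental model. This is an analogy only: CitizenRow is a hypothetical name, and the object Conformist yields is its own type with #attributes rather than #to_h.

```ruby
# Analogy only: Ruby's built-in Struct, not the object Conformist yields.
CitizenRow = Struct.new(:name, :email)
citizen = CitizenRow.new('Tate Johnson', 'tate@tatey.com')

by_key    = citizen[:name] # => "Tate Johnson"
by_method = citizen.name   # => "Tate Johnson"
as_hash   = citizen.to_h   # => {:name => "Tate Johnson", :email => "tate@tatey.com"}
```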

One Column

Maps the first column in the input file to :first_name. Column indexing starts at zero.

column :first_name, 0

Many Columns

Maps the first and second columns in the input file to :name.

column :name, 0, 1

Indexing is completely arbitrary and you can map any combination.

column :name_and_city, 0, 1, 2

Many columns are implicitly concatenated. Behaviour can be changed by passing a block. See preprocessing.

Preprocessing

Sometimes values need to be manipulated before they're conformed. Passing a block gets access to values. The return value of the block becomes the conformed output.

column :name, 0, 1 do |values|
  values.map(&:upcase) * ' '
end
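
The block above relies on Array#* with a string argument, which is equivalent to Array#join. The snippet below runs the same expression on a sample array in plain Ruby, outside of any schema, to show what the block returns.

```ruby
# What the preprocessing block computes for a sample pair of values.
values = ['Tate', 'Johnson']

joined_with_star = values.map(&:upcase) * ' '     # => "TATE JOHNSON"
joined_with_join = values.map(&:upcase).join(' ') # => "TATE JOHNSON"
```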

Works with one column too. Instead of getting a collection of objects, one object is passed to the block.

column :first_name, 0 do |value|
  value.upcase
end

It's also possible to provide a context object that is made available during preprocessing.

citizen = Conformist.new do
  column :name, 0, 1 do |values, context|
    (context[:upcase?] ? values.map(&:upcase) : values) * ' '
  end
end

citizen.conform [['tate', 'johnson']], context: {upcase?: true}

Virtual Columns

Virtual columns are not sourced from input. Omit the index to create a virtual column. Like real columns, virtual columns are included in the conformed output.

column :day do
  1
end

Inheritance

Inheriting from a schema gives access to all of the parent schema's columns.

Anonymous Schema

Anonymous inheritance takes inspiration from Ruby's syntax for instantiating new classes.

parent = Conformist.new do
  column :name, 0, 1
end

child = Conformist.new parent do
  column :category do
    'Child'
  end
end

Class Schema

Classical inheritance works as expected.

class Parent
  extend Conformist

  column :name, 0, 1
end

class Child < Parent
  column :category do
    'Child'
  end
end

Upgrading from <= 0.0.3 to >= 0.1.0

Where previously you had

class Citizen
  include Conformist::Base

  column :name, 0, 1
end

Citizen.load('~/file.csv').foreach do |citizen|
  # ...
end

You should now do

require 'fastercsv'

class Citizen
  extend Conformist

  column :name, 0, 1
end

Citizen.conform(FasterCSV.open('~/file.csv')).each do |citizen|
  # ...
end

See CHANGELOG.md for a full list of changes.

Compatibility

  • MRI 2.4.0, 2.3.1, 2.2.0, 2.1.0, 2.0.0, 1.9.3
  • JRuby

Dependencies

No explicit dependencies, although CSV and Spreadsheet are commonly used.

Contributing

  1. Fork
  2. Install dependencies by running $ bundle install
  3. Write tests and code
  4. Make sure the tests pass locally by running $ bundle exec rake
  5. Push to GitHub and make sure continuous integration tests pass at https://travis-ci.org/tatey/conformist/pull_requests
  6. Send a pull request on GitHub

Please do not increment the version number in lib/conformist/version.rb. The version number will be incremented by the maintainer after the patch is accepted.

Motivation

Motivation for this project came from the desire to simplify importing data from various government organisations into Antenna Mate. The data from each government was similar, but had completely different formatting. Some pieces of data needed preprocessing while others simply needed to be concatenated together. Not wanting to write a parser for each new government organisation, I created Conformist.

Copyright

Copyright © 2016 Tate Johnson. Conformist is released under the MIT license. See LICENSE for details.