Bend CSVs to your will with declarative schemas. Map one or many columns, preprocess cells and lazily enumerate. Declarative schemas are easier to understand, quicker to set up and independent of I/O. Use CSV (formerly FasterCSV), Spreadsheet or any array-of-arrays-like data structure.
Open a CSV file and declare a schema. A schema comprises columns. A column takes an arbitrary name followed by its position in the input. A column may be derived from multiple positions.
require 'conformist'
require 'csv'
csv = CSV.open '~/transmitters.csv'
schema = Conformist.new do
  column :callsign, 1
  column :latitude, 1, 2, 3
  column :longitude, 3, 4, 5
  column :name, 0 do |value|
    value.upcase
  end
end
Insert the transmitters into a SQLite database.
require 'sqlite3'
db = SQLite3::Database.new 'transmitters.db'
schema.conform(csv).each do |transmitter|
  db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end
Only insert the transmitters with the name "Mount Coot-tha" using ActiveRecord or DataMapper. The schema above upcases :name, so the comparison uses the upcased string.
transmitters = schema.conform(csv).select do |transmitter|
  transmitter.name == 'MOUNT COOT-THA'
end
transmitters.each do |transmitter|
  Transmitter.create! transmitter.attributes
end
Source from multiple, different input files and insert transmitters together into a single database.
require 'conformist'
require 'csv'
require 'sqlite3'
au_schema = Conformist.new do
  column :callsign, 8
  column :latitude, 10
end
us_schema = Conformist.new do
  column :callsign, 1
  column :latitude, 1, 2, 3
end
au_csv = CSV.open '~/au/transmitters.csv'
us_csv = CSV.open '~/us/transmitters.csv'
db = SQLite3::Database.new 'transmitters.db'
[au_schema.conform(au_csv), us_schema.conform(us_csv)].each do |schema|
  schema.each do |transmitter|
    db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
  end
end
Open a Microsoft Excel spreadsheet and declare a schema.
require 'conformist'
require 'spreadsheet'
book = Spreadsheet.open '~/states.xls'
sheet = book.worksheet 0
schema = Conformist.new do
  column :state, 0, 1 do |values|
    "#{values.first}, #{values.last}"
  end
  column :capital, 2
end
Print each state's attributes to standard out.
schema.conform(sheet).each do |state|
  $stdout.puts state.attributes
end
For more examples see test/fixtures, test/schemas and test/unit/integration_test.rb.
Conformist is available as a gem. Install it at the command line.
$ [sudo] gem install conformist
Or add it to your Gemfile and run $ bundle install.
gem 'conformist'
Anonymous schemas are quick to declare and don't have the overhead of creating an explicit class.
citizen = Conformist.new do
  column :name, 0, 1
  column :email, 2
end
citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]
Class schemas are explicit. Class schemas were the only type available in earlier versions of Conformist.
class Citizen
  extend Conformist
  column :name, 0, 1
  column :email, 2
end
Citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]
Column indexes are implicitly incremented when the index argument is omitted. Implicit indexing is all or nothing.
column :account_number # => 0
column(:date) { |v| Time.new(*v.split('/').reverse) } # => 1
column :description # => 2
column :debit # => 3
column :credit # => 4
Conform is the principal method for lazily applying a schema to the given input.
enumerator = schema.conform CSV.open('~/file.csv')
enumerator.each do |row|
  puts row.attributes
end
#conform expects any object that responds to #each and yields array-like objects.
CSV.open('~/file.csv').respond_to? :each # => true
[[], [], []].respond_to? :each # => true
#conform takes an option to skip the first row of input. Given a typical CSV document, the first row is the header row and irrelevant for enumeration.
schema.conform CSV.open('~/file_with_headers.csv'), :skip_first => true
Strings can be used as column indexes instead of integers. These strings will be matched against the first row to determine the appropriate numerical index.
citizen = Conformist.new do
  column :email, 'EM'
  column :name, 'FN', 'LN'
end
citizen.conform [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']], :skip_first => true
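As a rough illustration of the header lookup described above, the following plain-Ruby sketch resolves header strings to numeric positions using the first row. It does not use Conformist: `resolve_indexes` is a hypothetical helper named for this example, and the gem's actual lookup may differ.

```ruby
# Hypothetical helper, not part of Conformist: turns header names into
# numeric positions by looking them up in the first row.
def resolve_indexes(header_row, names)
  names.map do |name|
    header_row.index(name) || raise(ArgumentError, "unknown header: #{name}")
  end
end

rows = [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']]
header, *records = rows

name_indexes  = resolve_indexes(header, ['FN', 'LN'])
email_indexes = resolve_indexes(header, ['EM'])

# The header row is only used for the lookup; the remaining rows are data.
records.each do |record|
  puts record.values_at(*name_indexes).join(' ')
  puts record.values_at(*email_indexes).first
end
```

Raising on an unknown header is a design choice of this sketch, so that a renamed column fails loudly instead of silently producing nil values.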
#conform is lazy, returning an Enumerator. Input is not parsed until you call #each, #map or any method defined in Enumerable. That means schemas can be assigned now and evaluated later. #each has the lowest memory footprint because it does not build a collection.
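The laziness described above can be pictured with Ruby's own Enumerator. The sketch below is plain Ruby and does not call Conformist: `lazy_conform` and `CountingInput` are hypothetical names invented for this example, and the hard-coded column mapping is an assumption.

```ruby
# Hypothetical helper, not Conformist's implementation: wraps any input that
# responds to #each in an Enumerator so rows are only read on iteration.
def lazy_conform(input)
  Enumerator.new do |yielder|
    input.each do |row|
      yielder << { :name => row[0, 2].join(' '), :email => row[2] }
    end
  end
end

# Input that records how many rows have been read so far.
class CountingInput
  attr_reader :reads

  def initialize(rows)
    @rows = rows
    @reads = 0
  end

  def each
    @rows.each do |row|
      @reads += 1
      yield row
    end
  end
end

input = CountingInput.new([['Tate', 'Johnson', 'tate@tatey.com']])
enumerator = lazy_conform(input)

puts input.reads # no rows have been read at this point
enumerator.each { |citizen| puts citizen[:name] }
puts input.reads # rows were read only while iterating
```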
The argument passed into the block is a struct-like object. You can access columns as methods or keys. Columns were only accessible as keys in earlier versions of Conformist. Methods are now the preferred syntax.
citizen[:name] # => "Tate Johnson"
citizen.name # => "Tate Johnson"
For convenience the #attributes method returns a hash of key-value pairs suitable for creating ActiveRecord or DataMapper records.
citizen.attributes # => {:name => "Tate Johnson", :email => "tate@tatey.com"}
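Ruby's built-in Struct offers the same access styles, so it can serve as a mental model. This is an analogy only: `CitizenRow` is a hypothetical name, and the object Conformist yields is its own class.

```ruby
# Analogy only: a plain Struct standing in for the struct-like object.
CitizenRow = Struct.new(:name, :email)

citizen = CitizenRow.new('Tate Johnson', 'tate@tatey.com')

puts citizen[:name] # access by key
puts citizen.name   # access by method
p citizen.to_h      # hash of key-value pairs, comparable to #attributes
```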
Maps the first column in the input file to :first_name. Column indexing starts at zero.
column :first_name, 0
Maps the first and second columns in the input file to :name.
column :name, 0, 1
Indexing is completely arbitrary and you can map any combination.
column :name_and_city, 0, 1, 2
Multiple columns are implicitly concatenated into a single value. Behaviour can be changed by passing a block. See preprocessing.
Sometimes values need to be manipulated before they're conformed. Passing a block gets access to values. The return value of the block becomes the conformed output.
column :name, 0, 1 do |values|
  values.map(&:upcase) * ' '
end
Works with one column too. Instead of getting a collection of objects, one object is passed to the block.
column :first_name, 0 do |value|
  value.upcase
end
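The difference between one and many columns can be sketched in plain Ruby. The `extract` method below is a hypothetical stand-in for a column definition, not Conformist's own code. It assumes that several values are joined with a space when no block is given, which matches the "Tate Johnson" output shown earlier.

```ruby
# Hypothetical stand-in for a column definition; not Conformist's own code.
# One index passes a single value to the block, several pass an array.
def extract(row, *indexes, &preprocessor)
  values = row.values_at(*indexes)
  value  = indexes.size == 1 ? values.first : values
  return preprocessor.call(value) if preprocessor

  # Assumption: without a block, multiple values are joined with a space.
  value.is_a?(Array) ? value.join(' ') : value
end

row = ['tate', 'johnson', 'tate@tatey.com']

puts extract(row, 0, 1)                                        # implicit concatenation
puts extract(row, 0, 1) { |values| values.map(&:upcase) * ' ' } # block receives an array
puts extract(row, 0) { |value| value.upcase }                   # block receives one value
```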
It's also possible to provide a context object that is made available during preprocessing.
citizen = Conformist.new do
  column :name, 0, 1 do |values, context|
    (context[:upcase?] ? values.map(&:upcase) : values) * ' '
  end
end
citizen.conform [['tate', 'johnson']], context: {upcase?: true}
Virtual columns are not sourced from input. Omit the index to create a virtual column. Like real columns, virtual columns are included in the conformed output.
column :day do
  1
end
Inheriting from a schema gives access to all of the parent schema's columns.
Anonymous inheritance takes inspiration from Ruby's syntax for instantiating new classes.
parent = Conformist.new do
  column :name, 0, 1
end
child = Conformist.new parent do
  column :category do
    'Child'
  end
end
Classical inheritance works as expected.
class Parent
  extend Conformist
  column :name, 0, 1
end
class Child < Parent
  column :category do
    'Child'
  end
end
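To make the inheritance behaviour concrete, here is a miniature schema written in plain Ruby. `MiniSchema` is a hypothetical class built only for this illustration and is not how Conformist is implemented. It assumes a child starts with a copy of the parent's columns and that a column without indexes takes its value from the block alone.

```ruby
# Hypothetical miniature schema, only to illustrate column inheritance.
class MiniSchema
  attr_reader :columns

  def initialize(parent = nil, &block)
    # A child begins with a copy of the parent's columns, so additions
    # to the child never change the parent.
    @columns = parent ? parent.columns.dup : []
    instance_eval(&block) if block
  end

  def column(name, *indexes, &preprocessor)
    @columns << [name, indexes, preprocessor]
  end

  def conform(rows)
    rows.map do |row|
      columns.each_with_object({}) do |(name, indexes, preprocessor), out|
        values = row.values_at(*indexes)
        out[name] = preprocessor ? preprocessor.call(values) : values.join(' ')
      end
    end
  end
end

PARENT_SCHEMA = MiniSchema.new { column :name, 0, 1 }
CHILD_SCHEMA  = MiniSchema.new(PARENT_SCHEMA) { column(:category) { 'Child' } }

# The child yields the inherited :name column plus its own :category.
p CHILD_SCHEMA.conform([['Tate', 'Johnson']])
```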
Where previously you had
class Citizen
  include Conformist::Base
  column :name, 0, 1
end
Citizen.load('~/file.csv').foreach do |citizen|
  # ...
end
You should now do
require 'fastercsv'
class Citizen
  extend Conformist
  column :name, 0, 1
end
Citizen.conform(FasterCSV.open('~/file.csv')).each do |citizen|
  # ...
end
See CHANGELOG.md for a full list of changes.
- MRI 2.4.0, 2.3.1, 2.2.0, 2.1.0, 2.0.0, 1.9.3
- JRuby
No explicit dependencies, although CSV and Spreadsheet are commonly used.
- Fork
- Install dependencies by running $ bundle install
- Write tests and code
- Make sure the tests pass locally by running $ bundle exec rake
- Push to GitHub and make sure continuous integration tests pass at https://travis-ci.org/tatey/conformist/pull_requests
- Send a pull request on GitHub
Please do not increment the version number in lib/conformist/version.rb. The version number will be incremented by the maintainer after the patch is accepted.
Motivation for this project came from the desire to simplify importing data from various government organisations into Antenna Mate. The data from each government was similar, but had completely different formatting. Some pieces of data needed preprocessing while others simply needed to be concatenated together. Not wanting to write a parser for each new government organisation, I created Conformist.
Copyright © 2016 Tate Johnson. Conformist is released under the MIT license. See LICENSE for details.