Note: This is not officially supported by Hitachi Vantara.
This library provides a Ruby-based testing suite for Pentaho Data Integration. You can create specifications for Pentaho transformations and jobs, then ensure they always run correctly.
This library was tested against:
- Kettle version 6.1.0.1-196
- MacOS and Linux
Note that it is also currently limited to:
- MySQL
- Amazon Simple Storage Service
Future enhancements could potentially break these out into plug-ins in order to support other database and cloud storage vendors/systems.
To install through RubyGems:
```bash
gem install simmer
```
You can also add it to your Gemfile:
```bash
bundle add simmer
```
After installation, you will need to do two things:
- Add a simmer configuration file
- Add a simmer directory
The configuration file contains information about external systems, such as:
- Amazon Simple Storage Service
- Local File System
- Pentaho Data Integration
- MySQL Database
Copy this configuration template to `config/simmer.yaml`, relative to your project's root:
```yaml
# Number of times to automatically retry a test that fails due to a timeout error:
timeout_failure_retry_count: 0

mysql_database:
  database:
  username:
  host:
  port:
  flags: MULTI_STATEMENTS

spoon_client:
  dir: spec/mocks/spoon
  args: 0

# local_file_system:
#   dir: tmp/store_test

# aws_file_system:
#   access_key_id:
#   bucket:
#   default_expires_in_seconds: 3600
#   encryption: AES256
#   region:
#   secret_access_key:
```
Note: You can configure any of the options listed in the mysql2 gem's configuration options under `mysql_database`.
Fill out the missing configuration values required for each section. If you would like to use your local file system, uncomment the `local_file_system` section. If you would like to use AWS S3, uncomment the `aws_file_system` section.
Note: To help ensure that non-test databases and file systems do not get accidentally wiped, there is a naming-convention-based protection that you must follow:
- Database names must end in `_test`
- The local file system `dir` must end in `-test`
- The AWS file system `bucket` must end in `-test`
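If you want to catch a misconfigured `simmer.yaml` before a test run (for example in CI), a small pre-flight check could encode the three rules above. The `naming_violations` helper below is hypothetical and not part of simmer; it only checks the rules exactly as listed here.

```ruby
require 'yaml'

# Hypothetical pre-flight check (not part of simmer). Returns a message for
# every setting that breaks the naming rules listed above.
def naming_violations(config)
  violations = []

  # The database name is always checked.
  database = config.dig('mysql_database', 'database').to_s
  violations << 'mysql_database.database must end in _test' unless database.end_with?('_test')

  # File system settings are only checked when their section is present.
  dir = config.dig('local_file_system', 'dir')
  violations << 'local_file_system.dir must end in -test' if dir && !dir.to_s.end_with?('-test')

  bucket = config.dig('aws_file_system', 'bucket')
  violations << 'aws_file_system.bucket must end in -test' if bucket && !bucket.to_s.end_with?('-test')

  violations
end

config = YAML.safe_load(<<~YAML)
  mysql_database:
    database: agents_test
  local_file_system:
    dir: tmp/store-test
YAML

p naming_violations(config) # => []
```

Against a real project, the same call would be `naming_violations(YAML.safe_load(File.read('config/simmer.yaml')))`.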
You will also need to create the following folder structure in your project's root folder:
- simmer/files: Place any files necessary to stage in this directory.
- simmer/fixtures: Place YAML files that describe the database records necessary to stage the database.
- simmer/specs: Place specification YAML files here.
The internal structure of each of these directories does not matter; they can contain folders organized in any way. These directories should be version controlled, as they contain the information necessary to execute your tests. However, you may want to exclude the `simmer/results` directory from version control, as it stores the results after execution.
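As a convenience, the three input directories can be created with a few lines of Ruby. `create_simmer_dirs` is a hypothetical helper, not something simmer provides; creating the folders by hand works just as well.

```ruby
require 'fileutils'

# Hypothetical convenience helper (not provided by simmer): creates the three
# input directories under the given project root and returns their paths.
# simmer/results is left out because it only holds execution results.
def create_simmer_dirs(root = Dir.pwd)
  %w[files fixtures specs].map do |name|
    File.join(root, 'simmer', name).tap { |dir| FileUtils.mkdir_p(dir) }
  end
end
```

Calling `create_simmer_dirs` from the project root is safe to repeat, since `FileUtils.mkdir_p` does nothing for directories that already exist.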
A specification is a blueprint for how to run a transformation or job and contains configuration for:
- File system state before execution
- Database state before execution
- Execution command
- Expected database state after execution
- Expected execution output
The following is an example specification for a transformation:
```yaml
name: Declassify Users
stage:
  files:
    - src: noc_list.csv
      dest: input/noc_list.csv
  fixtures:
    - iron_man
    - hulk
act:
  name: load_noc_list
  repository: top_secret
  type: transformation
  params:
    files:
      input_file: noc_list.csv
    keys:
      code: 'The secret code is: {codes.the_secret_one}'
assert:
  assertions:
    - type: table
      name: agents
      records:
        - call_sign: iron_man
          first: tony
          last: stark
        - call_sign: hulk
          first: bruce
          last: banner
    - type: table
      name: agents
      logic: includes
      records:
        - last: stark
    - type: output
      value: output to stdout
```
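Because YAML nesting is driven by indentation, a mis-indented spec can silently change meaning. The hypothetical `spec_problems` helper below (not part of simmer) sketches a quick structural check, assuming the layout of the example above; simmer's own validation may differ.

```ruby
# Hypothetical structural check (not part of simmer) for a parsed spec file.
# It assumes the layout of the example above and returns a list of problems.
def spec_problems(spec)
  problems = %w[name stage act assert].reject { |key| spec.key?(key) }
                                      .map { |key| "missing top-level key: #{key}" }

  type = spec.dig('act', 'type')
  unless %w[transformation job].include?(type)
    problems << "act.type must be transformation or job, got #{type.inspect}"
  end

  Array(spec.dig('assert', 'assertions')).each_with_index do |assertion, index|
    kind = assertion['type']
    problems << "assertion #{index} has unknown type #{kind.inspect}" unless %w[table output].include?(kind)
  end

  problems
end
```

For example, after `require 'yaml'`, `spec_problems(YAML.safe_load(File.read(path)))` should return an empty array for a spec shaped like the one above.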
The stage section defines the pre-execution state that needs to exist before PDI execution. There are two options:
- Files
- Fixtures
Each file entry specifies two things:
- src: the location of the file (relative to the `simmer/files` directory)
- dest: where to copy it to (within the configured file system: local or S3)
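For the local file system, staging a file entry roughly amounts to copying `src` from `simmer/files` to `dest` under the configured `dir`. The sketch below assumes exactly that; `stage_file` and its keyword arguments are hypothetical, and simmer's actual staging (including the S3 path) is not shown.

```ruby
require 'fileutils'

# Hypothetical sketch (not simmer's implementation) of staging one file entry
# onto the local file system: src is read from simmer/files and copied to
# dest beneath the configured local_file_system dir.
def stage_file(entry, files_dir:, store_dir:)
  source = File.join(files_dir, entry.fetch('src'))
  target = File.join(store_dir, entry.fetch('dest'))

  FileUtils.mkdir_p(File.dirname(target)) # create e.g. input/ when missing
  FileUtils.cp(source, target)

  target
end
```

For the example spec, `stage_file({ 'src' => 'noc_list.csv', 'dest' => 'input/noc_list.csv' }, files_dir: 'simmer/files', store_dir: 'tmp/store-test')` would copy the CSV to `tmp/store-test/input/noc_list.csv`.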
Fixtures will populate the database specified in the `mysql_database` section of `simmer.yaml`. In order to do this you need to:
- Add the fixture to a YAML file in the `simmer/fixtures` directory.
- Add the name of the fixture you wish to use in the `stage/fixtures` section, as illustrated above.
Adding Fixtures
Fixtures live in YAML files within the `simmer/fixtures` directory. They can be placed in any file; the only restriction is on their top-level keys, which must uniquely identify each fixture. Here is an example of a fixture file:
```yaml
hulk:
  table: agents
  fields:
    call_sign: hulk
    first: CLASSIFIED
    last: CLASSIFIED

iron_man:
  table: agents
  fields:
    call_sign: iron_man
    first: CLASSIFIED
    last: CLASSIFIED
```
This example specifies two fixtures: `hulk` and `iron_man`. Each will create a record in the `agents` table with its respective attributes (columns).
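Conceptually, each fixture becomes one row in its table. The hypothetical `fixture_to_sql` below (not simmer's implementation) renders a fixture as the equivalent `INSERT` statement, which can be handy for eyeballing what a fixture will stage. Its quoting is deliberately naive and meant for display only.

```ruby
require 'yaml'

# Hypothetical illustration (not simmer's code): renders one fixture as the
# INSERT statement it conceptually corresponds to. The quoting is naive and
# for display only; real code should rely on the database driver.
def fixture_to_sql(fixtures, name)
  fixture = fixtures.fetch(name)
  fields  = fixture.fetch('fields')

  columns = fields.keys.join(', ')
  values  = fields.values.map { |value| "'#{value.to_s.gsub("'", "''")}'" }.join(', ')

  "INSERT INTO #{fixture.fetch('table')} (#{columns}) VALUES (#{values});"
end

fixtures = YAML.safe_load(<<~YAML)
  hulk:
    table: agents
    fields:
      call_sign: hulk
      first: CLASSIFIED
      last: CLASSIFIED
YAML

puts fixture_to_sql(fixtures, 'hulk')
# INSERT INTO agents (call_sign, first, last) VALUES ('hulk', 'CLASSIFIED', 'CLASSIFIED');
```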
The act configuration contains the information necessary for invoking Pentaho through its Spoon script. The options are:
- name: The name of the transformation or job
- repository: The name of the Kettle repository
- type: transformation or job
- file params (`params/files`): key-value pairs to send through to Spoon as params. The values are relative to the `simmer/files` directory and will be joined with it.
- key params (`params/keys`): key-value pairs to send through to Spoon as params.
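To make the two kinds of params concrete, here is a sketch of how `params/files` and `params/keys` from the example spec could be flattened into one set of Spoon params. `spoon_params` is a hypothetical name; it assumes file paths are joined with `File.join`, and it leaves the `{codes.the_secret_one}` placeholder untouched because how simmer resolves it is not described here.

```ruby
# Hypothetical sketch (not simmer's code): flattens the act params into one
# hash of Spoon params. File params are joined with the simmer/files
# directory; key params are passed through as-is (placeholders such as
# {codes.the_secret_one} are not resolved here).
def spoon_params(params, files_dir:)
  file_params = params.fetch('files', {}).transform_values do |relative_path|
    File.join(files_dir, relative_path)
  end

  file_params.merge(params.fetch('keys', {}))
end

params = {
  'files' => { 'input_file' => 'noc_list.csv' },
  'keys' => { 'code' => 'The secret code is: {codes.the_secret_one}' }
}

p spoon_params(params, files_dir: 'simmer/files')
```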
The assert section contains the expected state of:
- Database table contents
- Pentaho output contents
Take the assert block from the example above:
```yaml
assert:
  assertions:
    - type: table
      name: agents
      records:
        - call_sign: iron_man
          first: tony
          last: stark
        - call_sign: hulk
          first: bruce
          last: banner
    - type: table
      name: agents
      logic: includes
      records:
        - last: stark
    - type: output
      value: output to stdout
```
This contains two table assertions and one output assertion. It explicitly states that:
- The table `agents` should contain exactly two records with the column values as described (iron_man and hulk)
- The table `agents` should include a record where the last name is `stark`
- The standard output should contain the string described in `value` somewhere in the log
Note: Output assertions currently check only the standard output, not the standard error.
Currently table assertions operate under a very rudimentary set of rules:
- Record order does not matter
- Each record being asserted should have the same keys compared
- All values are asserted against their string coerced value
- There is no concept of relationships or associations (yet)
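The rules above can be sketched in plain Ruby. The helpers below are hypothetical (not simmer's code) and assume string keys on both sides, which is what `YAML.safe_load` produces. They cover the default exact logic, the `includes` logic, and a substring check matching the output assertion described earlier.

```ruby
# Hypothetical sketch (not simmer's implementation) of the table and output
# assertion rules. Assumes records are hashes with string keys.

# Keep only the compared keys, coerce values to strings, and sort so that
# record order does not matter.
def normalize_records(records, keys)
  records.map { |record| keys.to_h { |key| [key, record[key].to_s] } }
         .sort_by(&:values)
end

# Default logic: the table must hold exactly the expected records.
def table_matches?(actual, expected)
  keys = expected.flat_map(&:keys).uniq
  normalize_records(actual, keys) == normalize_records(expected, keys)
end

# "includes" logic: every expected record must appear among the actual ones.
def table_includes?(actual, expected)
  keys = expected.flat_map(&:keys).uniq
  actual_rows = normalize_records(actual, keys)
  normalize_records(expected, keys).all? { |row| actual_rows.include?(row) }
end

# Output assertion: the expected value appears somewhere in standard output.
def output_includes?(stdout, value)
  stdout.include?(value)
end
```

Only the keys named in the expected records are compared, so extra columns in the table (such as an auto-increment `id`) do not affect the result in this sketch.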
After you have configured simmer and written a specification, you can run it by executing:
```bash
bundle exec simmer ./simmer/specs/name_of_the_spec.yaml
```
The passed-in path can also be a directory, in which case all specs in the directory (recursively) will be executed:
```bash
bundle exec simmer ./simmer/specs/some_directory
```
You can also omit the path altogether to execute all specs:
```bash
bundle exec simmer
```
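The path handling above can be pictured as follows. `spec_paths` is a hypothetical helper, not simmer's implementation; it assumes specs use the `.yaml` extension and that omitting the path means `simmer/specs`.

```ruby
# Hypothetical sketch (not simmer's code) of how a spec path argument could be
# resolved. Assumes specs end in .yaml and that no path means simmer/specs.
def spec_paths(path = nil)
  path ||= 'simmer/specs'
  return [path] if File.file?(path)

  # "**" matches any depth of subdirectories, including none.
  Dir.glob(File.join(path, '**', '*.yaml')).sort
end
```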
It is possible to define custom test lifecycle hooks. These are very similar to RSpec hooks. Here is an example of how to run code before and after the entire suite:
```ruby
Simmer.configure do |config|
  config.before(:suite) { puts 'about to run the entire suite' }

  config.after(:suite) do |result|
    result_msg = result.passed? ? 'passed' : 'failed'
    puts "The suite #{result_msg}."
  end
end
```
Note that `after` callbacks take an optional parameter: the result object.
It is also possible to specify custom code which runs before and after each individual specification.
```ruby
Simmer.configure do |config|
  config.before(:each) { puts 'I will run before each test' }

  config.after(:each) do |result|
    result_msg = result.passed? ? 'passed' : 'failed'
    puts "The specification #{result_msg}."
  end
end
```
Basic steps to get this repository running locally:
1. Install Ruby (check simmer.gemspec for versions supported)
2. Install bundler (`gem install bundler`)
3. Clone the repository (`git clone git@github.com:bluemarblepayroll/simmer.git`)
4. Navigate to the root folder (`cd simmer`)
5. Install dependencies (`bundle`)
6. Create the `simmer_test` MySQL database as defined in `spec/db/tables.sql`.
7. Add the tables from `spec/db/tables.sql` to this database.
8. Configure your test simmer.yaml:
   ```bash
   cp spec/config/simmer.yaml.ci spec/config/simmer.yaml
   ```
9. Edit `spec/config/simmer.yaml` so that it can connect to the database created in steps six and seven.
### Running Tests
To execute the test suite and code-coverage tool, run:
```bash
bundle exec rspec spec --format documentation
```
Alternatively, you can have Guard watch for changes:
```bash
bundle exec guard
```
Also, do not forget to run Rubocop:
```bash
bundle exec rubocop
```
Or run all three in one command:
```bash
bundle exec rake
```
Note: ensure you have proper authorization before trying to publish new versions.
After code changes have successfully gone through the Pull Request review process then the following steps should be followed for publishing new versions:
- Merge Pull Request into master
- Update `lib/simmer/version.rb` using semantic versioning
- Install dependencies: `bundle`
- Update `CHANGELOG.md` with release notes
- Commit & push master to remote and ensure CI builds master successfully
- Run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to rubygems.org.
Everyone interacting in this codebase, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
This project is MIT Licensed.