nft-csv

An nf-test plugin that adds support for CSV files, based on tablesaw.

Requirements

nf-test version 0.7.0 or higher

Setup

To use this plugin, activate it in your nf-test.config file:

config {
  plugins {
    load "nft-csv@0.1.0"
  }
}

Usage

nft-csv extends path with a csv property that parses CSV files. The csv property can be configured with different options.

Example:

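// default options: comma-separated, first line is the header
def csvFile = path("file.csv").csv

// tab-separated file
def tabFile = path("file.tab").csv(sep: "\t")

// custom separator and quote character
def quotedFile = path("file.csv").csv(sep: "\t", quote: '"')

// gzip-compressed file
def gzCsvFile = path("file.csv.gz").csv(decompress: true)

// file loaded from a URL
def remoteFile = csv("https://raw.githubusercontent.com/jtablesaw/tablesaw/master/data/bush.csv")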

Available options:

header: When true, the first line is used as the column names. Otherwise, the columns are named "C0", "C1", ... (default: true).
sep: The character used to separate values (default: ,)
quote: The character used to quote values (default: "").
decompress: When true, decompress the content using the GZIP format before processing it (default: false). Files with the .gz extension are NOT YET decompressed automatically.
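
The header option is the only one without an example above. A minimal sketch, assuming a hypothetical file "no_header.csv" with three columns and no header line; like the other examples, it only runs inside an nf-test test with the plugin loaded:

```groovy
// Assumption: "no_header.csv" is a hypothetical three-column file without a header line.
// With header: false, the first line is treated as data and the columns get generated names.
def table = path("no_header.csv").csv(header: false)
assert table.columnNames == ["C0", "C1", "C2"]
```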

The result is an object of class TableWrapper that contains the following properties and methods:

`rowCount`

Returns the number of rows in the table.

Examples:

assert path("file.csv").csv.rowCount == 4

//or
with( path("file.tab").csv(sep: "\t")) {
    assert rowCount == 4
}

`columnCount`

Returns the number of columns in the table.

Examples:

assert path("file.csv").csv.columnCount == 3

//or
with( path("file.csv").csv) {
    assert columnCount == 3
}

`columnNames`

Returns the names of the columns in the table.

Examples:

def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_a", "col_b", "col_c"]
assert "col_b" in csvFile.columnNames
assert "lukas" !in csvFile.columnNames

//or
with( path("file.csv").csv) {
    assert columnNames == ["col_a", "col_b", "col_c"]
    assert "col_b" in columnNames
    assert "lukas" !in columnNames
}

`rows`

Returns the rows of the table as a list of maps. Each map represents a row with column names as keys.

Examples:

with (path("file.csv").csv) {
    assert rows[1] == ["col_a": 4, "col_b": 5, "col_c": 6]
    assert rows[1] == ["col_c": 6, "col_a": 4, "col_b": 5]
}

`columns`

Returns the columns of the table as a map of lists. Each entry in the map represents a column with the column name as the key.

Examples:

with (path("file.csv").csv) {
    assert columns["col_b"] == [2, 5, 8, 11]
    assert columns["col_b"] != [5, 2, 8, 11]
}
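
The rows and columns examples rely on how Groovy compares collections: maps are equal regardless of key order, while lists are compared element by element. A plain Groovy sketch of that behaviour, independent of the plugin (the values are illustrative and simply mirror the examples above):

```groovy
// A row is a map: the order of the keys does not affect equality.
def row = [col_a: 4, col_b: 5, col_c: 6]
assert row == [col_c: 6, col_a: 4, col_b: 5]

// A column is a list: the order of the elements matters.
def column = [2, 5, 8, 11]
assert column == [2, 5, 8, 11]
assert column != [5, 2, 8, 11]
```

This is why a row can be checked without caring about column order, whereas a column check is sensitive to row order.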

`sort()`

Sorts the columns and rows of the table based on their natural order. This makes it possible to compare tables with a different order of columns and rows with expected table data.

Examples:

def sortedTable1 = path("file1.csv").csv.sort()
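
To illustrate why sorting makes differently ordered tables comparable, here is a plain Groovy sketch that models a table as a list of row maps. It is not the plugin's TableWrapper, and the normalize closure is a hypothetical stand-in that only approximates what sort() is described to do:

```groovy
// Hypothetical model: a table is a list of row maps (NOT the plugin's TableWrapper).
def normalize = { List<Map> rows ->
    // Order the columns of every row by name (Map.sort() returns a new key-sorted map).
    def sortedColumns = rows.collect { row -> row.sort() }
    // Order the rows by comparing their values column after column.
    sortedColumns.sort(false) { a, b ->
        def diff = a.keySet().collect { a[it] <=> b[it] }.find { it != 0 }
        diff ?: 0
    }
}

def table1 = [[col_b: 5, col_a: 4, col_c: 6], [col_b: 2, col_a: 1, col_c: 3]]
def table2 = [[col_c: 3, col_a: 1, col_b: 2], [col_a: 4, col_c: 6, col_b: 5]]

assert table1 != table2                        // same data, different row order
assert normalize(table1) == normalize(table2)  // equal once both are normalized
assert normalize(table1).first().keySet().toList() == ["col_a", "col_b", "col_c"]
```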

`sortRows()`

Sorts the rows of the table based on all columns in their natural order.

Examples:

def sortedTable1 = path("file1.csv").csv.sortRows()

`sortRows(String columnName, boolean ascending)`

Sorts the rows of the table based on the values in the specified column.

columnName: The column to sort by.
ascending: true for ascending order, false for descending. (optional)

Examples:

def sortedTable1 = path("file1.csv").csv.sortRows("col_b").sortRows("col_a", false)

`sortColumns()`

Sorts the columns of the table alphabetically by their names in ascending order.

Examples:

def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_c", "col_a", "col_b"]

def csvFile2 = path("file.csv").csv.sortColumns()
assert csvFile2.columnNames == ["col_a", "col_b", "col_c"]

`sortColumns(boolean ascending)`

Sorts the columns of the table alphabetically by their names.

ascending: true for ascending order, false for descending.

Examples:

def csvFile = path("file.csv").csv
assert csvFile.columnNames == ["col_c", "col_a", "col_b"]

def csvFile2 = path("file.csv").csv.sortColumns(false)
assert csvFile2.columnNames == ["col_c", "col_b", "col_a"]

with(path("file.csv").csv.sortColumns(false)) {
    assert columnNames == ["col_c", "col_b", "col_a"]
}

`view()`

Prints the structure of the table and its number of rows and columns.

path("file.csv").csv.view()

`table`

Returns the tablesaw table instance and allows you to use all the methods to shape, merge and filter your data.

Examples:

//print table structure
print path(filename).csv.table.structure()

Most tablesaw operations create a new table object. You can pass such a table to the csv function to wrap it in a TableWrapper object and use its methods.

Examples:

def filename = process.out.csv.get(0)
with(path(filename).csv){
    assert columnNames == ["col_a", "col_b", "col_c"]
    assert rowCount == 4
    assert columnCount == 3
}
def table = csv(path(filename).csv.table.select("col_c", "col_a"))

with(table){
    assert columnNames == ["col_c", "col_a"]
    assert rowCount == 4
    assert columnCount == 2
}

Assertions

This plugin provides two assertions to simplify the comparison of tables containing numbers, using a specified precision.

`assertTableEquals(array1, array2, double precision)`

Compares two tables. Columns with numbers are compared using a precision, which defaults to 0.00001.

then {
    def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt")
    def actual = csv("${outputDir}/scores.txt")
    assertTableEquals actual, expected
}

A user-defined precision can be provided as the third argument:

then {
    assertTableEquals actual, expected, 0.000000001
}
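
As a rough illustration of what comparing with a precision means, the following plain Groovy sketch treats two numbers as equal when their absolute difference does not exceed the precision. This rule is an assumption made for illustration and is not taken from the plugin's source; withinPrecision is a hypothetical helper, not part of nft-csv:

```groovy
// Assumed rule (not confirmed by the plugin): values match when |a - b| <= precision.
def withinPrecision = { List a, List b, double precision ->
    a.size() == b.size() &&
        (0..<a.size()).every { i -> Math.abs((a[i] as double) - (b[i] as double)) <= precision }
}

def expected = [0.12345d, 1.5d, -2.0d]
def actual   = [0.123451d, 1.5d, -2.0d]

// The first values differ by about 0.000001.
assert withinPrecision(actual, expected, 0.00001d)     // tolerated at the default precision
assert !withinPrecision(actual, expected, 0.0000001d)  // rejected at a tighter precision
```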

The order of rows and columns has to be the same in both tables. However, you can combine the assertion with sortColumns(), sortRows() or sort() to normalize the tables first:

then {
    def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt").sort()
    def actual = csv("${outputDir}/scores.txt").sort()
    assertTableEquals actual, expected, 0.00001
}

`assertArrayEquals(array1, array2, double precision)`

Compares two arrays containing numbers using a precision.

Examples:

then {
    def expected = csv("tests/data/input/chr20-unphased/scores.expected.txt")

    def actual = csv("${outputDir}/scores.txt")
    with(actual) {
        assert columnNames == ["sample", "PGS000027"]
        assert columns["sample"] == expected.columns["sample"]
        assertArrayEquals columns["PGS000027"], expected.columns["PGS000027"]
        // or with a user-defined precision
        assertArrayEquals columns["PGS000027"], expected.columns["PGS000027"], 0.0000001
    }
}

Contact

Lukas Forer (@lukfor)
