CSVDataSet

CSVDataSet is a Java library for manipulating a dataset stored in CSV files.

License: MIT

Versions:

  • 1.3 - Replace column and row operations
  • 1.4 - Algorithm that skips quotes when counting values
  • 2.0 - Library refactoring; adds row labels

Authors:

Jadson Santos - jadsonjs@gmail.com

Dependencies

Java 11
Gradle 5.2.1
JUnit 5.6.0

How do I get set up?

From the source code:

Clone the project and import it as a Gradle project in your IDE.

From the binary:

A binary distribution is available in the libs directory as csvdataset-X.Y.jar.

Add it to your classpath.

How to use

You can build your dataset from rows or from columns. The first row of the dataset can hold the column headers, and the first column can hold the row headers.

Headers are labels: they are not processed when you run a calculation over the dataset.
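As an illustration of this rule, the following self-contained sketch (not CSVDataSet's own code) separates the header labels from the data cells of a grid stored as a list of rows. The class HeaderStripper is hypothetical, and the two flags assume the meaning suggested by the examples below, where the second and third constructor arguments select column headers and row headers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CSVDataSet): drops the header row and/or the
// header column so that only data cells are left for calculations.
class HeaderStripper {

    static List<List<String>> dataCells(List<List<String>> rows,
                                        boolean hasColumnHeaders,
                                        boolean hasRowHeaders) {
        int firstRow = hasColumnHeaders ? 1 : 0; // skip the column-header row
        int firstCol = hasRowHeaders ? 1 : 0;    // skip the row-header column
        List<List<String>> data = new ArrayList<>();
        for (int r = firstRow; r < rows.size(); r++) {
            List<String> row = rows.get(r);
            data.add(new ArrayList<>(row.subList(firstCol, row.size())));
        }
        return data;
    }
}
```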

# You can create a CSV by rows and save it to a file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );

dataSet.addRow(  Arrays.asList( new String[]{ " ",  "Column0", "Column1", "Column2", "Column3"})    );
dataSet.addRow(  Arrays.asList( new String[]  {"Row0",   "1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row1",   "5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row2",   "9",       "10",     "11",       "12"} )   );

dataSet.storeData();

# This code saves a temp.csv file in the current directory with the content:
#  "", Column0,Column1,Column2,Column3
#  Row0,1,2,3,4
#  Row1,5,6,7,8
#  Row2,9,10,11,12
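The file above is plain comma-separated text with one line per row. A minimal sketch of such a writer, assuming no value ever needs quoting, could look like this; CsvWriterSketch is a hypothetical name and this is not the library's storeData() implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not CSVDataSet.storeData()): one line per row, values joined
// with commas. Assumes no value contains a comma, a double quote or a line break,
// so no quoting or escaping is performed.
class CsvWriterSketch {

    static List<String> toLines(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(String.join(",", row));
        }
        return lines;
    }

    // Writes the lines as UTF-8, each followed by the platform line separator.
    static void store(Path file, List<List<String>> rows) throws IOException {
        Files.write(file, toLines(rows));
    }
}
```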

# You can create a CSV by rows with only column headers and save it to a file

CSVDataSet dataSet = new CSVDataSet( "temp.csv", true, false );

dataSet.addRow(  Arrays.asList( new String[]{ "Column0", "Column1", "Column2", "Column3"})    );
dataSet.addRow(  Arrays.asList( new String[]  { "1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  { "5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  { "9",       "10",     "11",       "12"} )   );

dataSet.storeData();

# This code saves a temp.csv file in the current directory with the content:
#  Column0,Column1,Column2,Column3
#  1,2,3,4
#  5,6,7,8
#  9,10,11,12

# You can create a CSV by rows with only row headers and save it to a file

CSVDataSet dataSet = new CSVDataSet( "temp.csv", false, true );

dataSet.addRow(  Arrays.asList( new String[]  {"Row0",   "1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row1",   "5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row2",   "9",       "10",     "11",       "12"} )   );

dataSet.storeData();

# This code saves a temp.csv file in the current directory with the content:
#  Row0,1,2,3,4
#  Row1,5,6,7,8
#  Row2,9,10,11,12

# You can create a CSV by rows without headers and save it to a file

CSVDataSet dataSet = new CSVDataSet( "temp.csv", false, false );

dataSet.addRow(  Arrays.asList( new String[]  {"1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"9",       "10",     "11",       "12"} )   );

dataSet.storeData();

# This code saves a temp.csv file in the current directory with the content:
#  1,2,3,4
#  5,6,7,8
#  9,10,11,12

# You can also create a CSV by columns. It is a little harder to visualize, but it generates the same result

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );

dataSet.addColumn(  Arrays.asList(new String[]{ " ",       "Row0", "Row1", "Row2"})  );
dataSet.addColumn(  Arrays.asList(new String[]{ "Column0", "1",    "5",    "9"})     );
dataSet.addColumn(  Arrays.asList(new String[]{ "Column1", "2",    "6",    "10"})    );
dataSet.addColumn(  Arrays.asList(new String[]{ "Column2", "3",    "7",    "11"})    );
dataSet.addColumn(  Arrays.asList(new String[]{ "Column3", "4",    "8",    "12"})    );

dataSet.storeData();

# This code saves a temp.csv file in the current directory with the content:
#  "", Column0,Column1,Column2,Column3
#  Row0,1,2,3,4
#  Row1,5,6,7,8
#  Row2,9,10,11,12
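Building by columns and building by rows describe the same table; only the order in which the cells are supplied differs. A hypothetical helper that turns columns into rows (a transpose) illustrates the idea, assuming all columns have the same length. It is a sketch, not the library's addColumn() logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts a list of columns into a list of rows.
// Assumes every column has the same number of cells.
class ColumnsToRows {

    static List<List<String>> transpose(List<List<String>> columns) {
        List<List<String>> rows = new ArrayList<>();
        if (columns.isEmpty()) {
            return rows;
        }
        int height = columns.get(0).size();
        for (int r = 0; r < height; r++) {
            List<String> row = new ArrayList<>();
            for (List<String> column : columns) {
                row.add(column.get(r)); // cell r of each column becomes row r
            }
            rows.add(row);
        }
        return rows;
    }
}
```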

# You can load data from a CSV file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );

dataSet.loadData();
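Version 1.4 mentions an algorithm that skips quotes when counting values, but the README does not describe its exact rules. As a sketch of one common approach, assumed here rather than taken from the library, a loader can split each line only on commas that sit outside double quotes. CsvLineParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits one CSV line into values. Commas inside double
// quotes are kept, the enclosing quotes are dropped, and a doubled quote ("")
// inside a quoted value becomes a single quote character.
class CsvLineParser {

    static List<String> parseLine(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted value
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                values.add(current.toString()); // end of one value
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString()); // the last value has no trailing comma
        return values;
    }
}
```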

# You can delete the CSV file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.deleteFile();

# After creating a dataset, you can access columns and rows by label or by position

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );

dataSet.addRow(  Arrays.asList( new String[]{ " ",  "Column0", "Column1", "Column2", "Column3"})    );
dataSet.addRow(  Arrays.asList( new String[]  {"Row0",   "1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row1",   "5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row2",   "9",       "10",     "11",       "12"} )   );


System.out.println( dataSet.getColumnValues("Column0") );  // prints [1, 5, 9]
System.out.println( dataSet.getRowValues("Row1") );        // prints [5, 6, 7, 8]

System.out.println( dataSet.getColumnValues(0) );  // prints [1, 5, 9]
System.out.println( dataSet.getRowValues(1) );     // prints [5, 6, 7, 8]
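The examples above suggest that getColumnValues("Column0") and getColumnValues(0) address the same column, i.e. positions count data columns only and skip the row-header column. Under that assumption, translating a label into a position can be sketched as follows; LabelIndex is a hypothetical name, not part of the library.

```java
import java.util.List;

// Hypothetical sketch: maps a column label to a data position, where position 0
// is the first data column (the row-header column, if any, is not counted).
class LabelIndex {

    static int columnPosition(List<String> headerRow, String label, boolean hasRowHeaders) {
        int index = headerRow.indexOf(label);
        // The top-left cell labels the row-header column, not a data column.
        if (index < 0 || (hasRowHeaders && index == 0)) {
            throw new IllegalArgumentException("Unknown column label: " + label);
        }
        return hasRowHeaders ? index - 1 : index;
    }
}
```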


# You can remove specific columns and rows

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );

dataSet.addRow(  Arrays.asList( new String[]{ " ",  "Column0", "Column1", "Column2", "Column3"})    );
dataSet.addRow(  Arrays.asList( new String[]  {"Row0",   "1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row1",   "5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row2",   "9",       "10",     "11",       "12"} )   );

dataSet.removeColumn(1);
dataSet.removeRow(0);
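The README does not say whether these positions include the header row and column. Assuming they count data cells only, as the sum example below describes for column 0, removal on a plain list-of-rows model could be sketched like this. RemoveSketch is a hypothetical name, and the rows must be mutable lists.

```java
import java.util.List;

// Hypothetical sketch: removes a data column or a data row from a mutable
// list-of-rows, translating data positions into list indexes.
class RemoveSketch {

    static void removeColumn(List<List<String>> rows, int position, boolean hasRowHeaders) {
        int index = hasRowHeaders ? position + 1 : position; // skip the row-header column
        for (List<String> row : rows) {
            row.remove(index); // remove(int) removes by index
        }
    }

    static void removeRow(List<List<String>> rows, int position, boolean hasColumnHeaders) {
        rows.remove(hasColumnHeaders ? position + 1 : position); // skip the header row
    }
}
```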

# Load a CSV file and calculate the sum of the column at position 0 (the first data column, i.e. the second column of the file, after the row headers)

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal sum = dataSet.sumColumn(0);

# Columns can be accessed by position or by header label
# Create a CSV and calculate the sum of the column labeled "Column1"

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );

dataSet.addRow(  Arrays.asList( new String[]{ " ",  "Column0", "Column1", "Column2", "Column3"})    );
dataSet.addRow(  Arrays.asList( new String[]  {"Row0",   "1",       "2",       "3",        "4"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row1",   "5",       "6",       "7",        "8"} )   );
dataSet.addRow(  Arrays.asList( new String[]  {"Row2",   "9",       "10",     "11",       "12"} )   );

BigDecimal sum = dataSet.sumColumn("Column1"); // sum == 18
BigDecimal sum2 = dataSet.sumColumn(1);        // sum2 == 18 (position 1 is "Column1")
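Summing with BigDecimal keeps decimal inputs such as "2.5" exact, avoiding binary floating-point rounding. A minimal sketch of summing a column's string values (ColumnSum is a hypothetical name, not the library's sumColumn) reproduces the 2 + 6 + 10 = 18 result above.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical sketch: adds up string values as BigDecimal so that decimal
// inputs are summed exactly.
class ColumnSum {

    static BigDecimal sum(List<String> values) {
        BigDecimal total = BigDecimal.ZERO;
        for (String value : values) {
            total = total.add(new BigDecimal(value.trim()));
        }
        return total;
    }
}
```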


# Load a CSV file and calculate the mean of the column at position 10

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal mean = dataSet.meanColumn(10);

# Load a CSV file and calculate the mean of the row at position 10

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal mean = dataSet.meanRow(10);
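A mean over BigDecimal values needs an explicit precision for the division, because quotients such as 1/3 have no finite decimal expansion. This sketch assumes MathContext.DECIMAL64 (16 significant digits); the precision used by meanColumn and meanRow is not documented here, and ColumnMean is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.List;

// Hypothetical sketch: arithmetic mean of string values, divided with
// 16 significant digits (MathContext.DECIMAL64).
class ColumnMean {

    static BigDecimal mean(List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("mean of an empty list is undefined");
        }
        BigDecimal total = BigDecimal.ZERO;
        for (String value : values) {
            total = total.add(new BigDecimal(value.trim()));
        }
        return total.divide(BigDecimal.valueOf(values.size()), MathContext.DECIMAL64);
    }
}
```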

# Load a CSV file and calculate the median of the column at position 20

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal median = dataSet.medianColumn(20);

# Load a CSV file and calculate the median of the row at position 15

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal median = dataSet.medianRow(15);
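The median is the middle value after sorting, or the average of the two middle values when the count is even. Sorting must be numeric, not textual, so that "10" comes after "9". A minimal sketch (ColumnMedian is a hypothetical name, not the library's implementation):

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: median of string values parsed as BigDecimal.
class ColumnMedian {

    static BigDecimal median(List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        List<BigDecimal> sorted = new ArrayList<>();
        for (String value : values) {
            sorted.add(new BigDecimal(value.trim()));
        }
        Collections.sort(sorted); // numeric order
        int n = sorted.size();
        int mid = n / 2;
        if (n % 2 == 1) {
            return sorted.get(mid); // odd count: the middle element
        }
        // Even count: average of the two middle elements. Dividing by 2 always
        // terminates in decimal, so no MathContext is needed.
        return sorted.get(mid - 1).add(sorted.get(mid)).divide(BigDecimal.valueOf(2));
    }
}
```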

# Load a CSV file and calculate the standard deviation of the column at position 100

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal stdDev = dataSet.stdDevColumn(100);

# Load a CSV file and calculate the standard deviation of the row at position 100

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
BigDecimal stdDev = dataSet.stdDevRow(100);
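The README does not say whether stdDevColumn and stdDevRow compute the population or the sample standard deviation. The sketch below assumes the population form, sqrt(sum((x - mean)^2) / n), and uses BigDecimal.sqrt, which is available since Java 9 (the project targets Java 11). ColumnStdDev is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: population standard deviation, sqrt(sum((x - mean)^2) / n).
class ColumnStdDev {

    static BigDecimal stdDev(List<String> values) {
        MathContext mc = MathContext.DECIMAL64;
        int n = values.size();
        if (n == 0) {
            throw new IllegalArgumentException("standard deviation of an empty list is undefined");
        }
        List<BigDecimal> numbers = new ArrayList<>();
        BigDecimal total = BigDecimal.ZERO;
        for (String value : values) {
            BigDecimal x = new BigDecimal(value.trim());
            numbers.add(x);
            total = total.add(x);
        }
        BigDecimal mean = total.divide(BigDecimal.valueOf(n), mc);
        BigDecimal squares = BigDecimal.ZERO;
        for (BigDecimal x : numbers) {
            BigDecimal d = x.subtract(mean);
            squares = squares.add(d.multiply(d)); // squared deviation from the mean
        }
        BigDecimal variance = squares.divide(BigDecimal.valueOf(n), mc);
        return variance.sqrt(mc); // BigDecimal.sqrt requires Java 9+
    }
}
```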

# Load a CSV file, normalize the values of the column at position 1,
# update them in the dataset, and save it back to the file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<String> newColumnValues = dataSet.normalizeColumn(1, true);
dataSet.storeData();

# Load a CSV file, normalize the values of the column labeled "Column1",
# update them in the dataset, and save it back to the file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<String> newColumnValues = dataSet.normalizeColumn("Column1", true);
dataSet.storeData();

# Load a CSV file, normalize the values of the row at position 1,
# update them in the dataset, and save it back to the file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<String> newRowValues = dataSet.normalizeRow(1, true);
dataSet.storeData();

# Load a CSV file and only compute the normalized values of the row at position 1,
# without updating the values in the dataset or the file

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<String> newRowValues = dataSet.normalizeRow(1, false);
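The README does not state which formula normalizeColumn and normalizeRow apply. Purely as an illustration, the sketch below assumes min-max scaling to the range [0, 1]; like the calls with false, it returns new values and leaves the input untouched. MinMaxNormalizer is a hypothetical name, and the library may well use a different normalization.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: min-max scaling, (x - min) / (max - min), chosen here only
// for illustration. The input list is not modified; new strings are returned.
class MinMaxNormalizer {

    static List<String> normalize(List<String> values) {
        List<BigDecimal> numbers = new ArrayList<>();
        for (String value : values) {
            numbers.add(new BigDecimal(value.trim()));
        }
        List<String> result = new ArrayList<>();
        if (numbers.isEmpty()) {
            return result;
        }
        BigDecimal min = Collections.min(numbers);
        BigDecimal max = Collections.max(numbers);
        BigDecimal range = max.subtract(min);
        for (BigDecimal x : numbers) {
            // If all values are equal the range is zero; map everything to 0.
            BigDecimal scaled = range.signum() == 0
                    ? BigDecimal.ZERO
                    : x.subtract(min).divide(range, MathContext.DECIMAL64);
            result.add(scaled.stripTrailingZeros().toPlainString());
        }
        return result;
    }
}
```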

# Load a CSV file and get the values of the column at position 5

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<String> values = dataSet.getColumnValues(5);

# Load a CSV file and get the values of a column by label

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<String> values = dataSet.getColumnValues("Column2");

# Load a CSV file and get the values of the column at position 5 as a list of Double values

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<Double> doubleValues = dataSet.getColumnValuesAsDouble(5);

# Load a CSV file and get the values of the column at position 5 as a list of Integer values

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<Integer> integerValues = dataSet.getColumnValuesAsInteger(5);

# Load a CSV file and get the values of the column at position 5 as a list of BigDecimal values

CSVDataSet dataSet = new CSVDataSet( "temp.csv" );
dataSet.loadData();
List<BigDecimal> bigDecimalValues = dataSet.getColumnValuesAsBigDecimal(5);
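The typed getters convert the stored strings into numbers. A sketch of equivalent conversions with streams, assuming values are trimmed and well-formed (an unparsable value throws NumberFormatException); TypedValues is a hypothetical name, not the library's code.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: converts a column's string values into typed lists.
class TypedValues {

    static List<Double> asDouble(List<String> values) {
        return values.stream().map(v -> Double.valueOf(v.trim())).collect(Collectors.toList());
    }

    static List<Integer> asInteger(List<String> values) {
        return values.stream().map(v -> Integer.valueOf(v.trim())).collect(Collectors.toList());
    }

    static List<BigDecimal> asBigDecimal(List<String> values) {
        return values.stream().map(v -> new BigDecimal(v.trim())).collect(Collectors.toList());
    }
}
```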

# Create a CSV in memory, normalize all values, and then save it to a file

CSVDataSet dataSet = new CSVDataSet( "temp.csv", true, false );

dataSet.setHeaders( Arrays.asList(new String[]{"Column1", "Column2", "Column3"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "100", "100", "100"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "50",   "50",  "50"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "11",   "11",  "11"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "5",     "5",   "5"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "40",   "40",  "40"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "140", "140", "140"}) );
dataSet.addRow(  Arrays.asList(new String[]{ "200", "200", "200"}) );

for (int positionColumn =0 ;  positionColumn < dataSet.getColumnsCount() ; positionColumn++){
    dataSet.normalizeColumn(positionColumn, true);
}

dataSet.storeData();

# Create a CSV in memory, add and remove rows and columns, sum a column by label, and save it to a file

CSVDataSet dataSet = new CSVDataSet( "temp.csv", true, false  );
dataSet.setHeaders( Arrays.asList(new String[]{"Column1", "Column2", "Column3"}) );
dataSet.addRow(  Arrays.asList(new String[]{"1", "2", "3"})  );
dataSet.addRow(  Arrays.asList(new String[]{"4", "5", "6"}) );
dataSet.addRow(  Arrays.asList(new String[]{"7", "8", "9"}) );

dataSet.addColumn(  Arrays.asList(new String[]{"2.5", "5.5", "8.5"}), "Column2.5",  2  );
dataSet.removeColumn("Column1");

dataSet.addRow( Arrays.asList(new String[]{"100", "200", "300"}) );
dataSet.removeRow(0);

BigDecimal sum = dataSet.sumColumn("Column2.5");

dataSet.storeData();

# You can compute the sum/mean/median/stdDev of column "Column3" using only the rows where "Column5" equals "true"

CSVDataSet dataSet = new CSVDataSet( "temp.csv", true, false  );

dataSet.setHeaders( Arrays.asList(new String[]{"Column1", "Column2", "Column3", "Column4", "Column5"}) );
dataSet.addRow(  Arrays.asList(new String[]{"1", "2", "3", "4", "true"})   );
dataSet.addRow(  Arrays.asList(new String[]{"5", "6", "7", "8", "false"})    );
dataSet.addRow(  Arrays.asList(new String[]{"9", "10", "11", "12", "true"}) );

//  sumCol == 14
BigDecimal sumCol = dataSet.sumColumnByMatching("Column3", "Column5", "true");
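The matching sum can be sketched as a filtered sum over a list-of-rows model: only rows whose filter column equals the expected string contribute, so with the data above it adds 3 and 11. This assumes exact, case-sensitive string comparison; SumByMatching is a hypothetical name, not the library's sumColumnByMatching.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical sketch: sums valueColumn over the rows where filterColumn
// equals the expected text.
class SumByMatching {

    static BigDecimal sum(List<String> headers, List<List<String>> rows,
                          String valueColumn, String filterColumn, String expected) {
        int valueIndex = headers.indexOf(valueColumn);
        int filterIndex = headers.indexOf(filterColumn);
        if (valueIndex < 0 || filterIndex < 0) {
            throw new IllegalArgumentException("unknown column label");
        }
        BigDecimal total = BigDecimal.ZERO;
        for (List<String> row : rows) {
            // Only rows whose filter cell matches contribute to the sum.
            if (row.get(filterIndex).equals(expected)) {
                total = total.add(new BigDecimal(row.get(valueIndex).trim()));
            }
        }
        return total;
    }
}
```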

How to run tests

Run the Gradle wrapper test task: ./gradlew test on Linux or macOS, gradlew test on Windows.

Contribution guidelines

Feel free to implement new algorithms or fix bugs and submit pull requests, as long as you write a corresponding unit test that proves your implementation is correct.