jlawrie/opencsv

dramatic performance degradation from version 3.7


When we upgraded to version 4.1 (also replicated on the latest version), we saw a dramatic increase in the time it takes to convert our POJOs (even simple ones) to CSV.
We originally upgraded in order to translate epoch timestamps to human-readable dates more easily,
however the issue reproduces even without doing any type of conversion at all.
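For reference, the epoch-to-human-date translation we were after is straightforward with plain `java.time`, independent of opencsv (this is just an illustrative standalone sketch; the timestamp value is arbitrary):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EpochDemo {
    public static void main(String[] args) {
        // Example epoch timestamp in milliseconds (illustrative value only)
        long epochMillis = 1545782400000L;

        // Format the instant as a human-readable UTC date/time
        String human = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));

        System.out.println(human); // prints "2018-12-26 00:00:00"
    }
}
```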

The conversion code is quite simple:

```java
package com.performance.test.openCsv;

import com.google.common.base.Stopwatch;

import com.opencsv.bean.HeaderColumnNameTranslateMappingStrategy;
import com.opencsv.bean.StatefulBeanToCsv;
import com.opencsv.bean.StatefulBeanToCsvBuilder;

import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

/**
 * Created by uriel on 12/26/2018.
 */
public class Main {

    public static final int POJOS_COUNT = 100 * 1000;
    static HeaderColumnNameTranslateMappingStrategy<Pojo> mappingStrategy;

    static {
        // Initialize the shared mapping strategy once
        mappingStrategy = new HeaderColumnNameTranslateMappingStrategy<>();
        mappingStrategy.setType(Pojo.class);
    }

    public static void main(String[] args) {

        List<Pojo> pojos = new ArrayList<>();
        IntStream.range(0, POJOS_COUNT).forEach(i -> {
            Pojo pojo = new Pojo();
            pojo.setFirstName("Dan");
            pojo.setLastName("Boom");
            pojo.setCity("Tel Aviv");
            pojo.setCountry("Israel");
            pojo.setZipCode("12819839213");
            pojo.setAge(i);
            pojo.setHour(POJOS_COUNT - i);
            pojo.setValue(true);
            pojos.add(pojo);
        });

        Collection<Long> times = new ArrayList<>();

        // Run the conversion 10 times and record how long each run takes
        IntStream.range(0, 10).forEach(i -> {
            Stopwatch stopwatch = Stopwatch.createStarted();
            String csv = createCsv(pojos);
            long elapsedTime = stopwatch.elapsed(TimeUnit.SECONDS);
            times.add(elapsedTime);
            System.out.println(csv);
        });

        System.out.println("====================================================================================================================");
        times.forEach(t -> System.out.println(String.format("Creating CSV took %d seconds", t)));
        System.exit(0);
    }

    public static String createCsv(Collection<Pojo> beansList) {
        final StringWriter stringWriter = new StringWriter();

        StatefulBeanToCsv<Pojo> csvCreator = new StatefulBeanToCsvBuilder<Pojo>(stringWriter)
                .withMappingStrategy(mappingStrategy)
                .withSeparator(',')
                .build();
        try {
            csvCreator.write(new ArrayList<>(beansList));
            stringWriter.close();
        } catch (Exception e) {
            throw new RuntimeException("failed to create csv", e);
        }
        return stringWriter.toString();
    }
}
```

With the newer releases, writing 100K entries (with no conversions at all) to CSV takes around 10 seconds on a modern 4-core instance.
With older releases this takes substantially less, around 1-2 seconds.

I'm uploading the jar content,
and would be really happy to find out that we're doing something wrong 😅

Thanks!

open-csv-performance.zip