GC overhead limit exceeded because of temporary objects
extstmtrifork opened this issue · 4 comments
Hi,
I am trying to read a CSV file containing a bit more than 2 million rows, then apply a simple mapping to something I can use to finally insert into a database. However, I am getting the error "GC overhead limit exceeded", as a lot of temporary objects are created.
I read the other issue regarding temporary objects, but as far as I could tell, it concerns writing to a CSV file, whereas I am getting this error while reading from one.
@extstmtrifork can you provide some sample code?
So basically I have two files which share a common header called "PersonID".
I read the first file and insert the data into a HashMap (see code below).
Then I read the second file, where I use the HashMap to look up another header, "CivilRegistrationNumber", based on "PersonID".
There are 14 headers (columns) in the second CSV file, all of which are strings.
I then use all the information to insert into a database.
```java
// dataValidator, error and log are fields of the surrounding class (not shown)
public Map<String, String> readingFileAtOnce(File file) throws IOException, InterruptedException {
    Map<String, String> personMap = new HashMap<>();

    CsvReader csvReader = new CsvReader();
    csvReader.setContainsHeader(true);
    csvReader.setTextDelimiter('\'');
    csvReader.setSkipEmptyRows(true);

    // try-with-resources so the parser (and its file handle) is always closed
    try (CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8)) {
        CsvRow row;
        boolean headersValidated = false;
        while ((row = csvParser.nextRow()) != null) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException();
            }
            // validate the header once, on the first row
            if (!headersValidated) {
                dataValidator.validateHeadersExists(csvParser.getHeader(),
                        Arrays.asList("PersonID", "CivilRegistrationNumber"));
                headersValidated = true;
            }
            try {
                dataValidator.validatePersonData(row.getField("PersonID"),
                        row.getField("CivilRegistrationNumber"));
                personMap.put(row.getField("PersonID"), row.getField("CivilRegistrationNumber"));
            } catch (IllegalStateException | IllegalArgumentException e) {
                error++;
                log.error("...");
            }
        }
    }
    return personMap;
}
```
This sounds like a JVM tuning issue ... How much heap memory are you allocating to the JVM?
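(For reference, the maximum heap size is set with the `-Xmx` flag; the `4g` value and jar name below are just placeholders:)

```
java -Xmx4g -jar app.jar
```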
IMO this design doesn't scale well.
You would be better off ... (see the sketch below this list)

- Sort both files by `PersonID`.
- Read a record from file 1.
- Read a record from file 2.
- Merge the records and write them to file 3.
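A minimal sketch of that merge join, assuming both files have been pre-sorted lexicographically by PersonID, file 1 has at most one row per PersonID, and the output can get away without CSV quoting (the reader setup mirrors the snippet above; the class name and joined columns are placeholders):

```java
import de.siegmar.fastcsv.reader.CsvParser;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;

import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class MergeJoin {

    // Sketch only: assumes both inputs are sorted lexicographically by PersonID
    // and that file1 contains at most one row per PersonID.
    public static void mergeJoin(File file1, File file2, File out) throws IOException {
        CsvReader csvReader = new CsvReader();
        csvReader.setContainsHeader(true);

        try (CsvParser p1 = csvReader.parse(file1, StandardCharsets.UTF_8);
             CsvParser p2 = csvReader.parse(file2, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out.toPath(), StandardCharsets.UTF_8)) {

            CsvRow r1 = p1.nextRow();
            CsvRow r2 = p2.nextRow();

            while (r1 != null && r2 != null) {
                int cmp = r1.getField("PersonID").compareTo(r2.getField("PersonID"));
                if (cmp == 0) {
                    // Keys match: emit the joined record (CSV quoting omitted for brevity).
                    writer.write(r2.getField("PersonID") + ','
                            + r1.getField("CivilRegistrationNumber"));
                    writer.newLine();
                    r2 = p2.nextRow(); // file 2 may hold several rows per PersonID
                } else if (cmp < 0) {
                    r1 = p1.nextRow(); // this PersonID has no rows in file 2
                } else {
                    r2 = p2.nextRow(); // this row has no match in file 1
                }
            }
        }
    }
}
```

Only two rows are in memory at any time, so the heap footprint stays constant no matter how large the files get.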
Is the CSV file properly formatted? I know of situations where missing (closing) text delimiters result in huge column data.
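To illustrate with made-up data: since the snippet above sets `'` as the text delimiter, an unclosed quote keeps the parser consuming input (including line breaks) until the next `'`, so one malformed row can swallow the rows after it into a single field:

```
PersonID,Name
1,'John Doe
2,'Jane Smith'
```

Here the quote after `1,` is never closed on its own line, so row 1's Name field absorbs the line break and the start of row 2. Near the top of a 2-million-row file, one stray quote like this can pull megabytes into a single String, which alone could explain the GC pressure.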