pkiraly/metadata-qa-api

Separate record serialization from internal representation

Closed this issue · 8 comments

At the moment, the input and output of the measure function are serialize strings (see for instance

).

Could it be made possible to also pass an internal (java) representation instead (simplest example would be an array)? This would simplify extensions such as reading from and writing to a RDBMS. Now, you would have to (de)serialize a record to a CSV string.

@mielvds

I've created two additional functions for the CalculatorFacade:

  • List<String> measureAsList(String jsonRecord) throws InvalidJsonException
  • Map<String, Object> measureAsMap(String jsonRecord) throws InvalidJsonException

Please check if they do what you expected.

@pkiraly thanks for this; it makes this already more flexible. However, it's more pressing on the input side.

Something like List<String> measureAsList(List<String> record) throws InvalidJsonException

Because of multiline columns, I have to use opencsv outside of this library to parse the CSV correctly into individual records. With the current API, I have to serialize the resulting String[] array back to into CSV string just to have it parsed again by your library. The multilines make this very hard to do.

That said, this wouldn't work for JSON though, because it's not flat. Maybe it's easier to implement CSV as wrapper to JSON? In the wrapper class, you could safely create methods that accept List<String> / String[]. Under the hood, everything is translated to JSON or maybe an internal Object data model, which would keep all serialization/deserialization out of the core components.

@mielvds

I am aware of the problem. I haven't thought about the suggestion you made. I will implement it. Meantime, here is a solution based on opencsv:

CSVIterator iterator = new CSVIterator(new CSVReaderHeaderAware(new FileReader(file)));
while (iterator.hasNext()) {
  String line = CsvReader.toCsv(iterator.next());
  List<String> metrics = facade.measureAsList(String line);
  // save the metrics
}

Ok, I've updated my repo. Works fine! I'm more or less ready to do a full blown completeness measurement.

@mielvds

I've extended the API (CalculatorFacade) with some methods, so right now here is the full spectrum:

Methods accepting a single string (JSON, XML, or CSV)

  • String measure(String record) throws InvalidJsonException
  • List<String> measureAsList(String record) throws InvalidJsonException
  • List<Object> measureAsListOfObjects(String record) throws InvalidJsonException
  • Map<String, Object> measureAsMap(String record) throws InvalidJsonException

Methods accepting a list of strings (CSV)

  • String measureCsv(List<String> record) throws InvalidJsonException
  • List<String> measureCsvAsList(List<String> record) throws InvalidJsonException
  • List<Object> measureCsvAsListOfObjects(List<String> record) throws InvalidJsonException
  • Map<String, Object> measureCsvAsMap(List<String> record) throws InvalidJsonException

I choose List<String> as input and did not implemented String[], because it is quite easy to convert them on the client side. Do you agree? I'll change the exception type in a further step to something less JSON oriented.

Great work, thanks! Two minor questions:

  • Can't you simply overload the measureAsList method with measureAsList(List<String> record) or even measureAsList(String[] record)?
  • I find converting from String[] to List<String> easier than the other way around, but that's indeed not a huge issue.

Hi @mielvds,

thanks for the feedback! I removed the ...Csv... from the method name, so right now you can use the same method name with String and List<String>:

  • String measure(String record) throws InvalidJsonException
  • String measure(List<String> record) throws InvalidJsonException
  • List<String> measureAsList(String record) throws InvalidJsonException
  • List<String> measureAsList(List<String> record) throws InvalidJsonException
  • List<Object> measureAsListOfObjects(String record) throws InvalidJsonException
  • List<Object> measureAsListOfObjects(List<String> record) throws InvalidJsonException
  • Map<String, Object> measureAsMap(String record) throws InvalidJsonException
  • Map<String, Object> measureAsMap(List<String> record) throws InvalidJsonException

I do not get your second question, as it was rather a statement, right?

ha yes, sorry. The .asArray(new String[0]) is more ugly than new ArrayList(array), but that's java. I completely support your choice for List.

Method names look good! I'll adjust my code to the changes.