
Primary language: Java. License: MIT.

A parse chat kata


Prerequisites

This kata has the following software dependencies; please install them:

  • Maven 3.6.x
  • Java 11.x

Add Maven to your PATH so you can easily run the commands in the following steps.


You can use any IDE to compile, test and run the project. I suggest Eclipse, which you can download from here: https://www.eclipse.org/downloads/packages/

Once Eclipse is installed, clone this project from Git and import it as a Maven project. For a tutorial, see https://javabydeveloper.com/import-maven-project-eclipse/

As Jar

This project can be packaged as a jar and executed anywhere Java is installed (see the prerequisites above). To create the package, run:

mvn compile package

The target directory will then contain two jars. To run the one that includes all dependencies, run from the target directory:

java -jar parse-chat-kata-0.0.1-SNAPSHOT-jar-with-dependencies.jar InputFile

where InputFile is the file to parse



How to compile

To compile, clone the project and run from the command line:

mvn compile

or let your IDE do it for you.


How to test

From the command line, run:

mvn compile test

The test results are shown in the output.

How to run

From the command line run:

mvn compile exec:java -Dexec.mainClass="it.albertotn.ParseChat" -Dexec.args="InputFile"

where InputFile is the file to parse, for example:

mvn compile exec:java -Dexec.mainClass="it.albertotn.ParseChat" -Dexec.args="step1.txt"

TODO - Future work

  • The resulting JSON is not valid according to RFC 7159 (see https://datatracker.ietf.org/doc/html/rfc7159#section-7). Following standard JSON notation (i.e. member names between double quotes) would make it possible to build the JSON with a standard library such as Jackson instead of a custom JSON writer.
  • In step 4 I realized that some sentences contain \n and others do not. Normalizing this behaviour (for example, always removing it) would help further processing.
  • It is worth considering whether sentences always end with a period: real people in a chat do not always end their sentences with one.
  • In step 6, to tell agents from customers, I assume that a database of agents is available, currently hardcoded into the code. This is because step 3 shows that a chat is not necessarily one message per agent/customer but may contain several messages from each of them, so it is not safe to infer that the second message is always from an agent.
  • In step 7 the mention is not enclosed in double quotes. Normalizing the output and being stricter about a common schema would help later processing.
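The TODO items above can be sketched in plain Java 11 without extra dependencies. This is a minimal, hypothetical illustration, not the kata's actual code: the class `ChatOutputSketch` and all of its methods are invented names, the set of agents is passed in as a parameter instead of coming from a real database, and in practice a library such as Jackson would replace the hand-written `jsonString`/`toJsonObject`. It shows RFC 7159 string quoting and escaping, newline normalization, sentence splitting that does not require a final period, and agent/customer classification by lookup rather than by message position.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helpers illustrating the TODO list; not part of the kata.
final class ChatOutputSketch {

    // Replaces every run of line breaks with a single space and trims,
    // so sentences look the same whether or not the input contained "\n".
    static String normalizeSentence(String sentence) {
        return sentence.replaceAll("\\R+", " ").trim();
    }

    // Splits after '.', '!' or '?' followed by whitespace; the last
    // sentence is kept even if it has no terminal punctuation.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : normalizeSentence(text).split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Classifies a speaker by lookup in a known set of agents (the step 6
    // assumption) instead of by the position of the message in the chat.
    static String roleOf(String speaker, Set<String> knownAgents) {
        return knownAgents.contains(speaker) ? "agent" : "customer";
    }

    // Encodes a JSON string per RFC 7159 section 7: wrapped in double quotes,
    // with quote, backslash and control characters escaped.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-hex-digit escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds a JSON object whose member names are quoted strings;
    // iteration order of the map is preserved in the output.
    static String toJsonObject(Map<String, String> fields) {
        StringJoiner members = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            members.add(jsonString(e.getKey()) + ":" + jsonString(e.getValue()));
        }
        return members.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        String speaker = "Alice"; // hypothetical speaker name
        entry.put("type", roleOf(speaker, Set.of("Alice")));
        entry.put("sentence", normalizeSentence("Hello,\nhow can I help?\n"));
        System.out.println(toJsonObject(entry));
    }
}
```

Running `main` prints `{"type":"agent","sentence":"Hello, how can I help?"}`. A `LinkedHashMap` is used so that members appear in insertion order, which keeps the output stable for tests.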