delightful-parsing

A Scala library for parsing fixed-width columns in a string.

delightful-parsing is a library for parsing fixed-width columns from a string. It is heavily inspired by Apache Daffodil. The differences are:

  1. For now, a much smaller scope (i.e. fixed-width strings)
  2. Defining the parsing specification with Scala case classes and type annotations, instead of XSD

This library is built for Scala 2.12.15, 2.13.8 and 3.1.2.

SBT

libraryDependencies += "org.sweet-delights" %% "delightful-parsing" % "0.9.0" // check Maven Central for the latest version

Maven

<dependency>
  <groupId>org.sweet-delights</groupId>
  <artifactId>delightful-parsing_2.12</artifactId>
  <version>0.9.0</version>
</dependency>

All files in delightful-parsing are under the GNU Lesser General Public License version 3. Please read files COPYING and COPYING.LESSER for details.

How to parse a string having fixed-width columns?

Step 1: decorate a case class with delightful-parsing annotations. Example:

import sweet.delights.parsing.annotations.{Length, LengthParam, Options, Regex, Repetition}

@Options(trim = true)
case class Foo(
  opt: Option[String] @Length(3),
  str: String         @Regex("""\w{3}"""),
  integer: String     @LengthParam("intSize"),
  more: List[Bar]     @Repetition(2)
)

@Options(trim = true)
case class Bar(
  list: List[String] @Repetition(2) @Length(5)
)

Step 2: parse!

import sweet.delights.parsing.Parser._

val line = "optstrintegerAAAAABBBBBCCCCCDDDDD"
val parsed = parse[Foo](Map("intSize" -> 7))(line)
println(parsed)
// Foo(
//   opt = Some("opt"),
//   str = "str",
//   integer = "integer",
//   more = List(
//     Bar(List("AAAAA", "BBBBB")),
//     Bar(List("CCCCC", "DDDDD"))
//   )
// )
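
The split performed in the example above can be reproduced by hand. The following is a standalone sketch in plain Scala, not the library's implementation; WalkthroughSketch and sliceByWidths are hypothetical names. The widths are read off the annotations: 3 for opt, 3 for the \w{3} regex, 7 from the intSize parameter, then 2 x 2 fields of 5 characters.

```scala
object WalkthroughSketch {
  // Cut `line` into consecutive slices of the given widths.
  // Assumes the widths sum to at most line.length.
  def sliceByWidths(line: String, widths: List[Int]): List[String] = {
    // Running start offsets: 0, w1, w1 + w2, ...
    val offsets = widths.scanLeft(0)(_ + _)
    offsets.zip(widths).map { case (start, width) =>
      line.substring(start, start + width)
    }
  }

  def main(args: Array[String]): Unit = {
    val line = "optstrintegerAAAAABBBBBCCCCCDDDDD"
    println(sliceByWidths(line, List(3, 3, 7, 5, 5, 5, 5)))
    // List(opt, str, integer, AAAAA, BBBBB, CCCCC, DDDDD)
  }
}
```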

Supported types

By default, Parser is able to parse strings and basic types such as Int, Double, String, Option[T], List[T] etc.

Support for additional types is provided via implementations of Parser[T].

Definitions

Node

In a case class, any field whose type refers to another case class is a node field.

A node type is the type of a node field.

Leaf

Any field that is NOT a node is a leaf field.

A leaf type is the type of a leaf field.

Types Boolean, Byte, Short, Int, Long, Float, Double and String are leaves.

Optional and repeatable types

A node or leaf type T can be optional (i.e. Option[T]) or repeatable (i.e. List[T]).
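
To make the vocabulary above concrete, here is a plain Scala sketch. Address, Person and DefinitionsSketch are hypothetical names, and the case classes deliberately carry no library annotations, so they only illustrate the terms and would not be parseable as written.

```scala
// Hypothetical case classes, used only to illustrate the definitions.
final case class Address(
  city: String, // leaf field (String is a leaf type)
  zip: Int      // leaf field (Int is a leaf type)
)

final case class Person(
  name: String,             // leaf field
  nickname: Option[String], // optional leaf field
  home: Address,            // node field: its type is another case class
  previous: List[Address]   // repeatable node field
)

object DefinitionsSketch {
  val example: Person = Person(
    name = "Ada",
    nickname = None,
    home = Address("London", 12345),
    previous = List(Address("Paris", 75001))
  )
}
```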

Annotations

Why type annotations?

The choice of type annotations (i.e. annotations "on the right") rather than variable annotations (i.e. annotations "on the left") is purely for readability purposes. As such, it is subjective and opinionated.

Case class annotations

@Options

Specifies parsing options, such as trimming what is consumed. For now, this annotation is mandatory for nodes (case classes). Example:

import sweet.delights.parsing.annotations.Options

@Options(trim = true)
case class Foo()

Type annotations

@Conditional(Int => Boolean)

Experimental. TODO.

@Format(String) & @FormatParam(String)

Specifies a format to parse a certain leaf type. For now, leaf types supported are java.time.{LocalDate, LocalTime, LocalDateTime, ZonedDateTime}. Example:

import java.time.LocalDate
import sweet.delights.parsing.annotations.{Length, Options, Format}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  date: LocalDate @Length(6) @Format("yyMMdd")
)

Parser.parse[Foo]("200101")
// res0: Foo(
//   date = LocalDate.of(2020, 1, 1)
// )

The format can be provided through a parameter by using the @FormatParam(String) annotation.

import java.time.LocalDate
import sweet.delights.parsing.annotations.{Length, Options, FormatParam}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  date: LocalDate @Length(6) @FormatParam("dateFormat")
)

Parser.parse[Foo](Map("dateFormat" -> "yyMMdd"))("200101")
// res0: Foo(
//   date = LocalDate.of(2020, 1, 1)
// )
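
For reference, the same conversion can be reproduced with the JDK alone. This sketch assumes the format string follows the java.time.format.DateTimeFormatter pattern syntax, which the text above does not state explicitly; FormatSketch and parseDate are hypothetical names and not part of the library.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object FormatSketch {
  // Parse `raw` with a DateTimeFormatter built from `pattern`.
  // Throws DateTimeParseException if `raw` does not match.
  def parseDate(raw: String, pattern: String): LocalDate =
    LocalDate.parse(raw, DateTimeFormatter.ofPattern(pattern))

  def main(args: Array[String]): Unit = {
    println(parseDate("200101", "yyMMdd")) // 2020-01-01
  }
}
```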

@Ignore & @IgnoreParam(String)

Specifies whether the parsing of a field should be bypassed (ignored) or not. Applicable only to leaf types. Example:

import sweet.delights.parsing.annotations.{Ignore, Length, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  str: String @Length(5) @Ignore,
  opt: Option[String] @Length(2)
)

Parser.parse[Foo]("XX")
// res0: Foo(
//   str = "",
//   opt = Some("XX")
// )

The parsing of str is skipped completely. The field is assigned an empty string, its default value.

Ignoring a field can be set through a parameter by using the @IgnoreParam(String) annotation.

import sweet.delights.parsing.annotations.{IgnoreParam, Length, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  str: String @Length(5) @IgnoreParam("ignoreMe"),
  opt: Option[String] @Length(2)
)

Parser.parse[Foo](Map("ignoreMe" -> true))("XX")
// res0: Foo(
//   str = "",
//   opt = Some("XX")
// )

@Length(Int) & @LengthParam(String)

Specifies the number of characters to be consumed explicitly. Example:

import sweet.delights.parsing.annotations.{Length, Options}

@Options(trim = true)
case class Foo(
  str: String @Length(5),
  opt: Option[String] @Length(2)
)

The field str consumes 5 characters from the input string. As the trimming option is activated, the final length of str may be less than 5.

The field opt consumes 2 characters. In addition to the behavior above, as this is an optional field, if the trimmed string is empty, then opt becomes None.
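
The behavior described above can be sketched in plain Scala. This is an illustration of the semantics only, under the assumption that trimming removes leading and trailing whitespace; LengthSketch, leaf and optionalLeaf are hypothetical names and not the library's code.

```scala
object LengthSketch {
  // Consume `length` characters, trim them, and return the value
  // together with the remaining input.
  def leaf(input: String, length: Int): (String, String) = {
    val (raw, rest) = input.splitAt(length)
    (raw.trim, rest)
  }

  // Same as `leaf`, but an empty trimmed value becomes None.
  def optionalLeaf(input: String, length: Int): (Option[String], String) = {
    val (value, rest) = leaf(input, length)
    (if (value.isEmpty) None else Some(value), rest)
  }

  def main(args: Array[String]): Unit = {
    println(leaf("ab   XY", 5))    // (ab,XY)
    println(optionalLeaf("XY", 2)) // (Some(XY),)
    println(optionalLeaf("  ", 2)) // (None,)
  }
}
```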

The length can be provided through a parameter by using the @LengthParam annotation.

import sweet.delights.parsing.annotations.{LengthParam, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  str: String @LengthParam("myStrSize")
)

Parser.parse[Foo](Map("myStrSize" -> 5))("ABCDE")
// res0: Foo(
//   str = "ABCDE"
// )

@Lenient

Specifies that any exception raised during the parsing of a leaf field is ignored; the field then takes its default value. Example:

import sweet.delights.parsing.annotations.{Length, Lenient, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  integer: Int        @Length(5) @Lenient,
  option: Option[Int] @Length(5) @Lenient
)

Parser.parse[Foo]("xxxxxXXXXX")
// res0: Foo(
//   integer = 0,
//   option = None
// )

NB:

  • the default value of an integer is 0
  • the default value of an Option is None
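
The fallback to a default value can be pictured with scala.util.Try. This is a standalone sketch of the idea, not the library's implementation; LenientSketch, lenientInt and lenientOptInt are hypothetical names.

```scala
import scala.util.Try

object LenientSketch {
  // Any exception raised by the conversion is swallowed and replaced
  // by the default value of the target type.
  def lenientInt(raw: String): Int =
    Try(raw.trim.toInt).getOrElse(0)

  def lenientOptInt(raw: String): Option[Int] =
    Try(raw.trim.toInt).toOption

  def main(args: Array[String]): Unit = {
    println(lenientInt("xxxxx"))    // 0
    println(lenientOptInt("XXXXX")) // None
    println(lenientInt("   42"))    // 42
  }
}
```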

@ParseFunc[T](String => Option[T])

Provides a user defined parsing function (UDPF) for a leaf type T. When present, it overrides default parsing functions or any parsing function derived from @Format or @FormatParam annotations. The UDPF must be statically defined. Example:

import java.time.LocalTime
import sweet.delights.parsing.annotations.{Length, ParseFunc, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  time: LocalTime   @Length(5) @ParseFunc[LocalTime](Foo.removePrefix)
)

object Foo {
  def removePrefix(s: String): Option[LocalTime] = Some(LocalTime.parse(s.substring(1)))
}

Parser.parse[Foo]("X03:45")
// res0: Foo(
//   time = LocalTime.of(3, 45)
// )

@Regex(String)

Specifies the characters to be consumed by means of a regular expression. Applicable to leaf types only. Example:

import sweet.delights.parsing.annotations.{Regex, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  str: String @Regex("""\w{5}""")
)

Parser.parse[Foo]("ABCDEF")
// res0: Foo(
//   str = "ABCDE"
// )
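
A plain Scala sketch of regex-driven consumption follows. It assumes the expression is matched as a prefix at the current position of the input, which the text above does not spell out; RegexSketch and consume are hypothetical names and not part of the library.

```scala
object RegexSketch {
  // Match `pattern` at the start of `input`; on success, return the
  // matched text together with the remaining input.
  def consume(input: String, pattern: String): Option[(String, String)] =
    pattern.r.findPrefixOf(input).map(m => (m, input.drop(m.length)))

  def main(args: Array[String]): Unit = {
    println(consume("ABCDEF", """\w{5}""")) // Some((ABCDE,F))
    println(consume("AB", """\w{5}"""))     // None
  }
}
```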

@Repetition(Int)

Specifies the number of repetitions for a list. Example:

import sweet.delights.parsing.annotations.{Length, Repetition, Options}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  strs: List[String] @Repetition(2) @Length(5),
  bars: List[Bar]    @Repetition(3)
)

@Options(trim = true)
case class Bar(
  str: String @Length(1)
)

Parser.parse[Foo]("ABCDEFGHIJKLM")
// res0: Foo(
//   strs = List("ABCDE", "FGHIJ"),
//   bars = List(
//     Bar(str = "K"),
//     Bar(str = "L"),
//     Bar(str = "M")
//   )
// )

strs is a repeatable leaf field. As such, it requires @Length in addition to @Repetition.

bars is a repeatable node field. Only @Repetition is required.
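
The reason a repeatable leaf needs @Length can be seen in a small standalone sketch: without a width per element there is no way to know where each repetition ends. RepetitionSketch and repeatLeaf are hypothetical names, and the code only mimics the slicing on the example input; it is not the library's implementation.

```scala
object RepetitionSketch {
  // Consume `times` consecutive slices of `length` characters each
  // (length must be positive) and return them with the remaining input.
  def repeatLeaf(input: String, times: Int, length: Int): (List[String], String) = {
    val (block, rest) = input.splitAt(times * length)
    (block.grouped(length).toList, rest)
  }

  def main(args: Array[String]): Unit = {
    val (strs, rest) = repeatLeaf("ABCDEFGHIJKLM", times = 2, length = 5)
    println(strs)                                    // List(ABCDE, FGHIJ)
    println(repeatLeaf(rest, times = 3, length = 1)) // (List(K, L, M),)
  }
}
```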

@TrailingSkip(Int)

Specifies a number of characters to be skipped after a field is parsed successfully. Example:

import sweet.delights.parsing.annotations.{Length, Options, TrailingSkip}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  str1: String @Length(1),
  str2: String @Length(1) @TrailingSkip(1),
  str3: String @Length(1)
)

Parser.parse[Foo]("AB_D")
// res0: Foo(
//   str1 = "A",
//   str2 = "B",
//   str3 = "D"
// )
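
The effect on the read position can be sketched in plain Scala. TrailingSkipSketch and field are hypothetical names; the code illustrates the behavior on the example input and is not the library's implementation.

```scala
object TrailingSkipSketch {
  // Consume `length` characters, then drop `trailingSkip` more
  // characters before returning the remaining input.
  def field(input: String, length: Int, trailingSkip: Int = 0): (String, String) = {
    val (raw, rest) = input.splitAt(length)
    (raw, rest.drop(trailingSkip))
  }

  def main(args: Array[String]): Unit = {
    val (str1, rest1) = field("AB_D", 1)                // ("A", "B_D")
    val (str2, rest2) = field(rest1, 1, trailingSkip = 1) // ("B", "D")
    val (str3, _)     = field(rest2, 1)                 // ("D", "")
    println((str1, str2, str3)) // (A,B,D)
  }
}
```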

@TrueIf

For a Boolean field, specifies a string that should be matched to evaluate the field to true. Example:

import sweet.delights.parsing.annotations.{Length, Options, TrueIf}
import sweet.delights.parsing.Parser

@Options(trim = true)
case class Foo(
  bool: Boolean @Length(3) @TrueIf("Yes")
)

Parser.parse[Foo]("Yes")
// res0: Foo(
//   bool = true
// )

Parser.parse[Foo]("xxx")
// res1: Foo(
//   bool = false
// )

Limitations

  • case classes MUST be decorated with the Options annotation
  • all fields of a case class MUST be annotated with applicable annotations

Acknowledgments